Pros and Cons with Power Query cataloging
Introduction
The first article of this series shows how to catalog files and folders using Power Query.
Now that you know the mechanism as to how to do it, this article discusses whether you should do so, looking at the advantages and disadvantages.
Advantages of using Power Query
Firstly, Power Query is free. You can download it from the Microsoft website and install it within MS Excel, and create a catalog without needing an external program.
It integrates within Excel by adding an extra toolbar, which allows additional functionality as well, allowing you database functionality within a spreadsheet.
It is produced by one of the world’s greatest companies, and has been available (and is still being developed) for several years.
There have been ways to catalog files and folders before, but it is a step-up from using the following MS-DOS command:
CMD *.* /s /b >>c:\b\cat.txt
Disadvantages – Software Requirements
The first major disadvantage is the fact that merely having Excel on your computer is not enough. You have to have a very specific type of Excel on your computer. Prior to installing Power Query, I was unaware that there were different versions of Excel. However, I was unable to use it in my initial installation. Power Query Premium requires the following:
- Windows 7 or greater (or Service 2008 R2 or greater),
- Microsoft Office 2010 Professional Plus with Software Assurance, or Microsoft Office 2013 Professional Plus, Office 365 ProPlus or Excel 2013 Standalone, and
- Internet Explorer 9 (so it cannot be installed on a clean install of Windows 7).
These are very specific versions of Microsoft Office or Microsoft Excel. Most people will not have these versions, including most who sign up for Office 365. This is because the standard Office 365 installation is likely to be the Home or Personal or Home & Student version, not the ProPlus, which costs around twice as much.
Thankfully, it looks like Microsoft is lifting these requirements in Office 2016, but for now, it’s tough to fulfil the above.
Ease of use
Believe it or not, according to Microsoft, “Microsoft Power Query for Excel provides an intuitive user interface for data discovery, data transformation and enrichment”.
The first article in this series shows how to perform this catalog. I do not believe that it is “intuitive” at all. I was unable to find any relevant Google sites which details this process.
Available columns
The columns which are exported into Excel are shown in this query below:
As can be seen in the above, the available columns are:
- Content. This is a Binary representation of the file
- Name. The File Name with the extension.
- Extension. The extension by itself.
- Date accessed, date modified and date created. From Windows Explorer. As discussed in the article “Why Windows Dates are not reliable”, the value of this data is questionable.
- Attributes – this again is not exported well into Excel, and
- Folder Path. The full folder path.
And that’s it.
The attributes column looks interesting, but when you click on it, the sort of information that is available is as follows:
- Content Type – this is related to the file extension.
- Kind – ditto.
- Size – got in a separate column.
- ReadOnly, Hidden, System Directory and Archive. These collectively are known as attributes, and in Filecats Standard I catalog them in a single column.
- Device – I assume this will always be False for a file, as a file is not a device.
- Normal,
- Temporary,
- SparseFile,
- ReparsePoint,
- Compressed,
- Offline,
- NotContentIndexed and
- Encrypted.
While some of the above might be relevant, these are not the most interesting attributes or “document properties” that most people would want to use. I’ll talk more about other metadata which should be exportable separately in my third article.
Reliability
Power Query doesn’t always export the data from the Query Window to Excel. In only my second catalog it, said:
“There isn’t enough memory to complete this action.
Try using less data or closing other applications.
To increase memory availability, consider:
– Using a 64-bit version of Microsoft Excel.
– Adding memory to your device.”
To put this in context, I was cataloging about 600 files on a computer with 4 Gb of RAM with nothing else open apart from Excel and Power View.
🙁
Presentation
Your catalog is exported as a table. There is no attempt at presentation – it is not even in a table – and things such as “Content” and “Attributes”, which can be investigated in the Query Editor above, lose such properties and just become unexpandable words – i.e. the Content column says “Binary” for every type of file, whether it be a text file or an Excel spreadsheet.
There’s no attempt at prettying this table, such as adding colours or anything or note.
How can it be improved?
The next article in this series contrasts MS Power Query with Filecats Standard, and shows what more can be done with making cataloging easy.