Quickly summarise your new data by file extension
This is the second in a series of articles about how best to investigate tens of thousands of files that you are seeing for the first time. The previous article covered when that might happen, and how to examine and export the folder structure so that you can start to understand it.
This article addresses how to look at the different file types in order to see sequences of similar documents. But how can you best do this?
Why would you want to look for strings of similar documents?
As described in the previous article about folder hierarchy, different people may save the same or similar data in different places. It may be that you have Minutes of Meetings in several different folders. It is also possible that none of those folders contains a complete record, as people may have been on holiday when a particular meeting took place. However, when all these sources are put together, you may get a complete picture.
There may be a similar situation for photographs, spreadsheets, or email messages, i.e. a combination of separate folders may be needed to get the entirety of the relevant data.
Generally, the best version of the information to work with is the one in its native format, e.g. Microsoft Excel. If the same information also exists as a PDF, that copy may be harder to edit but easier to view in its final form.
Whatever type of data you need to find, it is important that your searches return complete results. This is especially true if you don’t know exactly what data you have been given and so are having to carry out something close to a forensic investigation: for example, you find one important file and deduce that there are probably other similar files. Alternatively, you may want an overview, e.g. to see all the PowerPoint presentations in a single search.
Disadvantages of searching using Windows Explorer
It is possible to search for a particular file extension in Windows Explorer. To do this, click in the search box in the top right-hand corner and enter your search term in this format: *.docx
This will find all files with a docx extension (the default Microsoft Word format since Word 2007).
However, this will not find Word documents saved in other formats, such as the older .doc format or macro-enabled .docm files, so you may need to broaden the search to: *.doc*. Unfortunately, this syntax is not very user-friendly.
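To see why the broader pattern is needed, here is a small sketch using Python's standard fnmatch module, which applies similar shell-style wildcards (* matches any run of characters). It assumes that Explorer treats simple patterns like these in much the same way; Explorer's own search is more elaborate, so treat this purely as an illustration. The file names are invented.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, for illustration only.
FILES = ["minutes.docx", "agenda.doc", "macro_report.docm", "notes.txt", "budget.xlsx"]

def matching(pattern, names):
    """Return the names that match a shell-style wildcard pattern, ignoring case."""
    # Lower-case both sides so matching is case-insensitive on every OS,
    # as Windows file names are.
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]

print(matching("*.docx", FILES))  # ['minutes.docx']
print(matching("*.doc*", FILES))  # ['minutes.docx', 'agenda.doc', 'macro_report.docm']
```

Note that in this sketch the broader pattern is also looser: a name such as report.doc.pdf matches *.doc* as well, so a wider wildcard can pull in files you did not intend.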
Additionally, this search will not find other files that Word can open, such as text, HTML or XML files. Some of your best data may be in text files, but if you didn’t know they were there, how would you know to look for them?
You could do a very general search, *.*, but the results in Windows Explorer have numerous problems:
- You get a very long list, with no way to manage it other than sorting. You can’t, for example, group the data by file type and then collapse these groupings as you can with Outlook emails.
- Every time you re-sort, Windows may rescan the files from scratch. This can be a major nuisance if the files are on an unindexed external hard drive or a slow network.
- Every time you change the search criteria, the files are rescanned.
- The standard information displayed for each file is limited, and adding more columns makes Explorer rescan the data.
- If you do add metadata columns, Explorer can be very slow to retrieve that information.
However, the biggest drawbacks are:
- You cannot export the data. You cannot summarise the data. You cannot do anything other than see it in a list, open documents, and copy or bookmark specific files or folders.
- You cannot annotate the data (apart from tagging files or editing their metadata), or email the list to others (unless you use a tool such as the Snipping Tool to capture what is shown on the screen).
What should the solution look like?
Ideally, you should have the following functions in mind when investigating different file types:
- You should be able to export your results into an Excel spreadsheet or some other table.
- You should be able to filter the list by file type without the computer having to rescan the files.
- You should be able to analyse this information quickly and easily.
- You should be able to open the original files from that spreadsheet or table.
- After opening a file, you should be able to annotate the spreadsheet, e.g. by adding keywords.
- You should also have links to each file’s folder in Windows Explorer.
- You should be able to copy and paste information from the table into emails or other documents, or circulate the Microsoft Excel spreadsheet by email.
- You should be able to create all of this from within Windows Explorer with just a few clicks.
- If you want to show document properties, you should be able to retrieve this metadata into the same table or spreadsheet.
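The first few requirements (export, filter without rescanning, analyse) can be sketched in plain Python using only the standard library. This is a minimal illustration under its own assumptions, not how Filecats works: it walks the folder tree once, keeps one row per file in memory, writes those rows to a CSV file that Excel can open, and filters the in-memory list by extension so that nothing on disk is scanned a second time. The function names and the choice of columns are hypothetical.

```python
import csv
import os
from datetime import datetime

FIELDS = ["name", "folder", "extension", "size_bytes", "modified"]

def catalogue_folder(root):
    """Walk the folder tree once and return one dict per file."""
    rows = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                # Skip files that vanished or cannot be read (e.g. broken links).
                continue
            rows.append({
                "name": name,
                "folder": folder,
                # Lower-cased extension without the dot; "" if there is none.
                "extension": os.path.splitext(name)[1].lower().lstrip("."),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds"),
            })
    return rows

def export_csv(rows, out_path):
    """Write the catalogue to a CSV file that Excel can open."""
    # utf-8-sig adds a byte-order mark so Excel reads non-ASCII names correctly.
    with open(out_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

def filter_by_extension(rows, *extensions):
    """Filter the in-memory catalogue by extension; no files are rescanned."""
    wanted = {e.lower().lstrip(".") for e in extensions}
    return [r for r in rows if r["extension"] in wanted]
```

For example, calling catalogue_folder on the top folder of your new data (the path is yours to supply) and passing the result to export_csv gives a spreadsheet with one row per file; filter_by_extension(rows, "doc", "docx", "docm") then picks out the Word documents from the same list without touching the disk again.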
It should be fast, it should be easy. It should make a task which can be very laborious in Windows Explorer almost fun.
My solution to this problem is the Filecats range of programs. These programs integrate with Windows Explorer, so you can launch them by right-clicking the folder and selecting the relevant program.
They quickly catalogue all the files and folders into a spreadsheet or stand-alone table, and can include metadata (document properties) if you want it. You can filter by file type in Microsoft Excel, and it is easy to create a PivotTable that counts how many files of each type you have. If you don’t want the catalogue in Excel, you can filter it and create similar analyses just as easily in a stand-alone table.
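The PivotTable count is, in essence, a tally of how many files share each extension. As a rough stand-in, assuming you already hold a list of file paths (the paths below are invented), Python's collections.Counter produces the same kind of summary without rescanning anything:

```python
import os
from collections import Counter

def count_by_extension(paths):
    """Count files per lower-cased extension, like a PivotTable of file types."""
    counts = Counter()
    for p in paths:
        ext = os.path.splitext(p)[1].lower()
        # Files with no extension are grouped under a label of their own.
        counts[ext or "(none)"] += 1
    # Most common file types first; ties keep the order first seen.
    return counts.most_common()

# Hypothetical paths, for illustration only.
paths = [
    "Minutes/2019-03.docx", "Minutes/2019-04.DOCX", "Old/minutes-2018.doc",
    "Photos/site1.jpg", "Photos/site2.jpg", "Photos/site3.jpg", "README",
]
for ext, n in count_by_extension(paths):
    print(f"{ext:8} {n}")
```

Lower-casing the extension matters here: without it, .docx and .DOCX would be counted as two different file types.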
The next in the series will look at examining your new data by date range.