Reviewing MS Office files


In previous articles, I talked about cataloguing a new hard drive, USB memory stick or network, and discussed approaches for investigating the file structure, the type of files, and the date ranges.

In this fourth article, I discuss Microsoft Office files in particular, namely MS Word, MS Excel and MS PowerPoint. I will deal with emails exported from Outlook in my sixth article.

So what are the particular considerations when investigating Microsoft Office files?

What clues can be investigated?

As discussed in my previous articles, it is important that you identify key strings of files, such as Minutes of Meetings in Word or Progress Reports in Excel, and that you get as complete a set as possible, as they may be stored in various locations in your data.

One thing that I find essential when trying to find “missing” documents is the document properties, also known as metadata. These are attributes which Microsoft Office records invisibly when the document is saved. Some are added when the file is saved for the first time, whereas others are updated each time it is resaved. So how can this help you?

Whoever created the first Minutes of Meeting is likely to re-use the same document, albeit re-saved with a different filename and location, when producing the minutes of subsequent meetings. This ensures consistency of style, and avoids having to redo work such as the header, footer, and list of attendees.

If the usual notetaker were to miss a meeting, then it is likely that the next notetaker would use the same Word document as a template, for exactly the same reasons.

If this is likely for Word documents, then it is much more likely for organised data contained in spreadsheets such as progress reports. The structure of these reports is unlikely to change significantly from one meeting to another. This is because people who read these reports like to see consistency, so that they can ascertain what stage agreed actions have reached, and because it helps the report’s creators and maintainers to focus on what should be reported.

It can happen that, part of the way through a project, the structure of something like a progress report is radically altered for operational reasons. However, once changed, this new report format is likely to remain for some time.

How can metadata help?

In both these cases, while the contents are different, the same document is reused multiple times. As a result, many of the document properties will remain unaltered. For example, the “Title” of a document may be set the first time it is saved, but is significantly less likely to be changed in subsequent versions.

The “Author” or “Creator”, together with the “Company”, is again set when the document is first saved and, while document properties can be updated in Office or in Windows Explorer, that is not likely to happen. The same is true of the “Template” used to create the document.

Similarly, the “Content created” time is set to the date the document was first created, and it cannot subsequently be changed (other than by starting a new document afresh).
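To illustrate how these stable properties can point to a string of related documents, here is a minimal Python sketch. It assumes the properties have already been extracted into one dictionary per file. The function name `group_into_series` and the keys "File name", "Creator" and "Content created" are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def group_into_series(records, keys=("Creator", "Content created")):
    """Group file names by the properties that tend to survive a "Save As"."""
    series = defaultdict(list)
    for record in records:
        signature = tuple(record.get(key, "") for key in keys)
        if not any(signature):
            continue  # no usable metadata, so nothing to match on
        series[signature].append(record["File name"])
    # A signature shared by several files suggests one document reused over time.
    return {sig: sorted(names) for sig, names in series.items() if len(names) > 1}
```

Files that share a creator and an identical creation time, but have different names or folders, are good candidates for belonging to the same set of minutes or progress reports. A match is only a hint, so the candidates still need to be opened and checked.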

All of the above terms are Microsoft Office metadata (advanced properties), and they can all be read in MS Office and in Windows Explorer. However, the disadvantage of reading them there is that you can generally only read one file at a time. If you do manage to read Office document properties from more than one file in Explorer, then that list cannot be exported to other packages so that you can manipulate it. This, together with other drawbacks to Windows Explorer, was discussed in previous articles.
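For readers who want to see where these properties live, the newer Office formats (.docx, .xlsx and .pptx) are ZIP packages that hold the properties in small XML parts. The following Python sketch reads a handful of them using only the standard library. It assumes the conventional part names `docProps/core.xml` and `docProps/app.xml` that Office itself writes, and it does not cover the older .doc, .xls and .ppt formats. The function names and the labels in the two field tables are my own choices for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the two property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Label shown to the reader -> element to look for (labels are illustrative).
CORE_FIELDS = {
    "Title": "dc:title",
    "Creator": "dc:creator",
    "Last modified by": "cp:lastModifiedBy",
    "Content created": "dcterms:created",
    "Last saved": "dcterms:modified",
}
APP_FIELDS = {
    "Company": "ep:Company",
    "Template": "ep:Template",
    "Total editing time": "ep:TotalTime",
}


def _read_part(package, part_name, fields):
    """Return {label: text} for one XML part, or {} if the part is absent."""
    if part_name not in package.namelist():
        return {}
    root = ET.fromstring(package.read(part_name))
    found = {}
    for label, path in fields.items():
        node = root.find(path, NS)
        if node is not None and node.text:
            found[label] = node.text
    return found


def read_office_properties(path):
    """Read selected document properties from one .docx, .xlsx or .pptx file."""
    with zipfile.ZipFile(path) as package:
        props = _read_part(package, "docProps/core.xml", CORE_FIELDS)
        props.update(_read_part(package, "docProps/app.xml", APP_FIELDS))
    return props
```

Calling `read_office_properties` on a file path returns a dictionary containing only the properties that are actually present, so a document with no Title or Company simply has no such entry.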

How to read MS Office document properties efficiently

What is required, instead, is a solution which creates a structured list, either in a spreadsheet or table, of all the files and folders within a specific folder, together with the option of extracting the document properties as well. The advantages of using such a metadata extractor include:

  • Being able to filter on a specific person or company.
  • Creating analyses of relevant files.
  • Being able to open documents directly from the spreadsheet or table.
  • Finding out what metadata is available.
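As a rough illustration of such a structured list, the Python sketch below walks a folder, collects every simple property it finds in each .docx, .xlsx or .pptx file, and writes the result to a CSV file that a spreadsheet can open. It is not how any particular product works. It assumes the conventional `docProps/core.xml` and `docProps/app.xml` parts, and the function and column names are hypothetical.

```python
import csv
import os
import zipfile
import xml.etree.ElementTree as ET

OOXML_EXTENSIONS = {".docx", ".xlsx", ".pptx"}
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def all_properties(path):
    """Collect every text-only property, keyed by its tag name without namespace."""
    props = {}
    try:
        with zipfile.ZipFile(path) as package:
            names = set(package.namelist())
            for part in PROPERTY_PARTS:
                if part not in names:
                    continue
                for child in ET.fromstring(package.read(part)):
                    if len(child) == 0 and child.text:
                        props[child.tag.rsplit("}", 1)[-1]] = child.text
    except (zipfile.BadZipFile, ET.ParseError, OSError):
        pass  # unreadable file: keep its row, but leave the properties empty
    return props


def catalogue(folder):
    """Return one row (a dict) per Office file found under the folder."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in OOXML_EXTENSIONS:
                continue
            full = os.path.join(dirpath, name)
            row = {"Folder": dirpath, "File name": name, "Size": os.path.getsize(full)}
            row.update(all_properties(full))
            rows.append(row)
    return rows


def write_csv(rows, out_path):
    """Write the rows to CSV and return the column names that were found."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    # utf-8-sig adds a byte order mark so that Excel detects the encoding.
    with open(out_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return columns
```

The list of columns returned by `write_csv` shows what metadata is available across the whole folder. Filtering on a specific person is then a one-line comprehension over the rows, for example keeping only those whose `creator` entry matches a given name.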

Filecats Professional and Filecats Metadata are metadata extractors which can do the above, and more. The former exports the list into a spreadsheet and therefore requires Microsoft Excel to be installed on the computer beforehand. The latter creates a self-contained table, and therefore does not require Excel. However, if you already have Office installed, Filecats Professional allows you to instantly use all of the power of Excel to filter and create PivotTables.

Additional document properties which these programs can extract include:

  • Statistical data such as word, paragraph and page count.
  • Data about the document’s creation, such as Creator, Company, Template and Title.
  • Data about the document’s last save, such as Date Last Printed, Date Last Saved and Total Editing Time.
  • Data shown in Windows Explorer, such as File name, Folder, Size, Dates, Categories, Tags and File Type.

All of this at your fingertips. It’s very quick to launch from your Windows 7, 8 or 10 computer – just two clicks from Windows Explorer, and then select the degree of file properties required.

Do you want to see it in action? Then have a look at the video below.

There’s a free 7-day trial, so you can test it for yourself and see how much information there is hidden away. What have you got to lose?

Other articles

What are Microsoft Office document properties?

What specifically are Microsoft Word document properties?

How can I access them in Office?

How can I access them in Windows Explorer?

How can I access them in VBA?

What document properties are actually used in real life in Word?

What metadata are actually used in Excel spreadsheets?

In PowerPoint presentations, what metadata do people use?

More articles.
