Metadata contained in actual Microsoft Word files


Whilst it is interesting to know in theory which metadata can be stored in Word documents, the document properties actually used in real Word documents may be quite different.

To find out, 994 Word documents were downloaded at random (see methodology), and the metadata extractor Filecats Professional was used to extract their metadata and see what information is actually available.
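For readers who want to see where these properties live, here is a minimal sketch in Python. It is not how Filecats Professional works internally; it only relies on the fact that a .docx file is a ZIP archive whose built-in properties are stored in docProps/core.xml and docProps/app.xml. The function name `read_docx_properties` and the display labels are illustrative choices, and older .doc files use a different binary format that this sketch does not handle.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the two property parts of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Display label (illustrative) -> element in docProps/core.xml.
CORE_FIELDS = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Last saved by": "cp:lastModifiedBy",
    "Revision number": "cp:revision",
    "Content created": "dcterms:created",
    "Date last saved": "dcterms:modified",
    "Last printed": "cp:lastPrinted",
}

# Display label (illustrative) -> element in docProps/app.xml.
APP_FIELDS = {
    "Template": "ep:Template",
    "Company": "ep:Company",
    "Pages": "ep:Pages",
    "Word count": "ep:Words",
    "Character count": "ep:Characters",
    "Line count": "ep:Lines",
    "Paragraph count": "ep:Paragraphs",
}


def read_docx_properties(source):
    """Return {label: text} for the properties present in a .docx file.

    `source` may be a file path or a binary file-like object.
    Properties that are missing or empty are simply left out.
    """
    props = {}
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        for part, fields in (("docProps/core.xml", CORE_FIELDS),
                             ("docProps/app.xml", APP_FIELDS)):
            if part not in names:
                continue  # some generators omit a property part entirely
            root = ET.fromstring(archive.read(part))
            for label, tag in fields.items():
                element = root.find(tag, NS)
                if element is not None and element.text:
                    props[label] = element.text
    return props
```

Calling `read_docx_properties("report.docx")` on a file of your own (the file name here is only a placeholder) returns a dictionary containing whichever of these properties the document actually carries, which is the kind of presence or absence counted in the results below.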

The results are as follows:

Results of Microsoft Word metadata analysis - also contained in the spreadsheet on this page.

Common Metadata for Microsoft Word Documents

  • All of the files included the dates accessed, created and modified (from Windows Explorer) and Content created (from Microsoft Office). Most of the files also include Date last saved (from Microsoft Office). It should be noted that the Windows Explorer dates are not necessarily the same as the Microsoft Office dates, and that the Microsoft Office dates should be preferred. For more information, see the article “Why are the Windows Explorer dates not reliable?”
  • 19 out of 20 files (95%) include a Revision number and Template. Of these, 90% are based on a “Normal” template, leaving only 10% based on a different template.
  • Around 19 out of 20 files (94.5%) also include Microsoft Office statistics (character count, word count, line count, paragraph count and pages), which raises the question: why do the other 5.5% of the files not include them? It is probable that they were created outside Microsoft Office and saved as .doc files from that other program, without all of the metadata which Microsoft Word provides.
  • Additionally, 16 files have a template of “Normal_Wordconv” and were therefore clearly converted from another application. They include some of the statistics (such as Word count and Pages), but have a zero Line count and Paragraph count. 2 of these documents have a character count of -32766, which clearly indicates something wrong with the metadata calculation (perhaps an overflow error).
  • Over 9 out of 10 files have a Last saved by, Authors, Creators and Participants. Despite the last 3 being shown in Windows as separate metadata properties, in each case the Author was the same as the Creator and the Participants. Also, only 5 of these files had more than one author.
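One possible reading of the character count of -32766 noted above is a 16-bit integer wraparound, since the value sits just above the 16-bit signed minimum of -32768. This is an assumption, not something the data confirms. The short sketch below (with hypothetical helper names) shows how a true count of 32770 would appear as -32766 if it were squeezed into a signed 16-bit field, and which true counts would be consistent with the stored value.

```python
import struct


def as_signed_16(value):
    """Reinterpret the low 16 bits of `value` as a signed two's-complement integer."""
    (signed,) = struct.unpack("<h", struct.pack("<H", value & 0xFFFF))
    return signed


def unsigned_candidates(stored, bits=16, count=3):
    """List the smallest non-negative values that wrap to `stored` in a `bits`-wide field."""
    modulus = 1 << bits
    first = stored % modulus
    return [first + k * modulus for k in range(count)]


print(as_signed_16(32770))          # -32766
print(unsigned_candidates(-32766))  # [32770, 98306, 163842]
```

If this reading is right, the two affected documents would contain about 32,770 characters, or that figure plus a multiple of 65,536. Comparing against the word count of the same file would be one way to check which candidate is plausible.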

Lesser Used Document Properties for Microsoft Word Files

  • 58% of these files have a “Last saved by” that differs from the Author, and 6% of the files have a user of “User”, “Usuario”, “Admin”, “Owner” or “Preferred Customer”.
  • Nearly 6 out of 10 files had a Company indicated. This field is usually filled in when Microsoft Office is installed, which may indicate that more than 4 out of 10 of these documents were created by home users. However, 45 documents had a Company of “Hewlett-Packard”, 39 had “Microsoft”, and 19 had “Home”, which indicates unreliable data. This leaves about half of the files with presumably reliable Company data.
  • Over 1 in 2 files had a Last Printed date and Title information. The 51% for Title can be broken down into 78% for .doc documents (Office 2003 format) and only 24% for .docx documents (Office 2007 format). This may be because, when a document was initially saved, earlier versions of Microsoft Word saved the first line of the document as the Title, a practice that ceased when Office 2007 came out.
  • Infrequently used properties are Tags (about 1 file in 40), Byte count (1 in 50), Categories (1 in 75) and Manager (less than 1 file in 100). Whilst available, use of this metadata has not caught on.
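The Title and Company figures above can be cross-checked with a little arithmetic. The sketch below takes the rounded percentages quoted in the bullets at face value, so its results are approximate; the function name is illustrative. It solves the weighted-average equation overall = p × doc_rate + (1 − p) × docx_rate for p, the share of .doc files, and works out what fraction of the 994 files carry one of the three generic Company values.

```python
def implied_doc_share(overall, doc_rate, docx_rate):
    """Solve overall = p * doc_rate + (1 - p) * docx_rate for p (the .doc share)."""
    return (overall - docx_rate) / (doc_rate - docx_rate)


# Title: 51% overall, 78% for .doc, 24% for .docx (rounded figures from the text).
doc_share = implied_doc_share(0.51, 0.78, 0.24)
print(f"Implied share of .doc files: {doc_share:.0%}")

# Company: files showing "Hewlett-Packard", "Microsoft" or "Home".
total_files = 994
generic_company = 45 + 39 + 19
print(f"Generic Company values: {generic_company} files "
      f"({generic_company / total_files:.1%} of the sample)")
```

On these rounded inputs the Title figures are consistent with a sample split roughly evenly between .doc and .docx files, and the generic Company values account for about a tenth of all files.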

The analysis is contained in the spreadsheet below.

Download Analysis of Microsoft Office documents – 24 October 2014
ByLanguage141024.xls
Microsoft Excel sheet [1.8 MB]

This is one of three articles, the others regarding Metadata actually used in spreadsheets, and document properties actually used in PowerPoint presentations. Other articles can be found here.


Do you want a way to harness metadata, such as the properties shown in this article, from your own documents? If so, download a free trial of the metadata extractor Filecats Professional if you have Microsoft Excel, or download Filecats Metadata if you don’t.
