Accessing metadata from multiple Microsoft Office documents

Introduction

My previous article discussed the various different metadata fields that are available in Microsoft Office documents.

This article will discuss ways of accessing these fields in multiple documents. It will start with looking at VBA, and then talk about an easier way.

Using Visual Basic for Applications

As shown in the VBA article, it is possible to read document properties in VBA. In order to read multiple files, there is a three-step process.

  1. Obtain a list of the files you want to read.
  2. Open each document in turn.
  3. Read and commit to memory the fields that you are interested in.

This example will focus on a VBA module in Microsoft Excel which will open all Microsoft Excel spreadsheets in the folder C:b, and retrieve the “Author” property.

As an introduction, variables are initialised and a reference to the current spreadsheet is made.

Dim introw As Integer, mySheet
Dim strFolder As String, strFile As String, strExtension As String

Sub Catalog()
introw = 1
Set mySheet = ActiveSheet
strFolder = "C:\b\"
strExtension = "*.xls*"

'INSERT NEXT CODE HERE

End Sub

Firstly, generating the list of files can be done in several ways. One of the easiest is to use the DIR function as follows. The following will retrieve the first file, and then loop, accessing the next file, until there are no files left.

strFile = Dir(strFolder & strExtension)

Do Until strFile = ""
    Call OpenFileAndCatalog
    strFile = Dir
    DoEvents
Loop

Once you have the name of the next file, you can then open the file within Excel as follows.

Sub OpenFileAndCatalog()
On Error GoTo MyError
Workbooks.Open strFolder & strFile, UpdateLinks:=False, ReadOnly:=True
'INSERT NEXT CODE HERE
Exit Sub

MyError:
Exit Sub
End Sub

Now that the file is open, you can access any of the document properties. For this case, the “Author” property gets written into the initial spreadsheet, along with the file name, and the row number is updated.

mySheet.Cells(introw, 1) = strFile
mySheet.Cells(introw, 2) = ActiveWorkbook.BuiltinDocumentProperties("Author")
introw = introw + 1

'INSERT NEXT CODE HERE

Now the file which has just been opened has to be closed, and the next file name is retrieved.

ActiveWorkbook.Close SaveChanges:=False

The completed workbook can be opened below.

Download Open All Files
11 February 2015
OpenAllfiles150211.xls
Microsoft Excel sheet [35.5 KB]

But is this a good method?

Advantages and Disadvantages of using VBA to extract Document Properties

The main advantage of using the above is that the code is yours, and you can adjust it as you wish. While the spreadsheet is opened, you can extract any additional data from the document and save it into your workbook. You can also define which folders or files you want to use.

One of the main disadvantages of using the above revolves on the fact that you have to open the spreadsheet. This means that:

  1. It takes time for the computer to open the file. It may be only (say) 5 seconds per file, but for 100 files that’s over 8 minutes.
  2. There is increased network traffic (if the files are stored on a network, as opposed to a local drive),
  3. If any of the files are password protected, you have to allow for entering the password. (And if you are accessing several thousand files, then you may find that the routine stops and waits for you to enter the password – not good if you have left it to run overnight),
  4. If you open a file, other people cannot open it for editing until it is closed (the file is “locked”).

Can it be done faster?

A better way is to be able to examine the metadata without needing to open the document. The advantages to this are:

Filecats Professional speed

  1. Decreased time. Instead of 5 seconds to open a file, you can catalog much more quickly. On one computer, 542 files were cataloged in 23 seconds, including all of the Standard metadata and the Microsoft Office metadata shown in the previous article, into either a spreadsheet or a table. This is over one hundred faster than the other speed.
  2. Reduced network traffic. Because the entire file does not have to be accessed as part of the opening process, but only part of the file, network traffic is significantly reduced.
  3. Because files are not being opened, password protection is not an issue.
  4. Because files are not being opened, files being locked from others for editing is not an issue.

Why reinvent the wheel? If you can have a program that can do all of that, then you can use that as the basis of any additional analysis.

Two programs which can do the above are Filecats Professional and Filecats Metadata. Both create a table of files, folders and subfolders with hyperlinks and metadata from Microsoft Office, photographs, audio, video and MSG (exported from Outlook) files.

  • Filecats Professional creates a spreadsheet within your existing Microsoft Excel installation, from where you can filter, sort and create PivotTables.
  • Filecats Metadata creates a stand-alone table from which you can copy, sort, filter and create analyses. It does not need Excel installed.

There is a free 7-day trial, and should save you a lot of time, allowing you to concentrate on using this date. Why not download it now?

Want more articles?

Leave a Reply

Your email address will not be published. Required fields are marked *