Invisible (Confidential) Information In Electronic Documents

The Problem

Many documents created by software contain far more than what you can see. As an example, documents written with Microsoft Word contain “metadata” – information about who wrote the document, when it was revised, by whom, and other such information. The technical definition of metadata is data about data. When viewed, this metadata provides information that you may not want others to see. Some metadata can be seen when you view the document properties.
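As a minimal sketch of what "document properties" can reveal, the Python snippet below lists a few property fields stored inside a Word file. It assumes the newer .docx format (a ZIP archive holding a docProps/core.xml part), not the binary .DOC format of Word 2003 discussed in this article. The function name read_core_properties and the field labels are hypothetical, chosen only for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative labels (hypothetical) mapped to the XML elements to look up.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_file):
    """Return the core property fields present in a .docx file.

    docx_file may be a path or a binary file-like object.
    Assumption: the file is a .docx (ZIP) package, not a binary .DOC.
    """
    with zipfile.ZipFile(docx_file) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))

    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling read_core_properties("draft.docx") on a hypothetical file would return a dictionary of whichever of these fields the document carries. This covers only the properties part; comments, tracked changes and other hidden content live elsewhere in the file.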

A Microsoft support document explains, with particular reference to Word, as follows[ref]“How to minimize metadata in Word 2003”. Note that Microsoft sometimes changes its URLs for a variety of reasons. If this happens, the document can be located using the article number 825576.[/ref]:

“This article describes various methods that you can use to minimize the metadata in your Word documents.

When you create, open, or save a document in Microsoft Office Word 2003, the document may contain content that you may not want to share with others when you distribute the document electronically. This information is known as metadata. Metadata is used for a variety of purposes to enhance the editing, viewing, filing, and retrieval of Office documents.

Some metadata is easily accessible through the Word user interface. Other metadata is only accessible through extraordinary means, such as by opening a document in a low-level binary file editor.

Metadata is created in a variety of ways in Word documents. As a result, there is no single method to remove all such content from your documents.”

Of greater significance is that, in certain situations, earlier versions of a document (for example, text that has since been deleted) can be retrieved and viewed. Amendments to documents could reveal to an opposing party legal strategies or areas of concern that a client may not want disclosed to the other side.

Lawyers obviously have a duty to avoid disclosing confidential information that could be harmful to the client. The relevant rule, Rule 24 of the Legal Profession (Professional Conduct) Rules, seems to state in absolute terms that “An advocate and solicitor shall not in any way, directly or indirectly…” disclose such confidential information without the client’s consent. On the most generous reading, the lawyer is required to take reasonable steps to ensure that no information relating to the representation of a client is disclosed without the client’s consent. On the strictest reading, the lawyer has to take all steps to avoid this.

How to Reduce/Remove Metadata?

To comply with this duty of confidentiality, lawyers should take steps to remove metadata from electronic documents that could potentially be disclosed to the public or to the other party[ref]There is another issue with electronic discovery of such documents. Removal of metadata from electronic documents is potentially tampering with evidence and gives rise to its own set of ethical issues.[/ref]. As the Microsoft article states, there is no one way of doing this. The problem is not that metadata is added to documents. The problem is that it cannot be easily removed from documents.

The Microsoft article describes some ways of removing metadata and they vary depending on which version of Microsoft Word you are using. In addition to this, there are third party applications that can remove such metadata from Microsoft documents[ref]The website is sponsored by one such company.[/ref].

There is one approach to reducing (not eliminating) metadata that requires only Microsoft Word. It requires diligence on the part of the lawyers and their staff to go through each step for each document that has to be cleaned. The first step is to make sure that the document is as clean as it can be within Word[ref]See the above Microsoft article (at footnote 1) for details on how to do this.[/ref]. You then save the document in Rich Text Format. You can examine the document with Notepad to ensure that there is no confidential information in the metadata. You will see metadata like font names, formatting instructions, etc., but such information is usually benign[ref]The exception to the benign metadata would be if someone had deliberately included a defamatory paragraph with a font color of white when the background is white. You could search for such text and remove it manually. Sending out copies of Word documents in .DOC format is risky, not just because of hidden metadata, but because of macro viruses.[/ref]. You can then read it back into a new document in Microsoft Word, ready to be sent electronically to the opposing party.
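The manual check in Notepad can be partly automated. The sketch below scans the text of an RTF file for a handful of document-information entries (such as \author, \operator and \company) that RTF files commonly carry. It is a simplified illustration, not a full RTF parser: it ignores escaped characters and nested groups, and the function name find_rtf_info_fields is hypothetical.

```python
import re

# Document-information control words to look for (a partial list).
INFO_FIELDS = (
    "title", "subject", "author", "manager",
    "company", "operator", "keywords", "doccomm",
)

# Matches simple groups such as {\author Jane Tan} or {\*\company Example LLP}.
# Simplification: the value may not contain braces, so nested groups are skipped.
INFO_PATTERN = re.compile(
    r"\{(?:\\\*)?\\(" + "|".join(INFO_FIELDS) + r")(?![A-Za-z]) ?([^{}]*)\}"
)


def find_rtf_info_fields(rtf_text):
    """Return (field, value) pairs for non-empty information entries."""
    return [
        (name, value.strip())
        for name, value in INFO_PATTERN.findall(rtf_text)
        if value.strip()
    ]
```

An empty result does not prove the file is clean; it only means none of these particular fields were found in the simple form the pattern expects, so a visual check of the file is still worthwhile.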

If Rich Text Format won’t work for you, because the document contains pictures or complex formatting, then the next best option is to print the document to PDF using Acrobat Distiller as opposed to the more convenient PDFwriter or PDFmaker. Since PDFwriter/PDFmaker converts the document directly from Word, some metadata is also captured in the converted PDF document. However, understand that this is still not foolproof, since PDF documents also contain metadata. In particular, Acrobat comments might include converted comments from Word.

Finally, if you are really and truly paranoid about metadata, you can print the document to paper and scan it in again. The only metadata should be that concerning the scanning process.

Blacking-Out Information in Word and PDF Documents

Sometimes it is desirable to black out certain words in a document, while still indicating where the original information was located. Perhaps certain information has been held inadmissible, but the document must still be presented in court.

One way to do it would be to use the highlighter function in Word, much like a black marking pen, that would make certain xxxxxxxxxxxxxxxx appear blacked out. Obviously someone with access to the Word document could copy the text, change the highlighting, and make the words appear again. But if the document was printed, the blacked-out portion remains hidden. You might think that if the document was converted to PDF, it would also remain hidden.

Unfortunately if the Select Text tool in Adobe Acrobat is used to highlight the blacked-out text and the selection is copied to Microsoft Word, the hidden words can be read again.
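The underlying reason is that highlighting changes only how text is displayed; the words themselves remain in the file. As a rough illustration at the Word level, the sketch below looks through the main document XML of a .docx file and returns any text highlighted in black. It assumes the newer .docx format (where the body is stored as word/document.xml) and not Word 2003's binary .DOC; the function name find_black_highlighted_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml of a .docx package.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_black_highlighted_text(document_xml):
    """Return the text of every run whose highlight colour is black.

    document_xml is the content of word/document.xml (str or bytes).
    """
    root = ET.fromstring(document_xml)
    hits = []
    for run in root.iter(f"{{{W}}}r"):
        # Run properties hold the highlight setting, if any.
        highlight = run.find(f"{{{W}}}rPr/{{{W}}}highlight")
        if highlight is not None and highlight.get(f"{{{W}}}val") == "black":
            text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
            if text:
                hits.append(text)
    return hits
```

Running a check of this kind before sending a document shows exactly which words a recipient could recover. It covers only the highlighter approach described above; text hidden by other means, such as shading or a drawn black box, would need a different check.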

An Ethical Issue: Adverse Use of Metadata

Suppose that, now armed with this knowledge about the existence of metadata, you examine a Word document that the opposing lawyer has sent to you, and discover an abundance of metadata. Can you make use of such information?

At least one bar association has taken the position that it is unethical to examine this hidden information[ref]The New York State Bar Association. Opinion No. 749 (Dec. 14, 2001)[/ref]. In that opinion, the New York Bar Association recognised that, although the transmitting party intended to transmit the “visible” document, “absent an explicit direction to the contrary counsel plainly does not intend the lawyer to receive the ‘hidden’ material or information.” Based on this premise, that the transmitting lawyer did not intend to disclose the metadata, the bar association concluded that the metadata could not be accessed: “it is a deliberate act by the receiving lawyer, not carelessness on the part of the sending lawyer, that would lead to the disclosure of client confidences and secrets.”

Whether this will be the position in other jurisdictions remains to be seen. However, such a ruling does give rise to many difficulties. What happens if examining the metadata reveals that a particular person had viewed and modified a document when his testimony was that he did not? It is difficult to see any court in Singapore saying that such information is inadmissible. Given the beginnings of electronic discovery and the use of electronic evidence, the New York position not to review metadata may not be sustainable for much longer.

Even if there is an ethical prohibition against the misuse of metadata (and that may be a doubtful position), lawyers must caution clients about such information when they are preparing a document or, later, transmitting it[ref]The Law Society of Singapore’s Guidance Note of 1 October 2001 on Ethics and Information Technology clearly places the burden on the law firm to take appropriate measures to ensure confidentiality of e-mails. By extension, it should apply to documents attached to e-mails.[/ref].


Technology is a wonderful tool, but failing to understand it can lead to disastrous results. Understanding the presence of metadata, and working to reduce its dissemination, may reduce risk to your law firm and to your clients. However, metadata in a document is not a bad thing. It helps in managing the document. It is only bad when it is embedded into a document without your knowledge, and accessible by others who might have malicious intent.