In February 2003, the British government published a dossier on Iraq's weapons programs as a Word document. Researchers soon found that large parts of it had been copied from a graduate student's work, and the file's revision log exposed the names of the officials who had edited it: information the authors never intended to share. The scandal became known as the "Dodgy Dossier."
Every PDF you create or share contains hidden information. Some of it is benign—creation dates, software used. Some of it can be deeply revealing—author names, computer usernames, edit histories, GPS coordinates from embedded photos. Understanding this metadata is essential for anyone sharing documents professionally or publicly.
What Is PDF Metadata?
Metadata is "data about data"—information that describes the document rather than the document's visible content. PDFs can contain multiple types of metadata:
Document Information Dictionary
The basic metadata most PDFs contain (the dictionary is optional, but nearly every authoring tool writes it):
- Title: Document title (often auto-filled from filename)
- Author: Creator's name (often from software registration)
- Subject: Document description
- Keywords: Searchable tags
- Creator: Application that created the original (e.g., "Microsoft Word")
- Producer: Application that made the PDF (e.g., "Adobe PDF Library")
- CreationDate: When the PDF was created
- ModDate: When it was last modified
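To make these fields concrete, here is a minimal Python sketch that pulls them out of a PDF's raw bytes. It assumes the Document Information Dictionary is stored uncompressed and unencrypted, with plain literal strings; many real files break those assumptions, so treat it as an illustration rather than a parser. The name `read_info_fields` is made up for this example.

```python
import re

# Matches "/Key (literal string)" for the eight standard Info dictionary keys.
# Assumption: values are simple literal strings (no nested parentheses,
# no hex strings, no UTF-16 text).
_FIELD = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer|CreationDate|ModDate)"
    rb"\s*\(((?:\\.|[^\\()])*)\)"
)


def read_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return the Info dictionary fields found in the raw bytes of a PDF."""
    fields = {}
    for key, raw in _FIELD.findall(pdf_bytes):
        # Simplified unescaping: drop the backslash from \( \) and \\.
        value = re.sub(rb"\\(.)", rb"\1", raw)
        # Later matches overwrite earlier ones, so the newest revision wins.
        fields[key.decode("ascii")] = value.decode("latin-1")
    return fields


sample = (
    b"1 0 obj\n<< /Title (Service Agreement Draft v3) /Author (John Smith) "
    b"/Creator (Microsoft Word 2019) /CreationDate (D:20240115143245-05'00') >>\nendobj\n"
)
print(read_info_fields(sample))
```

For anything beyond a quick look, a dedicated tool such as ExifTool (covered later in this article) is the safer choice.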
XMP Metadata
Extended metadata in XML format, capable of storing much more:
- Detailed creation/modification history
- Copyright and rights management
- Custom fields defined by organizations
- Thumbnail previews
- Document identifiers and versioning
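Because XMP is ordinary XML, you can often find it by searching the file for an `x:xmpmeta` element. The sketch below assumes the packet is stored uncompressed (common, but not guaranteed) and reads only two properties; `read_xmp_fields` and the returned key names are illustrative choices, not part of any standard API.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespace URIs used by RDF, Dublin Core, and the XMP basic schema.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}


def read_xmp_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Extract the author and creating application from the first XMP packet."""
    match = re.search(rb"<x:xmpmeta\b.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    if match is None:
        return {}  # no uncompressed XMP packet found
    root = ET.fromstring(match.group(0))
    fields = {}

    author = root.find(".//dc:creator/rdf:Seq/rdf:li", NS)
    if author is not None and author.text:
        fields["creator"] = author.text.strip()

    tool = root.find(".//xmp:CreatorTool", NS)
    if tool is not None and tool.text:
        fields["creator_tool"] = tool.text.strip()
    else:
        # XMP also allows simple properties to be written as attributes.
        for desc in root.iter(f"{{{NS['rdf']}}}Description"):
            value = desc.get(f"{{{NS['xmp']}}}CreatorTool")
            if value:
                fields["creator_tool"] = value
    return fields
```

A file can hold several XMP packets (for the document, for individual pages, for embedded images); this sketch only looks at the first one it finds.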
Embedded Object Metadata
Images and other objects embedded in PDFs carry their own metadata:
- EXIF data: Camera model, settings, date/time
- GPS coordinates: Where a photo was taken
- Software history: Applications that processed the image
- Thumbnail images: Sometimes containing data cropped from the visible image
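EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere letter (N/S or E/W). Converting them to the decimal coordinates a map service accepts takes one line of arithmetic, which is part of why this field is so risky. The function name and the coordinates below are hypothetical.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert an EXIF-style GPS value to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value


# Hypothetical coordinates, not taken from any real photo.
latitude = dms_to_decimal(15, 30, 0, "N")
longitude = dms_to_decimal(89, 15, 0, "W")
print(latitude, longitude)  # 15.5 -89.25
```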
Real risk: In 2012, antivirus pioneer John McAfee's hiding place in Guatemala was exposed after a magazine published a photo of him online with GPS coordinates still in its EXIF data. Images embedded in PDFs can carry the same data.
What Metadata Reveals About You
| Metadata Field | What It Can Reveal | Risk Level |
|---|---|---|
| Author | Full name, sometimes username | Medium |
| Creator application | Software you use, version numbers | Low |
| Creation/mod dates | When you worked on document | Low-Medium |
| File path (sometimes) | Username, folder structure, project names | High |
| Embedded image GPS | Physical location | Critical |
| Edit history | Previous versions, deleted content | High |
| Comments/annotations | Internal discussions, reviewer names | High |
| Fonts used | Organization's font licenses | Low |
How to View PDF Metadata
Adobe Acrobat Reader (Free)
- Open the PDF
- File → Properties (or Ctrl+D / Cmd+D)
- View Description, Security, Fonts, and Advanced tabs
This shows basic metadata but not all hidden data.
Adobe Acrobat Pro
For comprehensive inspection:
- Tools → Redact → Remove Hidden Information
- Or: Tools → Protection → Remove Hidden Information
- View detailed list of all metadata, comments, hidden layers, etc.
ExifTool (Free, Command Line)
The most thorough option for technical users:
exiftool document.pdf
This lists far more than most GUI tools show, though it reports metadata only, not hidden page content. Example output:
File Name : contract.pdf
File Size : 2.4 MB
File Type : PDF
PDF Version : 1.7
Creator : Microsoft Word 2019
Producer : Adobe PDF Library 15.0
Create Date : 2024:01:15 14:32:45-05:00
Modify Date : 2024:01:15 16:45:12-05:00
Author : John Smith
Title : Service Agreement Draft v3
Subject :
Keywords :
XMP Toolkit : Adobe XMP Core 5.6-c148
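If you want to check this output in a script, the "Tag : Value" layout is easy to split. The sketch below parses text in the format shown above; `parse_exiftool_output` is an illustrative name. ExifTool can also emit JSON with its `-json` option, which is likely the sturdier choice for automation.

```python
def parse_exiftool_output(text: str) -> dict[str, str]:
    """Turn ExifTool's default 'Tag : Value' listing into a dictionary."""
    tags = {}
    for line in text.splitlines():
        # Split on the first colon only, since values such as dates contain colons.
        name, separator, value = line.partition(":")
        if separator and name.strip():
            tags[name.strip()] = value.strip()
    return tags


listing = """Creator : Microsoft Word 2019
Create Date : 2024:01:15 14:32:45-05:00
Author : John Smith
Subject :
"""
tags = parse_exiftool_output(listing)
print(tags["Author"])       # John Smith
print(tags["Create Date"])  # 2024:01:15 14:32:45-05:00
```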
Online Tools
Services like PDF Analyzer or Metadata2Go can inspect PDFs without installing software. However, consider privacy implications before uploading sensitive documents to third-party services.
Hidden Data Beyond Basic Metadata
PDFs can contain information that isn't strictly "metadata" but is equally hidden:
Incremental Saves
PDFs support "incremental saves": when you edit a PDF, many applications append the new data to the end of the file rather than replacing the old. Previous versions of content may still exist in the file, potentially including:
- Text that was deleted
- Images that were replaced
- Pages that were removed
- Earlier versions of redacted content
Critical: In 2009, the TSA accidentally published a redacted PDF of its airport screening manual. The "redacted" black boxes were simply drawn over the text, so anyone could copy and paste the hidden content beneath.
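A rough way to spot incremental saves is to count the end-of-file markers. Each appended revision normally ends with its own `%%EOF`, so more than one marker hints at earlier versions inside the file, and cutting the file at an earlier marker often yields the previous revision. This is a heuristic sketch: linearized ("fast web view") PDFs usually contain two markers without any edit history, and `revision_ends` is a name invented for this example.

```python
def revision_ends(pdf_bytes: bytes) -> list[int]:
    """Return the byte offset just past each %%EOF marker in the file."""
    marker = b"%%EOF"
    ends = []
    position = pdf_bytes.find(marker)
    while position != -1:
        ends.append(position + len(marker))
        position = pdf_bytes.find(marker, position + 1)
    return ends


# Toy bytes standing in for a PDF that was saved, then edited once.
sample = b"%PDF-1.7\n(original body)\n%%EOF\n(appended update)\n%%EOF\n"
ends = revision_ends(sample)
print(len(ends))                   # 2
earlier_version = sample[:ends[0]]  # the file as it was before the edit
```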
Hidden Layers
PDFs support layers that can be toggled visible/invisible. Sensitive content on "hidden" layers is still in the file and extractable.
Annotations and Comments
Review comments, sticky notes, and markup may be hidden from view but remain in the document. These often contain candid internal discussions.
Form Field Data
Previously entered form data can persist even after fields appear empty.
Embedded Files
PDFs can contain attached files (documents, spreadsheets, etc.) that aren't visible on any page but travel with the PDF.
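As a first pass, you can scan a file for the dictionary keys behind these features. The sketch below is a crude heuristic under two stated assumptions: the keys appear uncompressed (newer PDFs often pack them into compressed object streams, so a clean result proves nothing), and a hit is only a hint (`/Annots` also covers ordinary hyperlinks). The labels and function name are illustrative.

```python
# PDF dictionary keys associated with each kind of hidden data.
HIDDEN_MARKERS = {
    "optional content (layers)": b"/OCProperties",
    "annotations or comments": b"/Annots",
    "form fields": b"/AcroForm",
    "embedded files": b"/EmbeddedFile",  # also matches /EmbeddedFiles
}


def scan_hidden_features(pdf_bytes: bytes) -> list[str]:
    """List the hidden-data features whose markers appear in the raw bytes."""
    return sorted(
        label for label, marker in HIDDEN_MARKERS.items() if marker in pdf_bytes
    )


sample = b"<< /Type /Catalog /AcroForm 5 0 R >>\n<< /Type /Page /Annots [6 0 R] >>"
print(scan_hidden_features(sample))
# ['annotations or comments', 'form fields']
```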
Removing Metadata: Complete Sanitization
Adobe Acrobat Pro Method
- Open the PDF
- Tools → Redact → Remove Hidden Information
- Click "Remove" to delete all found items, or review individually
- Alternatively: Tools → Protection → Sanitize Document (removes everything at once)
- Save as a new file
The "Sanitize Document" feature removes:
- Metadata
- File attachments
- Hidden text and layers
- Comments and markup
- Bookmarks
- Unreferenced data from incremental saves
ExifTool Method (Free)
Remove all metadata while keeping document content:
exiftool -all= document.pdf
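This removes XMP and document info, but the change is reversible: ExifTool writes PDF edits as an incremental update, so the original metadata stays in the file until it is rewritten (for example with QPDF, below). It also doesn't address earlier incremental saves or hidden layers.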
Print to PDF Method
A simple technique that works surprisingly well:
- Open the PDF in any viewer
- Print to a new PDF (File → Print → Save as PDF)
- The new PDF contains only visible content
This flattens the document, removing:
- Most metadata (new metadata reflects the print action)
- Hidden layers
- Incremental save history
- Comments and annotations
Limitations: May reduce quality slightly, typically drops links, bookmarks, and form fields, and may not remove embedded image EXIF data.
QPDF (Free, Command Line)
Linearize and clean up PDF structure:
qpdf --linearize input.pdf output.pdf
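This rewrites the PDF from scratch, dropping incremental save debris and other unreferenced objects. It leaves the current metadata fields in place, so pair it with a metadata-removal step.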
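The ExifTool and QPDF steps work best together: ExifTool's PDF edits are appended as an incremental update, and rewriting the file with QPDF afterwards discards the superseded metadata. Here is a hedged Python sketch of that two-step pipeline. It assumes both tools are installed and on your PATH, and the function and file names are hypothetical. It does not remove comments, hidden layers, or attachments.

```python
import subprocess
from pathlib import Path


def sanitize_commands(source: Path, stripped: Path, cleaned: Path) -> list[list[str]]:
    """Build the two command lines: strip metadata, then rewrite the file."""
    return [
        # Step 1: write a metadata-stripped copy (-o names the output file).
        ["exiftool", "-all=", "-o", str(stripped), str(source)],
        # Step 2: rewrite the copy so the old, superseded objects are dropped.
        ["qpdf", "--linearize", str(stripped), str(cleaned)],
    ]


def sanitize(source: Path, cleaned: Path) -> None:
    """Run both steps, removing the intermediate file afterwards."""
    stripped = cleaned.with_name(cleaned.stem + ".stripped.pdf")
    for command in sanitize_commands(source, stripped, cleaned):
        subprocess.run(command, check=True)  # raises if a tool fails
    stripped.unlink()


# Example call (requires exiftool and qpdf to be installed):
# sanitize(Path("document.pdf"), Path("document-clean.pdf"))
```

Run ExifTool on the result afterwards to confirm what remains.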
When to Remove Metadata
Always Remove Before:
- Public publishing: Documents on websites, in press releases
- Legal submissions: Unless metadata is specifically required
- Competitive situations: RFPs, proposals, contracts with other parties
- Whistleblowing/anonymous sharing: Any situation requiring anonymity
- Sharing with untrusted parties: When you don't control how the document will be used
Consider Keeping When:
- Internal documents: Metadata helps with document management
- Archival purposes: Creation dates and authorship matter for records
- Collaborative editing: Track changes and comments are the point
- Legal discovery: Metadata may be legally required to preserve
Metadata for Forensics and Verification
The flip side of metadata privacy is its value for document authentication:
Detecting Forgeries
Metadata inconsistencies can reveal document manipulation:
- Creation date after modification date
- Software that didn't exist when the document was supposedly created
- Mismatched time zones
- Producer application inconsistent with claimed source
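The first and third checks in this list can be automated once the date strings are parsed. PDF dates look like `D:20240115143245-05'00'`. The sketch below parses that form and flags two of the inconsistencies above. The function names are illustrative, a missing time zone is treated as UTC (an assumption of this sketch), and a flagged document deserves a closer look rather than an automatic verdict, since differing offsets also arise from travel or daylight saving changes.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional offset such as -05'00' or Z.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(\d{2})?'?(\d{2})?'?)?"
)


def parse_pdf_date(text: str) -> datetime:
    """Parse a PDF date string into a timezone-aware datetime."""
    match = _PDF_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {text!r}")
    defaults = (0, 1, 1, 0, 0, 0)  # omitted month/day default to 1, times to 0
    parts = [int(g) if g else d for g, d in zip(match.groups()[:6], defaults)]
    sign, hours, minutes = match.group(7, 8, 9)
    offset = timedelta(hours=int(hours or 0), minutes=int(minutes or 0))
    if sign == "-":
        offset = -offset
    return datetime(*parts, tzinfo=timezone(offset))


def find_date_anomalies(creation: str, modification: str) -> list[str]:
    """Flag date inconsistencies between CreationDate and ModDate."""
    created, modified = parse_pdf_date(creation), parse_pdf_date(modification)
    anomalies = []
    if created > modified:
        anomalies.append("creation date is later than modification date")
    if created.utcoffset() != modified.utcoffset():
        anomalies.append("creation and modification dates use different UTC offsets")
    return anomalies


print(find_date_anomalies("D:20240115143245-05'00'", "D:20240115164512-05'00'"))  # []
```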
Establishing Authenticity
For important documents, metadata can help corroborate (though not prove on its own, since it is easy to edit):
- When a contract was actually created
- Who authored an agreement
- That a document hasn't been modified since signing
Legal Discovery
In litigation, metadata is often critical evidence. Intentionally stripping metadata from documents subject to legal hold can constitute spoliation—destruction of evidence.
Best Practices for Organizations
1. Establish a Metadata Policy
Define when metadata should be preserved vs. removed:
- Internal documents: preserve for document management
- External publications: remove before release
- Legal documents: follow counsel guidance
2. Standardize Author Information
Configure software to use organizational names rather than individual usernames:
- Word/Office: File → Options → General → User name
- Adobe: Edit → Preferences → Identity
3. Pre-Publication Review
Add metadata review to your publication workflow. Before any document goes public, verify it contains only intended information.
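One way to make that review routine is a small automated gate that runs before release. The sketch below checks a dictionary of metadata fields against a simple policy. The allowed author name, the warning words, and the function name are all hypothetical placeholders that your own policy would replace.

```python
# Hypothetical policy values: replace with your organization's own.
ALLOWED_AUTHORS = {"Example Organization"}
WARNING_WORDS = ("draft", "internal", "confidential")


def publication_issues(tags: dict[str, str]) -> list[str]:
    """Return a list of reasons this document should not be published yet."""
    issues = []
    author = tags.get("Author", "")
    if author and author not in ALLOWED_AUTHORS:
        issues.append(f"Author field is not an approved name: {author!r}")
    for field in ("Title", "Subject", "Keywords"):
        text = tags.get(field, "").lower()
        for word in WARNING_WORDS:
            if word in text:
                issues.append(f"{field} field contains {word!r}")
    return issues


metadata = {"Author": "John Smith", "Title": "Service Agreement Draft v3"}
for issue in publication_issues(metadata):
    print(issue)
```

An empty list means the fields checked here passed. It says nothing about hidden layers, comments, or earlier revisions, which still need their own review.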
4. Training
Ensure staff understand metadata risks. The "Dodgy Dossier" wasn't a technical failure—it was a human awareness failure.
Conclusion
Every PDF tells two stories: the visible content you intended to share, and the hidden metadata you probably didn't think about. In most cases, this hidden data is harmless. But when privacy matters—and it often matters more than we realize—understanding and controlling metadata is essential.
Before sharing any important PDF: check the metadata, decide what should stay and go, and sanitize accordingly. It takes seconds and prevents the kind of embarrassment that makes headlines.