Understanding PDF Metadata: What's Hidden in Your Files

In 2003, the British government published a dossier on Iraq as a Word document. The file's revision log named the officials who had worked on it, and a researcher separately showed that large parts of the text had been copied from a graduate student's published work. Neither was something the authors intended to share. The scandal became known as the "Dodgy Dossier."

Every PDF you create or share contains hidden information. Some of it is benign—creation dates, software used. Some of it can be deeply revealing—author names, computer usernames, edit histories, GPS coordinates from embedded photos. Understanding this metadata is essential for anyone sharing documents professionally or publicly.

What Is PDF Metadata?

Metadata is "data about data"—information that describes the document rather than the document's visible content. PDFs can contain multiple types of metadata:

Document Information Dictionary

The basic metadata most PDFs contain (the dictionary is optional, and any individual field may be absent):

Title: Document title (often auto-filled from filename)
Author: Creator's name (often from software registration)
Subject: Document description
Keywords: Searchable tags
Creator: Application that created the original (e.g., "Microsoft Word")
Producer: Application that made the PDF (e.g., "Adobe PDF Library")
CreationDate: When the PDF was created
ModDate: When it was last modified
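
To make the fields above concrete, here is a minimal sketch that pulls them out of a PDF's raw bytes using only the Python standard library. The function name read_info_fields and the sample bytes are made up for illustration. It assumes the values are stored as plain literal strings in an unencrypted, uncompressed dictionary, which is common but not guaranteed, so treat it as a quick look rather than a real PDF parser.

```python
import re

# Standard keys defined for the document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate")

def read_info_fields(pdf_bytes: bytes) -> dict:
    """Return Info-style fields that are stored as plain literal strings.

    Simplification: matches '/Key (value)' anywhere in the raw bytes, so it
    misses hex strings, nested parentheses, encrypted files, and dictionaries
    inside compressed object streams. '/Title' can also match bookmark titles.
    """
    fields = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        matches = re.findall(pattern, pdf_bytes)
        if matches:
            # Later matches usually come from later saves, so keep the last one.
            fields[key] = matches[-1].decode("latin-1")
    return fields

# Hypothetical fragment of a PDF file, for demonstration only.
sample = (b"1 0 obj\n<< /Title (Service Agreement Draft v3) /Author (John Smith)\n"
          b"/Creator (Microsoft Word 2019) /CreationDate (D:20240115143245-05'00') >>\n"
          b"endobj\n")
print(read_info_fields(sample))
```

Fields that are not present, such as Subject in the sample, are simply left out of the result.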

XMP Metadata

Extended metadata in XML format, capable of storing much more:

Detailed creation/modification history
Copyright and rights management
Custom fields defined by organizations
Thumbnail previews
Document identifiers and versioning
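
Because XMP is ordinary XML, it can often be located and read without any PDF library. The sketch below assumes the packet is stored uncompressed (the usual case, though not guaranteed) and that only simple leaf values are of interest. The function names and the sample packet are illustrative, not part of any tool.

```python
import re
import xml.etree.ElementTree as ET

def extract_xmp(pdf_bytes: bytes):
    """Return the first uncompressed XMP packet as a parsed XML element, or None."""
    match = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    if match is None:
        return None
    return ET.fromstring(match.group(0))

def xmp_text_values(root) -> dict:
    """Collect leaf elements that carry text, keyed by their local tag name."""
    values = {}
    for element in root.iter():
        text = (element.text or "").strip()
        if text and len(element) == 0:
            local_name = element.tag.split("}")[-1]
            values.setdefault(local_name, text)
    return values

# Hypothetical packet embedded in otherwise empty PDF bytes.
sample = b"""%PDF-1.7
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <xmp:CreatorTool>Microsoft Word 2019</xmp:CreatorTool>
   <pdf:Producer>Adobe PDF Library 15.0</pdf:Producer>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
"""
print(xmp_text_values(extract_xmp(sample)))
```

Structured values such as edit histories are nested lists in real XMP, so this flat view will show only part of what a packet holds.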

Embedded Object Metadata

Images and other objects embedded in PDFs carry their own metadata:

EXIF data: Camera model, settings, date/time
GPS coordinates: Where a photo was taken
Software history: Applications that processed the image
Thumbnail images: Sometimes containing data cropped from the visible image

Real risk: In 2012, antivirus pioneer John McAfee's hiding place in Guatemala was exposed after a magazine published a photo of him that still carried GPS coordinates in its EXIF data. The same data can survive when a photo is embedded in a PDF without being re-encoded.
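
EXIF stores a position as degrees, minutes, and seconds plus a hemisphere letter, which is why raw GPS tags do not look like map coordinates at first glance. The conversion is simple arithmetic, sketched here with a made-up function name and made-up coordinates.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus N/S/E/W into signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if ref.upper() in ("S", "W") else value

# Hypothetical coordinates, not taken from any real photo.
latitude = dms_to_decimal(15, 30, 0, "N")
longitude = dms_to_decimal(88, 45, 0, "W")
print(latitude, longitude)  # 15.5 -88.75
```

Pasting a pair like this into any mapping service is all it takes to turn an embedded photo into a location.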

What Metadata Reveals About You

Metadata Field	What It Can Reveal	Risk Level
Author	Full name, sometimes username	Medium
Creator application	Software you use, version numbers	Low
Creation/mod dates	When you worked on document	Low-Medium
File path (sometimes)	Username, folder structure, project names	High
Embedded image GPS	Physical location	Critical
Edit history	Previous versions, deleted content	High
Comments/annotations	Internal discussions, reviewer names	High
Fonts used	Organization's font licenses	Low

How to View PDF Metadata

Adobe Acrobat Reader (Free)

Open the PDF
File → Properties (or Ctrl+D / Cmd+D)
View Description, Security, Fonts, and Advanced tabs

This shows basic metadata but not all hidden data.

Adobe Acrobat Pro

For comprehensive inspection:

Tools → Redact → Remove Hidden Information
Or: Tools → Protection → Remove Hidden Information
View detailed list of all metadata, comments, hidden layers, etc.

ExifTool (Free, Command Line)

The most thorough option for technical users:

exiftool document.pdf

This reports the document info and XMP fields in full, including metadata most GUI tools don't show. Example output:

File Name                       : contract.pdf
File Size                       : 2.4 MB
File Type                       : PDF
PDF Version                     : 1.7
Creator                         : Microsoft Word 2019
Producer                        : Adobe PDF Library 15.0
Create Date                     : 2024:01:15 14:32:45-05:00
Modify Date                     : 2024:01:15 16:45:12-05:00
Author                          : John Smith
Title                           : Service Agreement Draft v3
Subject                         :
Keywords                        :
XMP Toolkit                     : Adobe XMP Core 5.6-c148
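
If you want to check output like this in a script, the "name : value" layout is easy to split. The sketch below works on text you have already captured. The function name is illustrative, and the sample lines are copied from the example output above.

```python
def parse_exiftool_output(text: str) -> dict:
    """Turn 'Tag Name : value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as dates contain colons too.
        name, separator, value = line.partition(":")
        if separator:
            fields[name.strip()] = value.strip()
    return fields

sample_output = """\
Creator                         : Microsoft Word 2019
Create Date                     : 2024:01:15 14:32:45-05:00
Author                          : John Smith
Subject                         :
"""
report = parse_exiftool_output(sample_output)
empty_fields = [name for name, value in report.items() if not value]
print(report["Author"], empty_fields)
```

For anything beyond a quick check, ExifTool's -json option produces machine-readable output directly and avoids parsing display text.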

Online Tools

Services like PDF Analyzer or Metadata2Go can inspect PDFs without installing software. However, consider privacy implications before uploading sensitive documents to third-party services.

Hidden Data Beyond Basic Metadata

PDFs can contain information that isn't strictly "metadata" but is equally hidden:

Incremental Saves

Many PDF editors use "incremental saves": when you edit a PDF, the new data is appended to the file rather than replacing the old. Previous versions of content may still exist in the file, potentially including:

Text that was deleted
Images that were replaced
Pages that were removed
Earlier versions of redacted content
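
A rough way to spot appended revisions is to count end-of-file markers, since each incremental save adds another one. This is a heuristic sketch with a made-up function name and synthetic bytes. Linearized ("fast web view") files normally contain two markers even when never edited, so a count above one is a reason to look closer, not proof.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each appended incremental save adds one."""
    return pdf_bytes.count(b"%%EOF")

# Synthetic stand-ins for a file before and after an incremental edit.
original = b"%PDF-1.7\n1 0 obj\n(Draft clause)\nendobj\ntrailer\n<< >>\n%%EOF\n"
edited = original + b"1 0 obj\n(Final clause)\nendobj\ntrailer\n<< >>\n%%EOF\n"

print(count_eof_markers(original), count_eof_markers(edited))  # 1 2
print(b"Draft clause" in edited)  # True: the old text is still in the file
```

The second print is the point: a viewer shows only the final clause, while the draft remains readable in the bytes.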

Critical: A related failure is redaction by overlay. In 2009, the TSA accidentally published a redacted PDF of its airport screening manual in which the black boxes were simply drawn over the text, so anyone could copy and paste the hidden content beneath.

Hidden Layers

PDFs support layers that can be toggled visible/invisible. Sensitive content on "hidden" layers is still in the file and extractable.

Annotations and Comments

Review comments, sticky notes, and markup may be hidden from view but remain in the document. These often contain candid internal discussions.

Form Field Data

Previously entered form data can persist even after fields appear empty.

Embedded Files

PDFs can contain attached files (documents, spreadsheets, etc.) that aren't visible on any page but travel with the PDF.
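
Each of these features leaves a recognizable name in the file, so a crude byte scan can flag documents that deserve a closer look. The marker names below are standard PDF dictionary keys. The function name, labels, and sample are illustrative. The scan can miss content stored in compressed object streams, and /Annots also covers ordinary links, so read a hit as "inspect this", not as a finding.

```python
# PDF dictionary keys that signal each kind of hidden content.
HIDDEN_CONTENT_MARKERS = {
    "layers (optional content)": b"/OCProperties",
    "annotations or comments": b"/Annots",
    "form fields": b"/AcroForm",
    "embedded files": b"/EmbeddedFiles",
}

def find_hidden_content(pdf_bytes: bytes) -> list:
    """List the kinds of hidden content whose marker appears in the raw bytes."""
    return [label for label, marker in HIDDEN_CONTENT_MARKERS.items()
            if marker in pdf_bytes]

# Hypothetical catalog fragment with a form and an attachment.
sample = b"<< /Type /Catalog /AcroForm 5 0 R /Names << /EmbeddedFiles 6 0 R >> >>"
print(find_hidden_content(sample))
```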

Removing Metadata: Complete Sanitization

Adobe Acrobat Pro Method

Open the PDF
Tools → Redact → Remove Hidden Information
Click "Remove" to delete all found items, or review individually
Alternatively: Tools → Protection → Sanitize Document (removes everything at once)
Save as a new file

The "Sanitize Document" feature removes:

Metadata
File attachments
Hidden text and layers
Comments and markup
Bookmarks
Unreferenced data from incremental saves

ExifTool Method (Free)

Remove all metadata while keeping document content:

exiftool -all= document.pdf

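This clears the XMP and document info fields, but ExifTool writes the change as an incremental update, so the original metadata remains in the file and can be recovered until the PDF is rewritten (for example with QPDF, below). It also leaves hidden layers and earlier revisions untouched, and it keeps a backup copy named document.pdf_original unless told otherwise.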

Print to PDF Method

A simple technique that works surprisingly well:

Open the PDF in any viewer
Print to a new PDF (File → Print → Save as PDF)
The new PDF contains only visible content

This flattens the document, removing:

Most metadata (new metadata reflects the print action)
Hidden layers
Incremental save history
Comments and annotations

Limitations: Printing can reduce quality, usually drops links, bookmarks, and fillable form fields, and is not guaranteed to strip EXIF data from embedded images.

QPDF (Free, Command Line)

Linearize and clean up PDF structure:

qpdf --linearize input.pdf output.pdf

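This rewrites the file from its current objects, so data left behind by earlier incremental saves is dropped. It does not remove the current metadata, comments, or layers.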
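
The ExifTool and QPDF steps complement each other: ExifTool's PDF edits are appended to the file and remain reversible until it is rewritten, and QPDF does the rewriting. Here is a sketch of chaining them from Python, assuming both tools are installed and on your PATH. The function names and the working-copy naming are illustrative choices, and the result is still worth verifying by inspecting the output file.

```python
import os
import shutil
import subprocess

def build_sanitize_commands(working_copy: str, destination: str) -> list:
    """Return the two commands: clear metadata, then rewrite the file."""
    return [
        # -all= clears writable metadata; -overwrite_original skips the backup file.
        ["exiftool", "-all=", "-overwrite_original", working_copy],
        # Rewriting drops objects that are no longer referenced.
        ["qpdf", "--linearize", working_copy, destination],
    ]

def sanitize(source: str, destination: str) -> None:
    """Run both steps on a copy so the source file is never modified."""
    working_copy = destination + ".working.pdf"
    shutil.copyfile(source, working_copy)
    try:
        for command in build_sanitize_commands(working_copy, destination):
            if shutil.which(command[0]) is None:
                raise FileNotFoundError(f"{command[0]} not found on PATH")
            # check=True treats any non-zero exit status as a failure.
            subprocess.run(command, check=True)
    finally:
        os.remove(working_copy)

# Example call (hypothetical file names):
# sanitize("contract.pdf", "contract-clean.pdf")
```

This pipeline addresses metadata and leftover revisions only. Hidden layers, comments, and attachments in the current version of the document still need one of the other methods.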

When to Remove Metadata

Always Remove Before:

Public publishing: Documents on websites, in press releases
Legal submissions: Unless metadata is specifically required
Competitive situations: RFPs, proposals, contracts with other parties
Whistleblowing/anonymous sharing: Any situation requiring anonymity
Sharing with untrusted parties: When you don't control how the document will be used

Consider Keeping When:

Internal documents: Metadata helps with document management
Archival purposes: Creation dates and authorship matter for records
Collaborative editing: Track changes and comments are the point
Legal discovery: Metadata may be legally required to preserve

Metadata for Forensics and Verification

The flip side of metadata privacy is its value for document authentication:

Detecting Forgeries

Metadata inconsistencies can reveal document manipulation:

Creation date after modification date
Software that didn't exist when the document was supposedly created
Mismatched time zones
Producer application inconsistent with claimed source
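
The first check in this list is easy to automate. PDF dates look like D:20240115143245-05'00', so they need a small parser before they can be compared. The sketch below handles full dates only (year through seconds) and assumes UTC when no offset is given. The function names and sample values are illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?"
)

def parse_pdf_date(value: str) -> datetime:
    """Parse a full PDF date string into a timezone-aware datetime."""
    match = PDF_DATE.match(value)
    if match is None:
        raise ValueError(f"not a full PDF date: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    _, sign, off_hours, off_minutes = match.groups()[6:]
    offset = timedelta(hours=int(off_hours or 0), minutes=int(off_minutes or 0))
    if sign == "-":
        offset = -offset
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))

def created_after_modified(info: dict) -> bool:
    """True when CreationDate is later than ModDate."""
    return parse_pdf_date(info["CreationDate"]) > parse_pdf_date(info["ModDate"])

# Hypothetical metadata from two documents.
consistent = {"CreationDate": "D:20240115143245-05'00'",
              "ModDate": "D:20240115164512-05'00'"}
suspicious = {"CreationDate": "D:20240116090000Z",
              "ModDate": "D:20240115164512-05'00'"}
print(created_after_modified(consistent), created_after_modified(suspicious))
```

Comparing timezone-aware values matters here: the suspicious pair uses two different offsets, and only the converted times show that creation falls after modification.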

Establishing Authenticity

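For important documents, metadata can help corroborate the following, though it is easy to edit and is rarely proof on its own:

When a contract was actually created
Who authored an agreement
That a document hasn't been modified since signing (a digital signature is the stronger check)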

Legal Discovery

In litigation, metadata is often critical evidence. Intentionally stripping metadata from documents subject to legal hold can constitute spoliation—destruction of evidence.

Best Practices for Organizations

1. Establish a Metadata Policy

Define when metadata should be preserved vs. removed:

Internal documents: preserve for document management
External publications: remove before release
Legal documents: follow counsel guidance

2. Standardize Author Information

Configure software to use organizational names rather than individual usernames:

Word/Office: File → Options → General → User name
Adobe: Edit → Preferences → Identity

3. Pre-Publication Review

Add metadata review to your publication workflow. Before any document goes public, verify it contains only intended information.
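
A review step like this can be partly automated with an allowlist. The sketch below is one possible shape, not a standard: the allowed fields, the blocked path fragments, and the function name are all hypothetical policy choices that an organization would set for itself.

```python
# Hypothetical policy: fields permitted in a public release.
ALLOWED_FIELDS = {"Title", "Subject", "Keywords", "CreationDate", "ModDate", "Producer"}
# Hypothetical fragments that suggest a local file path leaked into a value.
BLOCKED_FRAGMENTS = ("C:\\Users\\", "/home/", "/Users/")

def review_metadata(fields: dict) -> list:
    """Return human-readable findings; an empty list means the check passed."""
    findings = []
    for name, value in fields.items():
        if name not in ALLOWED_FIELDS and value:
            findings.append(f"{name} is set but not on the allowlist")
        if any(fragment in value for fragment in BLOCKED_FRAGMENTS):
            findings.append(f"{name} contains a local file path")
    return findings

# Hypothetical metadata from a document about to be published.
candidate = {
    "Title": "C:\\Users\\jsmith\\Contracts\\draft.docx",
    "Author": "John Smith",
    "Producer": "Adobe PDF Library 15.0",
}
for finding in review_metadata(candidate):
    print(finding)
```

A check like this covers only the metadata fields it is given. It complements, rather than replaces, a look at comments, layers, and attachments.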

4. Training

Ensure staff understand metadata risks. The "Dodgy Dossier" wasn't a technical failure—it was a human awareness failure.

Conclusion

Every PDF tells two stories: the visible content you intended to share, and the hidden metadata you probably didn't think about. In most cases, this hidden data is harmless. But when privacy matters—and it often matters more than we realize—understanding and controlling metadata is essential.

Before sharing any important PDF: check the metadata, decide what should stay and go, and sanitize accordingly. It takes seconds and prevents the kind of embarrassment that makes headlines.

Create Clean PDFs from the Start

Down2PDF generates minimal, clean PDFs from Markdown—no hidden Office metadata, no edit history, just your content.
