Down2PDF

Understanding PDF Metadata: What's Hidden in Your Files

In 2003, the British government published a dossier on Iraq's weapons programs as a Word document. Readers soon found that large parts of the text had been copied from a graduate student's published work, and the file's revision log exposed the usernames of the government officials who had edited it. That was information the authors never intended to share. The scandal became known as the "Dodgy Dossier."

Every PDF you create or share contains hidden information. Some of it is benign—creation dates, software used. Some of it can be deeply revealing—author names, computer usernames, edit histories, GPS coordinates from embedded photos. Understanding this metadata is essential for anyone sharing documents professionally or publicly.

What Is PDF Metadata?

Metadata is "data about data"—information that describes the document rather than the document's visible content. PDFs can contain multiple types of metadata:

Document Information Dictionary

The basic metadata most PDFs contain (the dictionary is optional, so some files have none):

  • Title: Document title (often auto-filled from filename)
  • Author: Creator's name (often from software registration)
  • Subject: Document description
  • Keywords: Searchable tags
  • Creator: Application that created the original (e.g., "Microsoft Word")
  • Producer: Application that made the PDF (e.g., "Adobe PDF Library")
  • CreationDate: When the PDF was created
  • ModDate: When it was last modified
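
As a rough illustration of where these fields live, the sketch below pulls them out of raw PDF bytes with a regular expression. It assumes an unencrypted file whose information dictionary is stored uncompressed with simple literal strings. The helper name `read_info_fields` is made up for this example, and anything beyond this simple case needs a real PDF parser.

```python
import re

# Standard keys of the document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate")


def read_info_fields(pdf_bytes: bytes) -> dict:
    """Best-effort scan for '/Key (value)' pairs (hypothetical helper, not a parser)."""
    fields = {}
    for key in INFO_KEYS:
        # Literal string without nested parentheses; escaped characters are allowed.
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, pdf_bytes)  # first match only
        if match:
            fields[key] = match.group(1).decode("latin-1")
    return fields


sample = (b"1 0 obj\n<< /Title (Service Agreement Draft v3) /Author (John Smith)\n"
          b"/Creator (Microsoft Word 2019) /CreationDate (D:20240115143245-05'00') >>\n"
          b"endobj\n")
print(read_info_fields(sample))
```

Because the scan is not structure-aware, a key such as /Title can also match a bookmark entry, so treat the result as a hint rather than a definitive reading.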

XMP Metadata

Extended metadata in XML format, capable of storing much more:

  • Detailed creation/modification history
  • Copyright and rights management
  • Custom fields defined by organizations
  • Thumbnail previews
  • Document identifiers and versioning
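
XMP packets are often stored as plain XML inside the file, so they can be inspected without a PDF library. The sketch below assumes the packet is uncompressed and wrapped in an `x:xmpmeta` element; `extract_xmp_properties` is a hypothetical helper that flattens each property to a single string.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def _local(name: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on names."""
    return name.rsplit("}", 1)[-1]


def extract_xmp_properties(pdf_bytes: bytes) -> dict:
    """Return XMP properties found in the first uncompressed packet, if any."""
    packet = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    if packet is None:
        return {}
    root = ET.fromstring(packet.group(0))
    props = {}
    for desc in root.iter(RDF_NS + "Description"):
        # Simple properties may be written as attributes ...
        for name, value in desc.attrib.items():
            if not name.startswith(RDF_NS) and value:
                props[_local(name)] = value
        # ... or as child elements, possibly wrapping an rdf:Seq or rdf:Alt list.
        for prop in desc:
            text = " ".join(t.strip() for t in prop.itertext() if t.strip())
            if text:
                props[_local(prop.tag)] = text
    return props


sample_xmp = (
    b'%PDF-1.7\n<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/" '
    b'xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmp:CreatorTool="Microsoft Word 2019">'
    b'<dc:creator><rdf:Seq><rdf:li>John Smith</rdf:li></rdf:Seq></dc:creator>'
    b'</rdf:Description></rdf:RDF></x:xmpmeta>\n%%EOF\n'
)
print(extract_xmp_properties(sample_xmp))
```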

Embedded Object Metadata

Images and other objects embedded in PDFs carry their own metadata:

  • EXIF data: Camera model, settings, date/time
  • GPS coordinates: Where a photo was taken
  • Software history: Applications that processed the image
  • Thumbnail images: Sometimes containing data cropped from the visible image

Real risk: In 2012, antivirus pioneer John McAfee's location in Guatemala was exposed when a photo of him published by journalists still carried GPS coordinates in its EXIF data. A photo embedded in a PDF can carry the same data.
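
To check whether a file carries this kind of data, one option is to scan it for JPEG EXIF segments and look for the GPS pointer tag. The sketch below is a simplified illustration: it only finds JPEG data embedded as-is (not recompressed inside the PDF), it inspects only the first EXIF directory, and the function name is hypothetical.

```python
import re
import struct

GPS_IFD_TAG = 0x8825  # EXIF tag that points to the GPS sub-directory


def gps_flags_for_exif_segments(data: bytes) -> list:
    """Return one boolean per EXIF segment found: True if it has a GPS pointer."""
    flags = []
    # APP1 marker, two length bytes, then the "Exif\0\0" identifier.
    for match in re.finditer(rb"\xff\xe1..Exif\x00\x00", data, re.DOTALL):
        tiff = data[match.end():]  # the TIFF header starts right after the identifier
        if tiff[:2] == b"II":
            endian = "<"   # little-endian
        elif tiff[:2] == b"MM":
            endian = ">"   # big-endian
        else:
            continue
        if len(tiff) < 8:
            continue
        magic, ifd0 = struct.unpack(endian + "HI", tiff[2:8])
        if magic != 42 or ifd0 + 2 > len(tiff):
            continue
        (count,) = struct.unpack(endian + "H", tiff[ifd0:ifd0 + 2])
        tags = []
        for i in range(count):
            start = ifd0 + 2 + 12 * i      # each directory entry is 12 bytes
            tag_bytes = tiff[start:start + 2]
            if len(tag_bytes) < 2:
                break
            tags.append(struct.unpack(endian + "H", tag_bytes)[0])
        flags.append(GPS_IFD_TAG in tags)
    return flags
```

Calling `gps_flags_for_exif_segments(open("document.pdf", "rb").read())` on a file returns an empty list when no raw EXIF segment is found, which does not prove the images are clean if they were recompressed.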

What Metadata Reveals About You

Metadata Field         | What It Can Reveal                         | Risk Level
Author                 | Full name, sometimes username              | Medium
Creator application    | Software you use, version numbers          | Low
Creation/mod dates     | When you worked on document                | Low-Medium
File path (sometimes)  | Username, folder structure, project names  | High
Embedded image GPS     | Physical location                          | Critical
Edit history           | Previous versions, deleted content         | High
Comments/annotations   | Internal discussions, reviewer names       | High
Fonts used             | Organization's font licenses               | Low

How to View PDF Metadata

Adobe Acrobat Reader (Free)

  1. Open the PDF
  2. File → Properties (or Ctrl+D / Cmd+D)
  3. View Description, Security, Fonts, and Advanced tabs

This shows basic metadata but not all hidden data.

Adobe Acrobat Pro

For comprehensive inspection:

  1. Open Tools → Redact → Remove Hidden Information (in some versions the path is Tools → Protection → Remove Hidden Information)
  2. Review the detailed list of all metadata, comments, hidden layers, etc.

ExifTool (Free, Command Line)

The most thorough option for technical users:

exiftool document.pdf

This prints every metadata field ExifTool can read, including many that GUI tools don't show. Example output:

File Name                       : contract.pdf
File Size                       : 2.4 MB
File Type                       : PDF
PDF Version                     : 1.7
Creator                         : Microsoft Word 2019
Producer                        : Adobe PDF Library 15.0
Create Date                     : 2024:01:15 14:32:45-05:00
Modify Date                     : 2024:01:15 16:45:12-05:00
Author                          : John Smith
Title                           : Service Agreement Draft v3
Subject                         :
Keywords                        :
XMP Toolkit                     : Adobe XMP Core 5.6-c148

Online Tools

Services like PDF Analyzer or Metadata2Go can inspect PDFs without installing software. However, consider privacy implications before uploading sensitive documents to third-party services.

Hidden Data Beyond Basic Metadata

PDFs can contain information that isn't strictly "metadata" but is equally hidden:

Incremental Saves

Many PDF editors use "incremental saves": when you edit a PDF, the new data is appended to the file rather than replacing the old. Previous versions of content may still exist in the file, potentially including:

  • Text that was deleted
  • Images that were replaced
  • Pages that were removed
  • Earlier versions of redacted content
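
A quick way to spot incremental saves is to count end-of-file markers, since each appended revision normally ends with its own `%%EOF`. The sketch below is a heuristic under that assumption, and `list_revisions` is a made-up name. Linearized ("fast web view") files typically contain two markers even without edits, so a count above one is a prompt to look closer, not proof.

```python
def list_revisions(pdf_bytes: bytes) -> list:
    """Return the byte offset just past each %%EOF marker in the file."""
    marker = b"%%EOF"
    ends = []
    pos = pdf_bytes.find(marker)
    while pos != -1:
        ends.append(pos + len(marker))
        pos = pdf_bytes.find(marker, pos + 1)
    return ends


data = b"%PDF-1.7\nv1 body\n%%EOF\nv2 update\n%%EOF\n"
ends = list_revisions(data)
print(len(ends), "revision marker(s)")
# Cutting the file at an earlier marker gives back the earlier revision.
first_revision = data[:ends[0]]
```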

Critical: A related failure is redaction by overlay. In 2009, the TSA accidentally published a redacted PDF of its airport screening manual in which the "redacted" black boxes were simply drawn over the text. Anyone could copy and paste the hidden content beneath.

Hidden Layers

PDFs support layers that can be toggled visible/invisible. Sensitive content on "hidden" layers is still in the file and extractable.

Annotations and Comments

Review comments, sticky notes, and markup may be hidden from view but remain in the document. These often contain candid internal discussions.

Form Field Data

Previously entered form data can persist even after fields appear empty.

Embedded Files

PDFs can contain attached files (documents, spreadsheets, etc.) that aren't visible on any page but travel with the PDF.
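
The structures described in this section are introduced by well-known dictionary keys, so a byte scan can give a first indication of what a file contains. This is only a sketch: the labels and the function name are illustrative, and keys stored inside compressed object streams will not be found this way.

```python
# PDF dictionary keys that introduce each kind of hidden structure.
MARKERS = {
    "hidden layers (optional content)": b"/OCProperties",
    "annotations and comments": b"/Annots",
    "form fields": b"/AcroForm",
    "embedded files": b"/EmbeddedFiles",
}


def find_hidden_structures(pdf_bytes: bytes) -> list:
    """Return the labels whose marker key appears somewhere in the raw bytes."""
    return [label for label, key in MARKERS.items() if key in pdf_bytes]


sample_pdf = b"%PDF-1.7\n1 0 obj << /Type /Catalog /AcroForm 5 0 R >> endobj\n%%EOF\n"
print(find_hidden_structures(sample_pdf))
```

Note that /Annots also covers ordinary links, so a hit there means "review this file", not necessarily "comments are present".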

Removing Metadata: Complete Sanitization

Adobe Acrobat Pro Method

  1. Open the PDF
  2. Tools → Redact → Remove Hidden Information
  3. Click "Remove" to delete all found items, or review individually
  4. Alternatively: Tools → Protection → Sanitize Document (removes everything at once)
  5. Save as a new file

The "Sanitize Document" feature removes:

  • Metadata
  • File attachments
  • Hidden text and layers
  • Comments and markup
  • Bookmarks
  • Unreferenced data from incremental saves

ExifTool Method (Free)

Remove all metadata while keeping document content:

exiftool -all= document.pdf

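ExifTool writes this change as an incremental update, so the original XMP and document info remain in the file and can be recovered until the PDF is rewritten (for example with the QPDF command below). ExifTool also keeps a backup copy named document.pdf_original. This method doesn't address hidden layers or other embedded content.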

Print to PDF Method

A simple technique that works surprisingly well:

  1. Open the PDF in any viewer
  2. Print to a new PDF (File → Print → Save as PDF)
  3. The new PDF contains only visible content

This flattens the document, removing:

  • Most metadata (new metadata reflects the print action)
  • Hidden layers
  • Incremental save history
  • Comments and annotations

Limitations: May reduce quality slightly, drops links and bookmarks, and may not remove EXIF data from embedded images.

QPDF (Free, Command Line)

Linearize and clean up PDF structure:

qpdf --linearize input.pdf output.pdf

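This rewrites the PDF as a single clean revision, dropping objects left unreferenced by incremental saves. On its own it does not change the document information or XMP metadata.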
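
The ExifTool and QPDF commands above work well as a pair: ExifTool records its deletion as an appended update, and the QPDF rewrite then discards the superseded data. A minimal sketch of chaining them, assuming both tools are installed and on the PATH (the function names are hypothetical):

```python
import subprocess


def sanitize_commands(source: str, cleaned: str) -> list:
    """Build the two command lines; uses only the flags shown in this article."""
    return [
        ["exiftool", "-all=", source],             # strip metadata in place
        ["qpdf", "--linearize", source, cleaned],  # rewrite as one clean revision
    ]


def sanitize(source: str, cleaned: str) -> None:
    """Run both steps, stopping at the first failure."""
    for command in sanitize_commands(source, cleaned):
        subprocess.run(command, check=True)
```

After `sanitize("document.pdf", "clean.pdf")`, share only `clean.pdf`. The ExifTool backup (`document.pdf_original`) still contains the original metadata.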

When to Remove Metadata

Always Remove Before:

  • Public publishing: Documents on websites, in press releases
  • Legal submissions: Unless metadata is specifically required
  • Competitive situations: RFPs, proposals, contracts with other parties
  • Whistleblowing/anonymous sharing: Any situation requiring anonymity
  • Sharing with untrusted parties: When you don't control how the document will be used

Consider Keeping When:

  • Internal documents: Metadata helps with document management
  • Archival purposes: Creation dates and authorship matter for records
  • Collaborative editing: Track changes and comments are the point
  • Legal discovery: Metadata may be legally required to preserve

Metadata for Forensics and Verification

The flip side of metadata privacy is its value for document authentication:

Detecting Forgeries

Metadata inconsistencies can reveal document manipulation:

  • Creation date after modification date
  • Software that didn't exist when the document was supposedly created
  • Mismatched time zones
  • Producer application inconsistent with claimed source
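
The first and third checks in the list above can be automated once the PDF date strings are parsed. The sketch below handles the common `D:YYYYMMDDHHmmSS+HH'mm'` form and, as an assumption of this example, treats a missing offset as UTC; the function names are illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(\d{2})?'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string into a timezone-aware datetime."""
    m = _PDF_DATE.match(value)
    if m is None:
        raise ValueError(f"not a PDF date: {value!r}")
    year = int(m.group(1))
    month, day = int(m.group(2) or 1), int(m.group(3) or 1)
    hour, minute, second = (int(m.group(i) or 0) for i in (4, 5, 6))
    offset = timedelta(hours=int(m.group(8) or 0), minutes=int(m.group(9) or 0))
    if m.group(7) == "-":
        offset = -offset
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


def date_anomalies(creation: str, modification: str) -> list:
    """Return human-readable warnings for suspicious date combinations."""
    created, modified = parse_pdf_date(creation), parse_pdf_date(modification)
    warnings = []
    if created > modified:
        warnings.append("CreationDate is later than ModDate")
    if created.utcoffset() != modified.utcoffset():
        warnings.append("time zone offsets differ")
    return warnings


print(date_anomalies("D:20240115164512-05'00'", "D:20240115143245-05'00'"))
```

A warning is a reason to investigate, not proof of forgery: travel, daylight saving changes, or a second editing tool can all produce differing offsets.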

Establishing Authenticity

For important documents, metadata can help corroborate:

  • When a contract was actually created
  • Who authored an agreement
  • That a document hasn't been modified since signing (only a digital signature can actually prove this)

Legal Discovery

In litigation, metadata is often critical evidence. Intentionally stripping metadata from documents subject to legal hold can constitute spoliation—destruction of evidence.

Best Practices for Organizations

1. Establish a Metadata Policy

Define when metadata should be preserved vs. removed:

  • Internal documents: preserve for document management
  • External publications: remove before release
  • Legal documents: follow counsel guidance

2. Standardize Author Information

Configure software to use organizational names rather than individual usernames:

  • Word/Office: File → Options → General → User name
  • Adobe: Edit → Preferences → Identity

3. Pre-Publication Review

Add metadata review to your publication workflow. Before any document goes public, verify it contains only intended information.
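
Such a review step can be partly automated. The sketch below is one possible shape for a pre-publication check, assuming the metadata has already been read into a dictionary of strings; the rules, the function name, and the "Example Corp" value are placeholders for whatever your own policy defines.

```python
import re

# Rough hints that a value contains a local file path (Windows, macOS, Linux).
PATH_HINT = re.compile(r"[A-Za-z]:\\|/Users/|/home/")


def review_metadata(fields: dict, approved_authors: set) -> list:
    """Return a list of problems that should block publication."""
    problems = []
    author = fields.get("Author", "")
    if author and author not in approved_authors:
        problems.append(f"Author '{author}' is not an approved organizational name")
    for key, value in fields.items():
        if PATH_HINT.search(value):
            problems.append(f"{key} looks like it contains a file path")
    return problems


draft = {"Author": "John Smith", "Title": "C:\\Users\\jsmith\\draft.docx"}
for problem in review_metadata(draft, {"Example Corp"}):
    print(problem)
```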

4. Training

Ensure staff understand metadata risks. The "Dodgy Dossier" wasn't a technical failure—it was a human awareness failure.

Conclusion

Every PDF tells two stories: the visible content you intended to share, and the hidden metadata you probably didn't think about. In most cases, this hidden data is harmless. But when privacy matters—and it often matters more than we realize—understanding and controlling metadata is essential.

Before sharing any important PDF: check the metadata, decide what should stay and go, and sanitize accordingly. It takes seconds and prevents the kind of embarrassment that makes headlines.

Create Clean PDFs from the Start

Down2PDF generates minimal, clean PDFs from Markdown—no hidden Office metadata, no edit history, just your content.
