How Metadata Leaks PHI in Healthcare File Sharing
— Written by Brendan, Founder of FileShot.io
Healthcare providers encrypt files, sign Business Associate Agreements, and implement access controls to protect patient data. But there is a gap that most HIPAA compliance checklists miss: file metadata. Every medical image, PDF report, and Word document shared between providers, insurers, and patients carries hidden metadata that can leak protected health information (PHI) without anyone knowing.
What Is File Metadata in Healthcare?
Metadata is data about the data. When a radiologist saves a DICOM image, the file does not contain just the X-ray or MRI scan. It also contains the patient's name, date of birth, medical record number, referring physician, institution name, and the date and time of the study. When a doctor creates a referral letter in Word, the document stores the author name, organization, creation date, revision history, and sometimes tracked changes that include deleted text.
This metadata exists in the file itself, not in the clinical system. When the file leaves the EHR and is shared via email, cloud storage, or a file sharing service, the metadata travels with it — even if the visible content of the file has been de-identified.
DICOM Metadata: The Biggest Risk in Medical Imaging
DICOM (Digital Imaging and Communications in Medicine) is the standard format for medical images. A single DICOM file can contain over 100 metadata fields. Among them:
- Patient Name (tag 0010,0010)
- Patient ID / Medical Record Number (tag 0010,0020)
- Patient Date of Birth (tag 0010,0030)
- Patient Sex (tag 0010,0040)
- Referring Physician's Name (tag 0008,0090)
- Institution Name (tag 0008,0080)
- Study Date and Time (tags 0008,0020 and 0008,0030)
- Accession Number (tag 0008,0050) — links the image to a specific clinical order
- Study Description (tag 0008,1030) — describes the procedure (e.g., "CHEST PA AND LATERAL")
Under HIPAA, names, record numbers, and dates tied to a patient are identifiers in their own right, and the remaining fields (such as sex and study description) are PHI whenever they travel alongside them. When a physician exports a DICOM image to share with a specialist, all of this metadata is embedded in the file. If the image is sent via unencrypted email or uploaded to a cloud service without a Business Associate Agreement, the metadata alone can constitute a HIPAA violation, even if the sender believed they were only sharing the image itself.
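To make the tag list concrete, here is a minimal sketch of tag-level de-identification. It assumes the header has already been parsed into a plain Python dict keyed by (group, element) pairs; that dict model and the function names are illustrative only, and reading or writing real DICOM files needs a proper DICOM library. Production de-identification profiles also cover far more attributes than the ten listed here, including UIDs and private tags.

```python
# Tags from the list above, written as (group, element) pairs.
IDENTIFYING_TAGS = {
    (0x0010, 0x0010): "Patient Name",
    (0x0010, 0x0020): "Patient ID",
    (0x0010, 0x0030): "Patient Birth Date",
    (0x0010, 0x0040): "Patient Sex",
    (0x0008, 0x0090): "Referring Physician's Name",
    (0x0008, 0x0080): "Institution Name",
    (0x0008, 0x0020): "Study Date",
    (0x0008, 0x0030): "Study Time",
    (0x0008, 0x0050): "Accession Number",
    (0x0008, 0x1030): "Study Description",
}

# Pixel Data lives under its own tag, separate from the identifying fields.
PIXEL_DATA = (0x7FE0, 0x0010)


def deidentify_header(header: dict) -> dict:
    """Return a copy with the listed tags blanked; everything else is kept."""
    return {
        tag: ("" if tag in IDENTIFYING_TAGS else value)
        for tag, value in header.items()
    }


def remaining_identifiers(header: dict) -> list:
    """Names of listed tags that still hold a non-empty value."""
    return [
        IDENTIFYING_TAGS[tag]
        for tag, value in header.items()
        if tag in IDENTIFYING_TAGS and value != ""
    ]
```

Running `remaining_identifiers` on a header before export gives a quick yes/no answer on whether any of the listed fields would leave with the image. Blanking is the simplest policy; some workflows replace values with study-specific pseudonyms instead.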
PDF Metadata in Clinical Documents
PDF files exported from EHR systems often carry metadata that reveals information beyond the visible text:
- Author name — Often set to the clinician's name or the EHR system name
- Title field — May contain a patient name or medical record number (e.g., "Lab Report - John Smith - MRN 12345")
- Creation and modification timestamps — Can reveal when a patient was seen
- Producer / Creator application — Identifies the EHR system (Epic, Cerner, etc.)
- Custom XMP fields — Some EHR exports embed additional patient identifiers in XMP metadata
- Embedded fonts and images — PDFs can contain embedded objects that carry their own metadata
A common scenario: a hospital's billing department generates a PDF statement and sends it to a patient's insurance provider. The visible content shows only procedure codes and amounts. But the PDF's title field reads "Discharge Summary - Jane Doe" and the author field lists the treating physician. The metadata leaks the patient's identity and the fact that they were hospitalized.
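A quick way to see what a PDF's document information fields say is to scan the raw bytes for them. The sketch below is a best-effort check, not a full PDF parser: it assumes the Info dictionary is stored uncompressed with simple literal strings, so it will miss fields held in compressed object streams, hex or UTF-16 strings, and XMP packets. The MRN pattern is a hypothetical example of an identifier format. A clean result from this scan does not prove the file is clean.

```python
import re

# Standard document information keys worth reviewing before sharing.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords", b"Creator", b"Producer")

# Matches "/Key (literal string)" with escaped characters but no nested parentheses.
_INFO_FIELD = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)"
)

# Hypothetical identifier format: "MRN" followed by digits.
MRN_PATTERN = re.compile(r"\bMRN[\s:#-]*\d+", re.IGNORECASE)


def read_info_fields(pdf_bytes: bytes) -> dict:
    """Collect plain-text Info fields found in the raw PDF bytes."""
    fields = {}
    for key, raw in _INFO_FIELD.findall(pdf_bytes):
        # Escape sequences are left as written; this is for review, not rendering.
        fields[key.decode("ascii")] = raw.decode("latin-1")
    return fields


def flag_fields(fields: dict) -> dict:
    """Return the fields whose value looks like it contains a record number."""
    return {name: value for name, value in fields.items() if MRN_PATTERN.search(value)}
```

For the title in the example above, `flag_fields(read_info_fields(data))` would return the Title entry because it contains "MRN 12345". Names are much harder to detect by pattern, so the extracted fields still need a human review.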
Word and Excel Metadata in Referral Letters
Microsoft Office documents used for referral letters, care summaries, and internal reports contain extensive metadata:
- Author and last modified by — Names of staff who created and edited the document
- Comments and tracked changes — Deleted text that was part of the revision process may contain clinical notes, alternative diagnoses, or internal discussions about the patient
- Document properties — Title, subject, and category fields often contain patient identifiers
- Embedded objects — Charts, images, or linked spreadsheets with their own metadata
- Hidden text and fields — Mail merge fields that reference source databases with PHI
Tracked changes are particularly dangerous. A physician might create a referral letter, change the diagnosis wording during review, and send the final version to a specialist. The visible text shows the approved wording. But if tracked changes were not accepted and purged before sharing, the original wording — which might include a more specific or sensitive diagnosis — is still readable in the file.
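Because a .docx file is a ZIP archive of XML parts, leftover tracked deletions can be detected without opening Word. The following sketch reads `word/document.xml` and collects the text inside tracked-deletion elements. It assumes a standard .docx and checks only the main document body; headers, footers, footnotes, and comments are stored in other parts and would need the same treatment. The function name is illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_deletions(docx_file) -> list:
    """Return the text of each tracked deletion still stored in a .docx.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    deletions = []
    # Each <w:del> wraps runs whose removed text is held in <w:delText>.
    for deleted in root.iter(W + "del"):
        text = "".join(node.text or "" for node in deleted.iter(W + "delText"))
        if text:
            deletions.append(text)
    return deletions
```

An empty list means no tracked deletions were found in the body. Any non-empty result is text the recipient can read by turning on the revision view, so the changes should be accepted or rejected and the file re-saved before it is shared.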
Image Metadata from Clinical Photos
Smartphone photos of wounds, skin conditions, or dental work taken for clinical documentation carry standard EXIF data:
- GPS coordinates — Reveals the location where the photo was taken (the clinic, hospital, or patient's home)
- Device make, model, and sometimes serial number — Identifies the type of phone used, and the specific device when a serial number is recorded
- Timestamps — Exact date and time of the appointment
- Camera settings — While not PHI themselves, combined with GPS and timestamp data they narrow identification
If a dermatologist takes a photo of a skin condition for a second opinion and emails it to a colleague, the GPS metadata in the photo places the patient at a specific clinic on a specific date. Combined with appointment records, this is enough to identify the patient — even if the patient's face is not in the photo.
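In a JPEG, EXIF data (including GPS) is stored in an APP1 segment near the start of the file, ahead of the compressed image data. The sketch below removes every APP1 segment, which also drops XMP packets that use the same segment type. It assumes a well-formed baseline JPEG in which each segment before the image data carries a length field, and the function name is illustrative. Note that removing EXIF also removes the orientation tag, so some photos may display rotated afterwards.

```python
import struct

SOI = b"\xff\xd8"   # start-of-image marker
APP1 = 0xE1         # segment type that carries EXIF (and XMP)
SOS = 0xDA          # start of scan: compressed image data follows


def strip_app1(jpeg: bytes) -> bytes:
    """Return the JPEG with all APP1 segments removed; pixels are untouched."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG file")

    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should be")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here to the end is image data; copy it verbatim.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker != APP1:
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length

    out += jpeg[pos:]
    return bytes(out)
```

Since only header segments are dropped, the compressed image data is copied byte for byte and there is no recompression or quality loss.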
Why Encryption Alone Does Not Solve Metadata Leaks
Many healthcare organizations focus on encrypting files in transit (TLS) and at rest (AES-256). This protects against interception and server breaches. But encryption protects the file as a whole unit. Once the recipient decrypts and opens the file, all the metadata is fully readable.
The problem is not that files are intercepted in transit. The problem is that metadata survives the entire lifecycle: creation, sharing, storage, and eventual archiving. The metadata stays in the file permanently unless it is explicitly stripped before sharing.
HIPAA Requirements for Metadata
HIPAA's Privacy Rule lists 18 types of identifiers that tie health information to an individual. File metadata can contain several of these:
- Names (patient name in DICOM, author name in PDFs)
- Dates (study dates, file creation dates, appointment timestamps)
- Geographic data (GPS coordinates in clinical photos)
- Medical record numbers (DICOM Patient ID, PDF title fields)
- Device identifiers (EXIF device serial numbers, DICOM station name)
Under HIPAA's Safe Harbor de-identification method (45 CFR 164.514(b)(2)), all 18 identifiers must be removed from data before it can be considered de-identified. If a file's visible content is de-identified but its metadata still contains a patient name or medical record number, the file is NOT de-identified under HIPAA.
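The Safe Harbor logic can be sketched as a simple gate: a file fails the check if any metadata field that maps to an identifier category still holds a value. The field names below are hypothetical keys that a metadata extractor might report, and the mapping covers only the categories listed above, not all 18. It illustrates the rule and is not a legal determination.

```python
# Hypothetical extractor field names mapped to the identifier categories above.
SAFE_HARBOR_CATEGORY = {
    "PatientName": "Names",
    "PatientBirthDate": "Dates",
    "StudyDate": "Dates",
    "GPSLatitude": "Geographic data",
    "GPSLongitude": "Geographic data",
    "PatientID": "Medical record numbers",
    "StationName": "Device identifiers",
    "BodySerialNumber": "Device identifiers",
}


def blocking_fields(metadata: dict) -> dict:
    """Map each non-empty identifier field to its Safe Harbor category."""
    return {
        name: SAFE_HARBOR_CATEGORY[name]
        for name, value in metadata.items()
        if name in SAFE_HARBOR_CATEGORY and str(value).strip()
    }


def passes_metadata_check(metadata: dict) -> bool:
    """True only when no mapped identifier field still holds a value."""
    return not blocking_fields(metadata)
```

A file whose visible content has been redacted but whose metadata still returns a non-empty result from `blocking_fields` would not meet the standard described above.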
How to Prevent Metadata Leaks in Healthcare File Sharing
Step 1: Strip Metadata Before Sharing
Before sharing any file outside your organization, remove all metadata:
- DICOM images: Use DICOM de-identification tools (dcm4che, DicomCleaner, RSNA Clinical Trial Processor) to remove patient identifiers from DICOM tags while preserving the image data
- PDFs: Use a browser-based metadata remover that strips all metadata locally, without uploading the file to any server
- Word/Excel: Use Document Inspector in Word to remove all metadata, tracked changes, comments, and hidden content before sharing
- Clinical photos: Strip EXIF data, especially GPS location, before sharing. FileShot's Metadata Scrubber handles this for images
Step 2: Use Zero-Knowledge Encrypted File Sharing
After stripping metadata, share files through an encrypted service that minimizes its own metadata collection. FileShot's zero-knowledge architecture means:
- Files are encrypted in the browser before upload — the server stores only ciphertext
- No patient data touches the server in readable form
- Automatic expiration means files do not persist indefinitely
- Download limits and password protection add access controls
- Audit trails track when files were accessed
Step 3: Implement Policies and Training
- Add metadata removal to your file sharing checklist alongside encryption and access control
- Train staff that de-identification means removing metadata, not just redacting visible text
- Use automated DICOM de-identification in your PACS workflow before files are exported
- Review PDF title and author fields before sharing clinical documents externally
- Disable GPS tagging on clinical photography devices
Real-World Metadata Breach Scenarios
These scenarios illustrate how metadata creates risk even when the visible file content appears safe:
- Radiology second opinion: A radiologist exports an MRI from the PACS and emails the DICOM file to an outside specialist. The image shows an anonymized scan, but DICOM tags still contain the patient's full name, date of birth, and MRN.
- Insurance claim submission: A billing department emails a PDF claim. The PDF title field reads "Smith, John - Hip Replacement - 2026-03-15" even though the visible content only shows CPT codes.
- Research study sharing: A researcher shares de-identified clinical photos for a study. The EXIF GPS data places each photo at a specific rural clinic with only one provider, effectively re-identifying the patients through location.
- Referral letter revision history: A referral letter sent to a consulting physician contains tracked changes showing the original draft text: "suspect malignancy, rule out metastatic disease." The final visible text reads "follow-up imaging recommended." The tracked changes reveal the more sensitive clinical reasoning.
Frequently Asked Questions
Is metadata removal required by HIPAA?
HIPAA does not mention "metadata removal" by name. However, HIPAA's Safe Harbor de-identification standard requires removing 18 categories of identifiers before data can be treated as de-identified. If file metadata contains any of those identifiers, and it usually does, then removing it is required for the file to count as de-identified.
Does FileShot strip metadata automatically?
FileShot encrypts files as-is for secure sharing. For metadata removal, use a separate metadata removal tool before uploading. This keeps the two functions separate: scrub first (privacy), then encrypt and share (security).
Can metadata be removed from DICOM images without affecting the image?
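Yes, in most cases. DICOM metadata is stored in header tags that are separate from the pixel data. Tools like dcm4che and RSNA Clinical Trial Processor can selectively remove or replace patient-identifying tags while preserving the image data at full diagnostic quality. The exception is text burned into the pixels themselves (common in ultrasound and screen captures), which header cleaning does not remove and which must be masked separately.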
Conclusion
Encryption protects files in transit and at rest. Metadata removal protects the data inside the file from leaking when it is opened. Both are necessary for HIPAA-compliant file sharing. Strip metadata before sharing, use zero-knowledge encrypted file sharing for the transfer, and train staff that de-identification goes beyond redacting visible text.
For HIPAA-compliant healthcare file sharing with zero-knowledge encryption, see FileShot for Healthcare. For metadata removal, see the related guides below.
Related Guides
- Word Metadata Remover — strip tracked changes and author data from clinical Word documents
- How to Remove Metadata from PDF — strip patient data from shared PDF reports
- What Is Encrypted File Sharing? — understand zero-knowledge encryption for HIPAA compliance
- How to Password Protect Any File — add password protection to healthcare documents