PDFs have long been one of the most trusted document formats in the world — widely used for invoices, reports, contracts, and even resumes. But in the cybersecurity world, that same trust has made PDFs a perfect disguise for malicious intent. Cybercriminals know that people rarely suspect a simple PDF attachment, and that's exactly why it has become a popular weapon in phishing and malware campaigns.
To tackle this growing threat, cybersecurity firm Proofpoint has released a new open-source tool called PDF Object Hashing, a technique designed to expose malicious PDFs by fingerprinting their internal structure.
The Problem: PDFs Have Become Cybercrime's Favorite Delivery Vehicle
Email attachments may seem harmless, but PDFs are increasingly being used as gateways to malware, credential theft, and business email compromise (BEC). A single click on a PDF attachment can open the door to a wide range of attacks — from credential-harvesting forms to remote access trojans hidden behind innocent-looking invoices.
What makes the situation worse is the complexity and flexibility of the PDF format itself. PDFs were built to ensure compatibility across different systems, and that versatility has a dark side: the same visible page can be encoded in many different ways.
In short, attackers can change a PDF's byte-level encoding in countless ways without changing how it looks, which makes it very hard for traditional antivirus signatures or file hashes to catch every variant.
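To see why plain file hashes are so brittle, here is a minimal Python sketch. The byte string is a toy stand-in for a PDF (not a complete, valid file), and `file_hash` is a helper name made up for this illustration. Adding a single space between two tokens, which PDF readers treat as insignificant whitespace, is enough to produce a completely different SHA-256.

```python
import hashlib

# Toy stand-in for a PDF: one object, no xref table or trailer (assumption).
base = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"

# Same tokens, one extra space after the dictionary opener.
variant = base.replace(b"<< /Type", b"<<  /Type")


def file_hash(data: bytes) -> str:
    """Hash the raw bytes, as a classic file-hash blocklist would."""
    return hashlib.sha256(data).hexdigest()


print(file_hash(base))
print(file_hash(variant))
print("same hash?", file_hash(base) == file_hash(variant))
```

A blocklist keyed on the first hash would miss the second file, even though the two differ by one byte.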
The Breakthrough: Fingerprinting the Structure, Not the Content
This is where Proofpoint's PDF Object Hashing approach shines. Instead of analyzing what's inside the document — such as links, images, or text — it looks at how the document is built.
Here's how it works:
The tool parses the internal hierarchy of a PDF and identifies the objects that serve as its structural building blocks.
These elements are then arranged in order and combined into a unique, stable "fingerprint" that represents the document's structure.
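The steps above can be sketched in a few lines of Python. This is not Proofpoint's implementation: the regex-based parsing, the `Untyped` placeholder, the `|` separator, the choice of SHA-256, and the helper names (`object_types`, `structure_hash`, `toy_pdf`) are all assumptions made for illustration. Real PDFs with object streams, compression, or encryption need a proper parser.

```python
import hashlib
import re

# Matches "N G obj ... endobj" bodies. A simplification: object streams are ignored.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9]+)")
SUBTYPE_RE = re.compile(rb"/Subtype\s*/([A-Za-z0-9]+)")


def object_types(pdf_bytes: bytes) -> list[str]:
    """Return one label per object, in file order, e.g. 'Annot/Link'."""
    labels = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(1)
        found = [TYPE_RE.search(body), SUBTYPE_RE.search(body)]
        parts = [m.group(1).decode("ascii") for m in found if m]
        labels.append("/".join(parts) if parts else "Untyped")
    return labels


def structure_hash(pdf_bytes: bytes) -> str:
    """Hash the ordered object labels, ignoring everything else in the file."""
    joined = "|".join(object_types(pdf_bytes))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()


def toy_pdf(url: bytes, padding: bytes = b"") -> bytes:
    """Build a tiny PDF-like byte string (hypothetical, not a valid PDF)."""
    return (
        b"%PDF-1.7\n"
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Annots [4 0 R] >>\nendobj\n"
        b"4 0 obj\n<< /Type /Annot /Subtype /Link /A << /S /URI /URI ("
        + url
        + b") >> >>\nendobj\n"
        + padding
        + b"%%EOF\n"
    )


first = toy_pdf(b"http://a.example/login")
second = toy_pdf(b"http://b.example/verify", padding=b"% harmless comment\n")

print(object_types(first))
print(structure_hash(first) == structure_hash(second))
```

In this sketch, the two toy files carry different URLs and different raw bytes, yet they produce the same structure hash because the sequence of object types is identical.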
Think of it like recognizing someone's handwriting or rhythm of speech: even if they change the words, the underlying pattern remains familiar. Proofpoint compares this to "imphash" in executable analysis, which hashes the list of functions an executable imports so that samples sharing the same import structure can be linked to the same author or toolkit.
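For readers unfamiliar with imphash, the idea can be sketched as follows, assuming the import list has already been extracted from the executable. The sample imports are invented, and `imphash_like` is a name chosen for this illustration. The usual convention, as implemented in tools such as pefile, is to lowercase each library and function name, drop the library extension, join the `library.function` pairs with commas in import order, and take the MD5.

```python
import hashlib


def imphash_like(imports: list[tuple[str, str]]) -> str:
    """imports: ordered (library, function) pairs from an import table."""
    parts = []
    for lib, func in imports:
        lib = lib.lower()
        # Drop the common library extensions before hashing.
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


# Invented import tables for illustration.
sample_a = [("KERNEL32.dll", "CreateFileA"), ("WS2_32.dll", "connect")]
sample_b = [("kernel32.dll", "createfilea"), ("ws2_32.dll", "connect")]
sample_c = [("WS2_32.dll", "connect"), ("KERNEL32.dll", "CreateFileA")]

print(imphash_like(sample_a) == imphash_like(sample_b))  # same imports, same order
print(imphash_like(sample_a) == imphash_like(sample_c))  # same imports, new order
```

Order matters in both schemes: the hash captures how the file was assembled, not just which pieces it contains.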
Why This Matters: Resilience Against Evasive Techniques
Attackers often tweak minor parts of a malicious PDF, such as images, URLs, or filenames, to bypass signature-based detection. Because PDF Object Hashing ignores that content, those superficial edits generally leave the fingerprint unchanged.
Since the tool focuses on object-level structure, it can spot related malicious files that share the same underlying layout, even when the visible content is different. This allows threat hunters to cluster related files, identify attack patterns, and attribute campaigns to specific threat actors — all without needing to decrypt or fully open the files.
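As a rough illustration of the clustering step, the sketch below groups files by a structural fingerprint. It assumes the ordered object-type labels have already been extracted for each file. The file names, labels, and helper names (`fingerprint`, `cluster`) are hypothetical, and SHA-256 is an arbitrary choice here.

```python
import hashlib
from collections import defaultdict


def fingerprint(labels: list[str]) -> str:
    """Hash an ordered list of object-type labels."""
    return hashlib.sha256("|".join(labels).encode("utf-8")).hexdigest()


def cluster(samples: dict[str, list[str]]) -> list[list[str]]:
    """Return groups of file names that share a fingerprint (size > 1 only)."""
    groups = defaultdict(list)
    for name, labels in samples.items():
        groups[fingerprint(labels)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical samples: two lures built from one template, one unrelated file.
samples = {
    "invoice_a.pdf": ["Catalog", "Pages", "Page", "Annot/Link"],
    "invoice_b.pdf": ["Catalog", "Pages", "Page", "Annot/Link"],
    "report_c.pdf": ["Catalog", "Pages", "Page", "XObject/Image", "Font"],
}

print(cluster(samples))
```

Files that land in the same group are candidates for a shared toolkit or template. An analyst would still want to confirm the link with other evidence before attributing them to one actor.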
In practice, this means a cybersecurity team can rapidly create detection rules that hold up even as attackers change tactics.
Real-World Application: Tracking Active Threat Groups
Proofpoint didn't just develop the tool in theory — they've already put it to work.
One case involved UAC-0050, a threat cluster known for targeting Ukraine. The group distributed encrypted PDFs posing as OneDrive documents, which eventually delivered NetSupport RAT, a remote access trojan. Traditional PDF scanners struggled to detect these files because of encryption. However, by applying the object hashing technique, Proofpoint identified structural similarities among the files, enabling analysts to create quick and accurate signatures for blocking the threat.
Another example is UNK_ArmyDrive, an India-based group active since mid-2025. This actor uses fake government documents from Bangladesh as lures in business email compromise campaigns. Despite cosmetic changes between files, the object hashes revealed clear overlaps — exposing the common toolkit and methods behind the attacks.
A Step Forward for Threat Detection
The release of PDF Object Hashing as an open-source tool represents a significant step forward for defenders. It complements content-based inspection with structure-based clustering and attribution, giving researchers and SOC teams a new layer of visibility into malicious documents.
By focusing on the "DNA" of a PDF rather than its ever-changing contents, this technique promises greater resilience against obfuscation and a more reliable way to connect the dots between seemingly unrelated attacks.
In a world where attackers constantly evolve, having a method to see beyond the surface — into the very anatomy of a document — could make all the difference.
Final Thoughts
Proofpoint's innovation underscores a key principle in modern cybersecurity: it's not just about what the attacker sends — it's about how they build it.
As cyber threats grow more sophisticated, defenders need tools that can keep up. PDF Object Hashing gives security teams one such tool: a way to detect and correlate malicious PDFs so they can be blocked before they do harm.

