A short history of forensic imaging, and why “open” has to mean something
Digital forensics is a science. Like every science, it lives or dies by one principle: reproducibility.
If another examiner, using a different tool, cannot take your evidence and arrive at the same result, then what you are doing isn’t forensics; it is an opinion wearing a lab coat. This principle is the bedrock of the Daubert standard, NIST’s Computer Forensics Tool Testing (CFTT) program, and SWGDE guidelines.
The answer to a challenge in court cannot be “trust me.” The answer must be: here is the specification, here is the math, and anyone with the skill and the time can verify it.
It Started with dd
In the beginning, there was dd. Written for Unix in 1974, it was never designed as a forensic tool. It was a general-purpose copy utility. But it did one thing perfectly: it copied every byte, in order, without asking questions.
When the first generation of examiners needed to capture a hard drive, dd was already there. Paired with Ron Rivest’s MD5 algorithm (1991) and later NIST’s SHA-1 (1995), practitioners had a recipe that was simple enough to explain to a jury and rigorous enough to stand up in court.
The genius of dd plus a hash was not its features; it was that anyone could validate it. The code was open, the algorithms were published, and the math was peer-reviewed.
The community expanded on this with tools like dcfldd (Nick Harbour) and dc3dd (Jesse Kornblum). Both are open source and free to read, and both are still cited in NIST reports today. That lineage is the foundation of modern imaging. It worked because every layer could be inspected by anyone who cared to look.
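To make the "dd plus a hash" recipe concrete, here is a minimal Python sketch of the same idea: copy every byte in order, hash the stream as it is read, then re-hash the finished image and compare. It is an illustration only, not how dd, dcfldd, or dc3dd are implemented. The function names, the block size, and the file paths in the usage note are our own assumptions.

```python
import hashlib


def image_and_hash(src_path, dst_path, block_size=1024 * 1024,
                   algorithms=("md5", "sha1")):
    """Copy src to dst byte for byte, hashing the stream as it is read.

    Hypothetical helper for illustration; returns {algorithm: hexdigest}.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of source reached
                break
            dst.write(block)
            for hasher in hashers.values():
                hasher.update(block)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def hash_file(path, algorithm="sha1", block_size=1024 * 1024):
    """Re-hash a finished image so the result can be checked independently."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()
```

A typical use would be `digests = image_and_hash("source.bin", "image.dd")` followed by a check that `hash_file("image.dd", "sha1") == digests["sha1"]`. The point is that a second examiner can run the second step with any tool that implements the published algorithm and should arrive at the same digest.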

Andy Rosen and the Expert Witness Format
The raw dd image had two weaknesses: it was as large as the source drive, and it carried no metadata. There was no place to store the examiner’s name, case number, or chain-of-custody notes.
Andy Rosen solved this in the mid-1990s with the Expert Witness Format, which most examiners know as E01. EWF introduced compression, segmented files, and embedded metadata. Although EWF was never published as an ISO specification, the community eventually reached a point of “open archaeology” where the format’s internals became widely understood and validatable through open-source implementations like Joachim Metz’s libewf.
Google, AFF4, and What It Was Supposed to Be
By the late 2000s, disks were getting larger, and investigations began to involve live systems, memory, and network captures. Simson Garfinkel’s original Advanced Forensic Format (AFF) was the first serious attempt at an open forensic container built from the ground up.
AFF4—presented in 2009 by Michael Cohen, Simson Garfinkel, and Bradley Schatz—was a bigger idea. It treated evidence as a graph of streams and objects described by RDF metadata, wrapped in a standard ZIP container. The intent was crystal clear: an extensible, openly specified, scientifically validatable container that the entire community could use and trust.
Where the “Forensicness” Can Get Lost
Once AFF4 proved itself technically, the format started showing up in commercial tools. Here, the community divides into two groups:
The Scientific Approach: Vendors who write to the published specification and describe their extensions openly.
The Proprietary Approach: Vendors who ship tools that write files with the “.aff4” extension but whose actual on-disk structure is not described in any public specification.
If a format is the mandatory output of a vendor’s acquisition tool, yet only that vendor’s software can reliably read it, it is not an open format. It is a proprietary container with an open format’s name on the marketing page.
The scientific method does not have a marketing department; it has a specification.
The Question Every Examiner Should Be Asking
If you cannot, in principle, sit down with the published specification and write your own reader for an image format, then that format is not validatable. If it is not validatable, it is not reproducible.
This is not an academic concern. Defense experts are already asking: “Show me how your tool decoded this container.” The answer “I can’t, the vendor won’t publish the format” is a very uncomfortable one to give on a witness stand.

What SUMURI Is Doing About It
This is why SUMURI is shipping a validatable implementation of AFF4—a clean-room build that conforms to the published AFF4 v1.0 standard, with no hidden extensions and no proprietary surprises.
This implementation is landing first in PALADIN, available as a command-line tool so practitioners, researchers, and other vendors can image to AFF4, inspect the output, and validate the results themselves.
We have tested for true interoperability:
Physical AFF4 images: Read correctly by the reference open-source implementation (aff4imager) and FTK Imager 4.7+.
Service Dependency: Access to indexed metadata depends on system services that may or may not be available depending on how the system is being examined.
Logical AFF4-L images: These are standard ZIP containers. Any archive tool can browse them, and per-file hashes are stored in Turtle RDF metadata for independent verification.
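As a rough illustration of what independent verification of a logical image could look like, the Python sketch below opens a container with the standard zipfile module, reads the Turtle metadata as text, and checks whether each member's recomputed digest appears among the recorded hashes. Several things here are assumptions on our part rather than statements about any particular tool's output: that the metadata member is named information.turtle, that hash literals are written in the form "hexdigest"^^aff4:SHA1, that container.description and version.txt are the only other bookkeeping members, and that each file is stored as a plain ZIP member. The regex is a shortcut; a real reader would parse the Turtle with an RDF library and tie each digest to its subject.

```python
import hashlib
import re
import zipfile

# Assumption: hash literals appear as "hexdigest"^^aff4:SHA1 (or MD5, SHA256, SHA512).
HASH_LITERAL = re.compile(r'"([0-9a-fA-F]+)"\^\^aff4:(MD5|SHA1|SHA256|SHA512)')

# Assumption: these members hold container bookkeeping, not evidence.
METADATA_MEMBERS = {"information.turtle", "container.description", "version.txt"}


def recorded_hashes(turtle_text):
    """Return the set of (algorithm, hexdigest) pairs found in the metadata."""
    return {(alg.lower(), digest.lower())
            for digest, alg in HASH_LITERAL.findall(turtle_text)}


def check_container(path_or_file):
    """Map each evidence member to True if a recorded digest matches its bytes.

    Simplification: digests are matched by value only, not by subject URN.
    """
    results = {}
    with zipfile.ZipFile(path_or_file) as container:
        turtle = container.read("information.turtle").decode("utf-8")
        recorded = recorded_hashes(turtle)
        algorithms = {alg for alg, _ in recorded}
        for info in container.infolist():
            if info.is_dir() or info.filename in METADATA_MEMBERS:
                continue
            data = container.read(info)
            results[info.filename] = any(
                (alg, hashlib.new(alg, data).hexdigest()) in recorded
                for alg in algorithms
            )
    return results
```

Calling `check_container("evidence.aff4")` (a hypothetical file name) returns a dictionary of member names and match results. Nothing in it depends on the tool that wrote the image, which is the property the published specification is meant to guarantee.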
Final Thought
A PALADIN AFF4 image should be openable by anyone who reads the public specification. Not just us. Not just our customers. Anyone.
This is not a competitive strategy; it is a return to first principles. If we call something open, it has to be open. If we call something forensic, it has to be validatable. Anything less, and we are just asking the court to trust us. And “trust me” has never been good enough for this work.


