When data is read from a hard disk, it is presented to the computer as a sequence of numbers. Often these numbers need some form of manipulation or translation before an investigator can comprehend the meaning. In a recent case, I was presented with a challenging analysis scenario that also highlighted a weakness in current forensic tools.
I was reviewing the forensic clone of a server which I suspected had been used to create a file. I knew this file was likely to have been in a particular folder on this server. Loading the clone into Encase, I used the in-built automatic recovery techniques, but no trace of the file was found. As is common for me, I decided to delve into internal operating system structures to see what additional information I could find. My first port of call was to manually review the folder's MFT entry [1].
The MFT entry for a folder includes the names of all files within that folder. This is actually a redundant copy of the names, since the MFT entry for a file also contains its name, but Microsoft decided to include this information with the folder as well. When I manually reviewed the contents of the MFT entry, I discovered that this data structure had previously been larger. The file I was searching for had been present in the folder, but had since been deleted. At the end of the MFT entry, in what is termed "entry slack", a copy of the file name was still readable. Encase had not discovered it because its MFT parser does not read entry slack. Entry slack therefore becomes an unparsed area embedded within a parsed data structure. It is worth noting that the examiner is not told that this space has been left unparsed.
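To make the idea of entry slack concrete, here is a minimal Python sketch that extracts it from a single raw MFT record. It relies on the commonly documented NTFS record header layout: a "FILE" signature at offset 0, followed by little-endian 32-bit "used size" and "allocated size" fields at offsets 0x18 and 0x1C. The function names and the synthetic test record are my own illustration, not part of any forensic tool, and the sketch deliberately does not undo the NTFS fixup values stored in the last two bytes of each sector, so those bytes should be treated with caution.

```python
import struct


def mft_entry_slack(record: bytes) -> bytes:
    """Return the bytes between the used size and the allocated size of one MFT record."""
    if record[:4] != b"FILE":
        raise ValueError("not an MFT record (missing FILE signature)")
    # Offsets 0x18 and 0x1C: used size and allocated size, little-endian uint32.
    used_size, allocated_size = struct.unpack_from("<II", record, 0x18)
    if not (0 < used_size <= allocated_size <= len(record)):
        raise ValueError("inconsistent size fields in record header")
    return record[used_size:allocated_size]


def build_demo_record() -> bytes:
    """Build a synthetic 1024-byte record (illustration only, not real evidence)."""
    record = bytearray(1024)
    record[0:4] = b"FILE"
    # Pretend only the first 0x200 bytes are in use out of 0x400 allocated.
    struct.pack_into("<II", record, 0x18, 0x200, 0x400)
    # Leave a stale UTF-16LE file name beyond the used size, as a deletion might.
    stale_name = "report.doc".encode("utf-16-le")
    record[0x300:0x300 + len(stale_name)] = stale_name
    return bytes(record)


if __name__ == "__main__":
    slack = mft_entry_slack(build_demo_record())
    print(f"{len(slack)} bytes of entry slack")
    print("stale name present:", "report.doc".encode("utf-16-le") in slack)
```

A tool that reported this region, even as an undecoded hex dump, would at least tell the examiner that there is something left to look at.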
Elsewhere, Encase will highlight other forms of slack space. But where slack space appears within a data structure, it is not highlighted. This limitation is not unique to Encase; it applies to all common forensic tools.
There are two ways in which data is ignored while appearing to be parsed by forensic tools.
- Unparsed data
This is the general case of my "entry slack" example above. Over the years, many forensic tool manufacturers have needed to reverse-engineer operating system structures to enable investigators to convert incomprehensible streams of data into useful facts. Without access to internal documentation, a reverse-engineer must make some assumptions. Often values will appear constant or be of unknown purpose. Reverse-engineers therefore often allow a degree of flexibility in the data structures parsed. A parser will normally skip the values in question rather than include a line like "Unknown value: 0x12345678" in its output.
With the complexity of many operating system structures, the average examiner is not in a position to manually verify that structures of specific interest have been correctly parsed, let alone verify that all structures across an entire system have been fully parsed.
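One way a parser could own up to what it skips is to record every byte range it actually interprets and then report the gaps. The Python sketch below shows the idea on an invented 16-byte structure; the layout, the `CoverageTracker` class and `parse_toy_structure` are all hypothetical and do not describe any real tool or file format.

```python
import struct


class CoverageTracker:
    """Remember which byte ranges a parser interpreted and list those it did not."""

    def __init__(self, size: int):
        self.size = size
        self.ranges = []  # list of (start, end) pairs, end exclusive

    def mark(self, start: int, length: int) -> None:
        self.ranges.append((start, start + length))

    def gaps(self) -> list:
        gaps, position = [], 0
        for start, end in sorted(self.ranges):
            if start > position:
                gaps.append((position, start))
            position = max(position, end)
        if position < self.size:
            gaps.append((position, self.size))
        return gaps


def parse_toy_structure(data: bytes):
    """Parse a made-up layout: magic(4) | unknown(4) | length(4) | payload(length)."""
    tracker = CoverageTracker(len(data))
    magic = data[0:4]
    tracker.mark(0, 4)
    # Bytes 4..7 have no known meaning, so this parser skips them without marking.
    (length,) = struct.unpack_from("<I", data, 8)
    tracker.mark(8, 4)
    payload = data[12:12 + length]
    tracker.mark(12, len(payload))
    return {"magic": magic, "payload": payload}, tracker.gaps()


if __name__ == "__main__":
    sample = b"TOY!" + struct.pack("<I", 0x12345678) + struct.pack("<I", 4) + b"abcd" + b"\x00\x00"
    parsed, unparsed = parse_toy_structure(sample)
    for start, end in unparsed:
        print(f"Unparsed bytes {start}..{end - 1}: {sample[start:end].hex()}")
```

Here the skipped unknown field and the trailing slack both appear in the gap list, so the examiner can decide whether they deserve a manual look.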
- Inconsistent data
Worse than ignoring unknown data is ignoring inconsistent data where it does not fit with the structure around it. An example of this would be where a data structure is being parsed successfully in one filesystem cluster, and a file name is being read which takes the parser over into the next filesystem cluster. At the end of the name, the parser would expect to see a constant value, but sees an unexpected value. At this point, the parser is likely to stop parsing, but also likely to include the (probably incorrect) name in its output, implying that the second cluster was in fact parsed. Unless the parser also highlights the inconsistency, the value incorrectly extracted may end up being assumed correct by an examiner.
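The cluster-boundary scenario can be sketched in the same spirit. In the Python example below, the parser still returns the name it read, but attaches a warning when the expected constant is missing, so the doubtful value is never presented as clean. The record layout (a length byte, the name, then a two-byte end marker), the tiny cluster size and all identifiers are assumptions made up for this illustration.

```python
from dataclasses import dataclass, field

CLUSTER_SIZE = 16          # deliberately tiny, hypothetical cluster size
END_MARKER = b"\xff\xff"   # hypothetical constant expected after every name


@dataclass
class ParsedName:
    name: str
    warnings: list = field(default_factory=list)


def parse_name(data: bytes, offset: int) -> ParsedName:
    """Read a length-prefixed name and flag it if the trailing constant is wrong."""
    length = data[offset]
    start, end = offset + 1, offset + 1 + length
    result = ParsedName(data[start:end].decode("ascii", errors="replace"))
    marker = data[end:end + len(END_MARKER)]
    if marker != END_MARKER:
        # Note whether the name ran over into the following cluster.
        crossed = length > 0 and start // CLUSTER_SIZE != (end - 1) // CLUSTER_SIZE
        message = (f"expected end marker {END_MARKER.hex()} at offset {end}, "
                   f"found {marker.hex() or 'nothing'}")
        if crossed:
            message += " (name crosses a cluster boundary)"
        result.warnings.append(message)
    return result


if __name__ == "__main__":
    prefix = bytes(10) + bytes([10]) + b"repor"        # fills the first cluster
    intact = prefix + b"t.doc" + END_MARKER            # next cluster continues the name
    reused = prefix + b"XXXXX" + b"\x00\x00"           # next cluster holds unrelated data
    for label, image in (("intact", intact), ("reused", reused)):
        parsed = parse_name(image, 10)
        print(label, repr(parsed.name), parsed.warnings)
```

The important design choice is that the extracted value and the inconsistency travel together: the examiner sees the name, but also sees why it should not be trusted without further checking.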
I believe that this situation can only be remedied by forensic tools highlighting those areas of data that have been ignored either implicitly (by skipping) or explicitly (by ignoring rogue values). There are many billions of bytes on even the smallest modern hard drive, and we can easily miss a vital clue if even a minute fraction of a percent of that hard drive is not parsed. Until our forensic tools tell us which parts of the drive or its data structures they have ignored, we cannot even be sure how much of the evidence has truly been analysed.
[1] The MFT (or Master File Table) is where an NTFS filesystem stores information about all files on a hard drive. Each file or folder uses one entry in the MFT.



