Gatherer Transaction Log Files - a Windows Search artefact

A recurring theme in many examinations is the prevalence of evidence in unallocated clusters. Reinstallation of the OS is often to blame and a recent case where XP was installed on a drive where the previous OS was Vista further complicated matters. All relevant data had been created during Vista's reign and the challenge was to determine what files and folders existed under this OS. The Encase Recover Folders feature assisted to an extent as did Digital Detective's Hstex 3. Loading the output of Hstex 3 into NetAnalysis allowed me to identify the download of a number of suspect files and some local file access to files within the Downloads folder.

The next step was to carry out a keyword search utilising the suspect file names as keywords. This is always a good technique and results in the identification of useful evidence in a variety of artefacts (e.g. index.dats, link files, registry entries, NTFS file system artefacts et al) but because in this case every thing was unallocated identifying all the artefacts was a little tricky. A considerable number of the search hits were clearly within some structured data but the data was not an artefact I was familiar with.

I have highlighted Record Entry Headers to draw attention to the structured nature of the data. This screen shot is of test data where the file names/path are stored as unicode as opposed to ASCII in the case I was investigating.















A bit of googling led me to page 42 of Forensic Implications of Windows Vista - Barrie Stewart which identified the structured data I had located as being part of Gatherer Transaction Log files created by the search indexer process of Windows Search. These files have a filename in the format SystemIndex.NtfyX.gthr where the X is replaced by a decimal number and on a live Vista system can be found at the path
C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Projects\SystemIndex\

These files have the words Microsoft Search Gatherer Transaction Log. Format Version 4.9 as a file header. The files are a transaction log of entries committed to the Windows search database indexing queues. The SearchIndexer process monitors the USN Change Journal which is part of the NTFS file system used to track changes to a volume. When a change is detected (by the creation of a new file for example) the SearchIndexer is notified and the file (providing it is in an indexable location - mainly User folders) is added to the queue to be indexed. The USN Change Journal is also something that may contain evidentially useful information and I will look at it in more depth in the next blog post.
Sometimes artefacts are only of academic interest but it was fairly apparent that this data could have some evidential value. Each file or folder has a record entry; parts of which had been deconstructed by Stewart. I was able to identify two additional pieces of information within each record - the length of the Filename block and a value that is possibly a sequence or index number or used to denote priority. I also observed some variations in some parts of the record that had been constant in Stewart's test data.
  • Record Header 0x4D444D44 [4 bytes]
  • Unknown variable data [12 bytes]
  • FILETIME Entry [8 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [12 bytes]
  • Length of file path following plus 1 byte (or plus 2 bytes if file path stored as unicode) [4 bytes] stored as 32 bit integer
  • Name and fullpath of file/folder (ASCII or Unicode -version dependant) [variable length]
  • 0x000000000000000000FFFFFFFF [13 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • 0xFFFFFFFF [4 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [4 bytes]
  • Sequence or index number? [1 byte] stored as 8 bit integer
  • Unknown variable data [15 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [20 bytes]
Microsoft do not seem to have publicly documented the record structure. To establish how useful this data can be I came to the conclusion that I needed to recover all of these records from unallocated. I needed an enscript and Oliver Smith over at Cy4or kindly wrote one for me. I wanted the enscript to parse out the file and path information, sequence or index number, the six time stamps and a hex representation of each unknown range of data into a spreadsheet. The script searches for and parses individual records (from the live systemindex file and unallocated) as opposed to entire files. I was astonished at just how much information the script parsed out - email me if you want a copy. Setting the spreadsheet to use a fixed width font (Courier New) lines up the extracted hex very well should anyone want to reverse engineer these records further. As it stands the file paths and timestamps can provide some useful evidential information, particularly when the recovered records have been recovered from unallocated clusters and relate to a file system older than the current one.
Timestamps
Obviously once you have run this enscript or manually examined the records the first question that arises is what are the timestamps. Establishing this has not been as easy as it could be and hopefully a little bit of crowd sourcing will sort this out for all of us. Post a comment if you can help in this regard. One approach is to use the hex 64 bit filetime value as a keyword and see where you get hits. Hits in another timestamp indicates that the timestamp is the same down to the nanosecond. Carrying out this process will result in hits in OS system files and fragments of them. I have found on the limited test data set I have used that Timestamp 3 matched the File Modified (File Altered) date within MFT for the file concerned and the timestamp for the same file in the USN Change Journal. The timestamp in the USN Change Journal record is the absolute system time that the change journal event was logged (1). It is worth reminding readers who are Encase users that Encase uses different terminology for the time stamps within the MFT - file modified is referred to as Last Written. I think it likely that timestamps 1 and 2 are linked to the indexing function (e.g. time submitted for indexing) given the journalling nature of the file but can not either prove this by testing or confirm this within Microsoft documentation. I can say that in testing sorting on Timestamp 1 gave a clear timeline of the file system activity I had provoked within User accessible folders.













Example CSV output of Enscript (click to enlarge)




轉自http://forensicsfromthesausagefactory.blogspot.com/2010/07/gatherer-transaction-log-files-windows.html

0 意見: