Ok, well, I knew this week was going to be rough, but I didn’t know I was going to need this much luck.
This week’s question was a bit “off the beaten path” in terms of forensics: find the file name associated with a block ID in the forensic image of a Hadoop cluster.
If you’re not familiar with Hadoop, this question was probably going to require a little bit of digging, but essentially, Hadoop is a framework for distributed storage and processing. Its storage layer, HDFS, is a distributed filesystem which runs on top of your base operating system and needs Java to run. If you want to learn more, here you go.
Rabbit Hole of Death
I went down my first rabbit hole trying to essentially recreate a Hadoop cluster locally and somehow mount the forensic image as the files for it, but that was a brick wall. To say I spent a lot of time on this is an understatement. I tried all sorts of things, from attempting to mount an HDFS cluster in Linux to setting up a Cloudera instance online (this cost money, so that’s where I drew the line). This was evidently NOT how to solve this challenge.
So after sleeping on it, I did a little more digging in FTK Imager only to find the following files in the Master E01 file:
I was curious about what fsimage_* files were and had my first bit of luck by finding this page when looking for what function those files serve: https://acadgild.com/blog/view-fsimage-edit-logs-files-hadoop
Aha! So these files are not only viewable, they very likely contain the information I need to answer this question.
Rabbit Hole #2
From the link above, it became evident I needed to be able to run the ‘hdfs’ or ‘hadoop’ commands on my Linux VM. I spent another large chunk of time attempting to set up a local Hadoop cluster when I had my second bit of luck, finding that you could simply install the Hadoop command line interface: https://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/HadoopCLI
It’s the darkest before the dawn
At this point I knew I needed 2 things:
- Hadoop command line tools
- Java SDK
But which versions? I wasn’t too sure it mattered, but I remembered seeing something in the original image that I went back to:
I ended up installing both those versions, although it occurred to me later that I probably could have just exported those files and untarred them. Oh well.
Ok, so I finally extracted the Hadoop command line tools, installed the correct Java version and so I run:
aaaaaaand “Error: Unable to find $JAVA_HOME”
Close but no cigar. Ok, so I go and find where my Java installation is. Again, this took me way more time than it should have, but eventually the internet yielded the answer:
Granted, this could be dependent on what version of Java you installed, so my command was slightly different.
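For anyone hitting the same $JAVA_HOME error, here is a minimal Python sketch of one common way to track down the install directory on Linux: find the java launcher on the PATH, resolve its symlinks, and strip the trailing bin/java. It assumes a typical layout and is not necessarily the command referred to above; guess_java_home is a made-up helper name.

```python
import os
import shutil


def guess_java_home(java_path=None):
    """Guess JAVA_HOME from the java launcher (hypothetical helper).

    Assumes the usual <JAVA_HOME>/bin/java layout, possibly reached
    through one or more symlinks such as /usr/bin/java.
    """
    # Fall back to whatever "java" is first on the PATH.
    java_path = java_path or shutil.which("java")
    if java_path is None:
        return None
    # Follow symlinks to the real binary.
    real_binary = os.path.realpath(java_path)
    # Drop the trailing "bin/java" to get the install directory.
    return os.path.dirname(os.path.dirname(real_binary))


if __name__ == "__main__":
    print(guess_java_home())
```

Whatever path it prints can then be exported as JAVA_HOME in the shell before running the hdfs launcher again. On some older JDK layouts the result may end in a jre directory, so treat it as a starting point rather than a guaranteed answer.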
Moment of Truth
At last I had all the pieces put together: my Linux VM gave me a correct response to the hdfs --version command and we were ready to go. I exported the first file I found to test it out and see what I got. Let’s be honest, I expected a full screen of error messages:
./hdfs oiv -i fsimage_0000000000000000024 -o fsimage24.xml -p XML
No errors. Crap, it probably didn’t do anything. But wait, is that an actual file named fsimage24.xml, and it’s not zero bytes? It’s a small file, let’s search for that block ID (1073741825):
So we see that the block ID is associated with inode 16387 and the associated file name is AptSource. I’ll admit I thought to myself, “There’s no way that’s it”. But since we had unlimited tries, I figured it was worth a try. Bingo!
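The same lookup can be scripted instead of searched by eye. The sketch below is a rough Python version, assuming the XML written by the oiv tool nests each file's blocks as inode > blocks > block > id, next to the inode's own id and name. The SAMPLE string is a hand-trimmed stand-in built from the values above, not the real fsimage24.xml, and find_file_for_block is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

# Hand-trimmed stand-in for the oiv XML output (assumed layout, not real data).
SAMPLE = """<fsimage>
  <INodeSection>
    <inode>
      <id>16387</id>
      <type>FILE</type>
      <name>AptSource</name>
      <blocks>
        <block><id>1073741825</id></block>
      </blocks>
    </inode>
  </INodeSection>
</fsimage>"""


def find_file_for_block(xml_source, block_id):
    """Return (inode_id, name) for the inode owning block_id, or None.

    xml_source may be a file path or an open file-like object.
    """
    root = ET.parse(xml_source).getroot()
    wanted = str(block_id)
    for inode in root.iter("inode"):
        # Only file inodes carry <block> children; others are skipped.
        for block in inode.iter("block"):
            if block.findtext("id") == wanted:
                return inode.findtext("id"), inode.findtext("name")
    return None


if __name__ == "__main__":
    print(find_file_for_block(io.StringIO(SAMPLE), 1073741825))
```

Against a real export, the call would be something like find_file_for_block("fsimage24.xml", 1073741825). Note that the name element holds only the file's own name, not its full path.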
Note: It then dawned on me that this was the third bit of luck I had, as this was the first file I had selected and it came from the Master E01 file. I hadn’t even touched the two Slave E01s yet, but had the answer been in one of the other images, the same process would have applied, just with different files.
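If the answer had been somewhere else, converting every exported fsimage file in one go would save some clicking. Here is a minimal Python sketch, assuming the exported files sit in one directory and the hdfs launcher is at ./hdfs as in the command above. oiv_commands and convert_all are hypothetical helper names, and appending .xml to the input name is just one possible output naming.

```python
import subprocess
from pathlib import Path


def oiv_commands(export_dir, hdfs="./hdfs"):
    """Build one 'hdfs oiv' command per exported fsimage file."""
    commands = []
    for image in sorted(Path(export_dir).glob("fsimage_*")):
        # Skip checksum companions and XML files from earlier runs.
        if not image.is_file() or image.suffix in (".md5", ".xml"):
            continue
        output = str(image) + ".xml"
        commands.append([hdfs, "oiv", "-i", str(image), "-o", output, "-p", "XML"])
    return commands


def convert_all(export_dir, hdfs="./hdfs"):
    """Run every command; raises CalledProcessError if oiv fails."""
    for command in oiv_commands(export_dir, hdfs):
        subprocess.run(command, check=True)
```

Calling convert_all on a directory of exports from any of the three images would leave one XML file next to each fsimage file, ready to be searched for the block ID.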