Alright, let's get it started. The scenario is the usual one: we have to exploit a binary (Linux, x86, 32bits) called "lena_server" that is running on a remote machine controlled by the organizers. As this challenge is belong [to us and] to the "shellcode" category, we already knew that the binary was supposed to get "something" in input, that was later processed and "somehow" executed.

This write-up is split in two parts: 1) our journey reversing the sh*t out of lena; 2) how we wrote our exploit.

## Reversing the Madness

We soon figured out that the program expects a BMP as input. The first thing it does is to perform these simple steps on the BMP's header:- check that the two first bytes are "BM"
- read the following four bytes and interpret them as the BMP's size
- sanity checks against some bytes in the header (for example, bytes 0x1a-0x1b must be 0x0001)
- checks that the "bits per pixel" is set to 24 (i.e., standard RGB)
- checks that the width and the height is less than 0x200 and multiple of 8 (actually, we later figured out that their BMP data's parser was also assuming a width of
*exactly*0x200...).

### From pixels to bits

The real madness is now about to begin. As we are going to deal with some numbers, I'll directly use as references the values from our final exploit: a BMP of size 512x80.So, after checking the constraints against the BMP header, the executable starts to "process" the actual BMP data. It first extracts 8x8 pixels subblocks starting from the top-left corner of the BMP. Actually, for each 8x8 pixels, the executable "extracts" three different 8x8 matrices, one per channel. That means that in our case, the binary extracts a total of (512/8)*(80/8)* 3 = 1920 8x8 subblocks.

For each of these subblocks, the binary then computes (what it seems to be) its discrete cosine transformation (by calling the function SimpleDCT::DCT2Calc8x8). That is, the output of this step is still a 8x8 matrix, but we are now in the frequency domain!! #wtf

This new matrix is then passed as argument to the function DCTCoefficient::DecodeDCT that computes some sort of statistics related to the entropy of the original 8x8 pixels. This eventually returns a single bit as output, 0 for low entropy, 1 for high entropy (at least, this is what we understood :)).

This bit is then passed to the DecodeDataStream::AddDataBit function that updates an internal data structure. As I mentioned, these steps are performed for each 8x8 subblock (1920 times in our case). From a high-level point of view, such data structure contains two buffers, that I will refer to as B1 and B2. At each iteration, the current bit is appended to B1. When the buffer B1 happens to contain 256 bits (32 bytes), the following additional steps are performed:

- the 32 bytes in B1 are passed as argument to a function called DecodeDataStream::DoDecodeBlockEv, that performs a "decoding" step (at that point, we had no idea on what that function was doing). Its output are 32 bytes, that in the general case differ from the 32 bytes in input.
- some obscure checks are performed on the output. If these "checks" are not satisfied, those 32 bytes are somehow modified or, in the worst case, the program quits.
- if the "DoDecodeBlockEv" function returns without errors, the first 16 bytes of the output are appended to buffer B2, while the latter 16 bytes are discarded.
- the content of buffer B1 is cleared.

### Reed-Solomon error correction codes

It turned out that the function "DoDecodeBlockEv" was actually implementing the decoding step of a Reed-Solomon error correction code. From wikipedia:"Reed–Solomon (RS) codes are non-binary cyclic error-correcting codes. They described a systematic way of building codes that could detect and correct multiple random symbol errors. By adding t check symbols to the data, an RS code can detect any combination of up to t erroneous symbols, or correct up to floor(t/2) symbols"When we understood what that meant, we realized how miserable our life was going to be: the function "DoDecodeBlockEv" applies a RS decoding step on each 32 bytes of buffer B1, by considering the first 16 bytes as the actual data and the remaining 16 bytes as "error correction code". In other words, the function uses the latter 16 bytes to self-correct (!!!) the first 16 bytes, our soon-to-be shellcode!! #wtf²

Clearly, we needed to make sure that each set of 32 bytes in B1 were the RS encoded version of 16 bytes of our shellcode. We first found some python implementation of this encoding step, but they were somehow wrong, in the sense that the decoding of the encoding was not returning the original input (and as we don't know sh*t about this stuff, we were not able to "debug it").

Eventually, we found a Matlab help page (yep, we were that desperate..) talking about a RS-encode function rsenc. Even if we were naturally adverse to rely on Matlab for our exploit, we decided to give it a try. We quickly got a Matlab instance running, we "encoded" the 16 bytes "AAAAAAAAAAAAAAAA" string, and we got as output the 32 bytes string "AAAAAAAAAAAAAAAA<other 16 bytes>". We then took this string, and with some gdb-leetnees we made the "lena" executable decode it. With our great surprise, the decoding step returned exactly the original 16 As. That is, the "<other 16 bytes>" were a valid error correcting code for the first 16 bytes! It worked™.

## Generating the exploit

After a long session of "hurrah" and high-fives all over the place, we started to smell pwnage in the air. We began our journey to victory by summing up what was happening to our poor BMP in input:- the BMP's header is checked against several constraints.
- the BMP's payload is retrieved and decomposed in matrices of 8x8 (three for each 8x8 pixels subblock).
- for each of the subblock, a new 8x8 matrix is computed by calculating its DCT.
- for each new 8x8 subblock, the function "DCTCoefficient::DecodeDCT" returns a bit, whose value depends on the original 8x8 pixels entropy.
- each of these bits is appended to the buffer B1.
- when B1 contains 32bytes, these 32bytes are decoded (function "DoDecodeBlockEv") following the Reed-Solomon error correction code scheme.
- if everything is fine, the first 16 bytes are appended to B2.
- When all the 8x8 subblocks are processed, the content of B2 is executed.

*just*needed to generate a BMP such that the final content of buffer B2 was actually executable. Clearly, our best shot was go backward: we took our standard open-read-write shellcode, and we then wrote a script to invert all those crazy transformation steps (as our Matlab skills are ridiculously poor, we also had Python scripts to generate few Matlab scripts: leetness++). We started with a ~90 bytes shellcode, padded to 112 bytes; we then obtained 224 bytes (1792 bits) after the RS encoding step with Matlab's rsenc, that we needed to pad to 1920 bits. Finally, as the width of the BMP needed to be 512 (remember their parser's assumption?), we used the 1920 bits to generate a 512x80 BMP (3 bits per pixel), that lead to our exploit:

That was not straightforward, but the result we obtained probably represents the highest form of artistic production that a bunch of computer scientists could ever hope for. Of course, the low-level details on how we actually generated that are left as an exercise for the reader ;)

That's all folks! We really (really!) liked this challenge, and we renew our big kudos to the @LegitBS_CTF crew for this great CTF!

Yanick / @reyammer

#wtf, excellent writeup!

ReplyDelete