Tuesday, December 10, 2013

The "behind the scene" of DexWare, a DalvikVM based service for the iCTF 2013.

This write-up will describe the "behind the scene" of DexWare, a service I wrote for the iCTF 2013. To the best of my knowledge, this is the first service in the history of CTFs to be based on Dalvik-bytecode!! I hope this write-up will be a useful starting point for those who will attempt something similar!

You can find the source code and the compiled binaries on github (link). Also, feel free to ping me on twitter (@reyammer) for any questions.

Introduction

It seems that many of the teams found this service particularly challenging to hack, even after discovering a possible attack vector. The fact that heavy usage of Java reflection was required to hack the service might be related to that :D. In any case, let me tell you, this was a pretty complex service to be set up as well, especially as it needed to run for 8+ hours, on 100+ VMs, at a risk of compromising the entire game if unstable.

These are the points I will touch in this write-up:

  • How to compile a DalvikVM for Linux/x86
  • How to actually run DEX bytecode on Linux/x86
  • Discussion about Dexware, the service
  • Quick overview on how to exploit the service
  • How to make the service reliable (bug in DexClassLoader?)

How to compile a DalvikVM for Linux/x86

The vulnerable box for the iCTF needed to be a classic Linux/x86 VM. Hence, to run a service like this, you first need a DalvikVM runnable on Linux/x86. Once you know how to do it, it's pretty straightforward.

After you download the entire AOSP repo, just fire the following commands:

$ cd <aospdir>
$ source build/envsetup.sh
$ lunch # and then select "aosp_x86-eng"
$ make -j8

You could actually compile only the parts related to the DalvikVM, but compiling everything on a moder machine doesn't take that much in any case. Once you have that, you will find two different directories in the "out" folder. The "host" directory, that contains everything related to the host (your current machine), and the "target" directory, that contains a bunch of files for the target. As we want to run the VM on the host itself, the file interesting to us are those ones in the "host" directory.

After the compilation is over, you will have your nice dalvikvm x86 executable!

$ file <aospdir>/out/host/linux-x86/bin/dalvikvm
dalvikvm: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

Note that not all the files in the "host" directory are actually needed to run the VM. The files that I included on github (link) should be the smallest subset of files that is required.

How to run DEX bytecode on Linux/x86

Once you have the binary, it is not entirely straightforward to run pure Dalvik bytecode. Indeed, the DalvikVM executable expects to be run within the context of the Android OS, where many files are in specific directories and specific environment variables are set. Of course, the goal here is to run dalvik bytecode, not Android applications! A quick clarification to the readers not familiar with the Android ecosystem: Android applications are APK files, that, among many other components, contain the classes.dex file, the Dalvik bytecode. Here we are going to run only the actual code, while we'll forget about the remaining Android-related components. In fact, to run a full-fledge Android app, we would need a full-fledge Android OS with all its components running, something that is not going to happen given the many space/memory constraints we have in this setting (iCTF).

In any case, long story short, this is how I managed to run the VM. First, you need to create a "dalvik" bash script (full version here)

$ cat dalvik
#!/bin/bash

CURRDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
export ANDROID_ROOT=$CURRDIR/linux-x86

export ANDROID_DATA=/tmp/dalvik-data
mkdir -p $ANDROID_DATA/dalvik-cache

$CURRDIR/linux-x86/bin/dalvikvm \
-Xbootclasspath\
:$ANDROID_ROOT/framework/core-hostdex.jar\
:$ANDROID_ROOT/framework/conscrypt-hostdex.jar\
:$ANDROID_ROOT/framework/okhttp-hostdex.jar\
:$ANDROID_ROOT/framework/bouncycastle-hostdex.jar\
:$ANDROID_ROOT/framework/apache-xml-hostdex.jar \
$@

Once you have this script, you can actually run your piece of Dalvik-bytecode by first wrapping the .dex file in a .jar, and by then calling the script with "./dalvik -cp <jar_fp> <MainClass>", where <jar_fp> is the path to the JAR file, and MainClass is the name of the class that defines the main() method.

How to do that is pretty simple: you first compile your Java application into .class files. Then, you create a DEX wrapper in a JAR with dx:

$ dx --dex --output=dexware.jar *.class

DexWare, the service

The DexWare service (source code) was meant to be an innovative firwmare to manage super-secret chemical formulas related to uranium :-). Each of the formula was identified by a formula ID, and each of them was protected by a different password. Such formulas were the CTF's flags the teams needed to steal!

The service was maintaining a DB of the formulas in a hashmap field, and, in theory, a user could access to a formula only by knowing the associated password.

Of course, a vulnerability was added to the service, such that an attacker was able to steal the formula (the flag!) even without knowing the password.

Which vulnerability? Well, I wanted it to be a nice service where the teams need to exploit a real-world vulnerability. The decision was really simple: as our most recent paper (that is going to be presented at NDSS 2014, stay tuned!) is about code injection vulnerabilities in Android apps, that was a no-brainer. Gotta be a code injection vulnerability :-)

The service has a self-upgrade functionality. When invoked, the service goes through the list of files in a specific temporary directory, check for files with the .jar extension, load them, and execute the upgrade() method of the DexWareUpgrader class, if implemented. And how were the teams supposed to write a jar in the temp dir? Well, by a pure coincidence, a hidden functionality was added to the service so that it was possible to dump arbitrary files to that specific temporary directory. Easy peasy, isn't it?

The Exploit

Actually, even after knowing the attack vector, it is not that easy. In fact, a lot of Java reflective calls were necessary to get a hold on the DB formula field, and, to do that, the teams needed to access the (undocumented?) "this$0" field to first access the outer class (that had the formula DB field). As some nice write-ups on how to exploit this service already popped out around the net, I will stop with the details here, and I invite the interested readers to check those links out: link1, link2 :)

How to make the service reliable. Bug in DexClassLoader?

After developing my own exploit, I tried to stress-test my service to check whether the memory requirements were reasonable. And something interesting happened: after I run my exploit (together with benign traffic) for ~5K times, the DalvikVM was nicely and reliably segfaulting in my face. Awesome, isn't it? :/

As it turned out after a quick investigation, everything was crashing due to "Too many opened files" errors. Wait, what?

After a deeper investigation, I started to blame the DexClassLoader. In fact, at each run of the exploit, the service was creating a DexClassLoader object, that in turn would open the jar file, to then execute the upgrade method. What's the issue? It seems that the DexClassLoader does *never* close the loaded jar file, and that's why my service was going out of file handles!

I tried my best to force the DexClassLoader to close the associate handle, but nothing worked out (even after deleting the JAR, setting to null all the Java objects involved, and explicitly calling the garbage collector). I also checked in the actual implementation of DexClassLoader, but I failed to find any reference on how to close the loaded JAR.

As a workaround, I implemented a routine in the service that every once in a while was going through the JAR files in that temporary directory and, through JNI calls, manually closing all the handles. Dirty as sh*t, but pretty effective :) (source code)

I do understand that an innocuous DexClassLoader is not supposed to be abused the way I did, but, still, I think it's a bug :-)

Conclusion

All in all, that was a really interesting service to write, and it seems the teams enjoyed it as well. For the future, I'm planning to have a service where actual Dalvik bytecode-related skills are required: in fact, the key technical challenges were related to perform tons of Java reflective calls, while no specific Dalvik technical skill was required. I'll try to pull something out for the next iCTF :-)

As always, any sort of feedback is welcome! Let's keep in touch and feel free to ping me on twitter at @reyammer. Finally, if you played the CTF and you solved this challenge: remote high-five man, awesome job :-)


Monday, July 29, 2013

ShellNoob 2.0 is out!

ShellNoob 2.0 is out!! You might now ask with a mix of suspicion and astonishment:

what whaaat?? -- http://www.youtube.com/watch?v=a1Y73sPHKxw

Yep, you got it right! A new version is out!

For those who haven't read the first blog post, ShellNoob is a shellcode writing toolkit that helps you dealing with the boring, error-prone, and painful steps, leaving only the fun part to you! At least that's the goal :)

From when I published the first version (exactly three months ago!) a lot of stuff happened. First, I'll be lucky enough to have a chance to give a demo and to present ShellNoob at Black Hat USA Arsenal! Here is the announcement/abstract: http://www.blackhat.com/us-13/arsenal.html#Fratantonio. Second, a shitton of new features got added!

Here it is the new updated feature list:
  • convert shellcode between different formats and sources. Formats currently supported: asm, bin, hex, obj, exe, C, python, ruby, pretty, safeasm, completec, shellstorm. (All the details are in the README.)
  • interactive asm-to-opcode conversion (and viceversa) mode. This is useful when you cannot use specific bytes in the shellcode and you want to figure out if a specific assembly instruction will cause problems.
  • support for both ATT & Intel syntax. Check the --intel switch.
  • support for 32 and 64 bits (when playing on x86_64 machine). Check the --64 switch.
  • resolve syscall numbers, constants, and error numbers (now implemented for real! :-)).
  • portable and easily deployable (it only relies on gcc/as/objdump and python). And it just one self-contained python script!
  • in-place development: you run ShellNoob directly on the target architecture!
  • built-in support for Linux/x86, Linux/x86_64, Linux/ARM, FreeBSD/x86, FreeBSD/x86_64.
  • "prepend breakpoint" option. Check the -c switch.
  • read from stdin / write to stdout support (use "-" as filename)
  • uber cheap debugging: check the --to-strace and --to-gdb option!
  • Use ShellNoob as a Python module in your scripts!
  • Verbose mode shows the low-level steps of the conversion: useful to debug / understand / learn!
  • Extra plugins: binary patching made easy with the --file-patch, --vm-patch, --fork-nopper options!

The source code and a (hopefully enough) informative README (with all the details, use cases, etc) is on github: https://github.com/reyammer/shellnoob. Check it out! Please send all the bug reports and swears to yanick [AT] cs.ucsb.edu / @reyammer. All feedback is welcome :-)


Saturday, June 22, 2013

Defcon CTF Quals 2013 – \xff\xe4\xcc 5 (lena) write-up

This is my write-up for the Defcon CTF Quals 2013 - \xff\xe4\xcc 5 (lena). I partecipated to the quals with the Shellphish team (we ended up in 7th place!), and I needed to spend an entire night with the great @cavedon (one of the Shellphish's secret weapon) to solve this challenge. Also, we probably wouldn't have made it without @adamdoupe, that monitored our health conditions when we were trying to finalize our exploit, during the following morning.  Needless to say, solving this was not easy (only 8 teams solved it!), but it was definitively super-fun and one of the best CTF challenges I've played with so far.

Alright, let's get it started. The scenario is the usual one: we have to exploit a binary (Linux, x86, 32bits) called "lena_server" that is running on a remote machine controlled by the organizers. As this challenge is belong [to us and] to the "shellcode" category, we already knew that the binary was supposed to get "something" in input, that was later processed and "somehow" executed.

This write-up is split in two parts: 1) our journey reversing the sh*t out of lena; 2) how we wrote our exploit.

Reversing the Madness

We soon figured out that the program expects a BMP as input. The first thing it does is to perform these simple steps on the BMP's header:
  • check that the two first bytes are "BM"
  • read the following four bytes and interpret them as the BMP's size
  • sanity checks against some bytes in the header (for example, bytes 0x1a-0x1b must be 0x0001)
  • checks that the "bits per pixel" is set to 24 (i.e., standard RGB)
  • checks that the width and the height is less than 0x200 and multiple of 8 (actually, we later figured out that their BMP data's parser was also assuming a width of exactly 0x200...).

From pixels to bits

The real madness is now about to begin. As we are going to deal with some numbers, I'll directly use as references the values from our final exploit: a BMP of size 512x80.

So, after checking the constraints against the BMP header, the executable starts to "process" the actual BMP data. It first extracts 8x8 pixels subblocks starting from the top-left corner of the BMP. Actually, for each 8x8 pixels, the executable "extracts" three different 8x8 matrices, one per channel.  That means that in our case, the binary extracts a total of (512/8)*(80/8)* 3 = 1920 8x8 subblocks.

For each of these subblocks, the binary then computes (what it seems to be) its discrete cosine transformation (by calling the function SimpleDCT::DCT2Calc8x8). That is, the output of this step is still a 8x8 matrix, but we are now in the frequency domain!! #wtf

This new matrix is then passed as argument to the function DCTCoefficient::DecodeDCT that computes some sort of statistics related to the entropy of the original 8x8 pixels. This eventually returns a single bit as output, 0 for low entropy, 1 for high entropy (at least, this is what we understood :)).

This bit is then passed to the DecodeDataStream::AddDataBit function that updates an internal data structure. As I mentioned, these steps are performed for each 8x8 subblock (1920 times in our case).  From a high-level point of view, such data structure contains two buffers, that I will refer to as B1 and B2. At each iteration, the current bit is appended to B1. When the buffer B1 happens to contain 256 bits (32 bytes), the following additional steps are performed:
  • the 32 bytes in B1 are passed as argument to a function called DecodeDataStream::DoDecodeBlockEv, that performs a "decoding" step (at that point, we had no idea on what that function was doing). Its output are 32 bytes, that in the general case differ from the 32 bytes in input.
  • some obscure checks are performed on the output. If these "checks" are not satisfied, those 32 bytes are somehow modified or, in the worst case, the program quits.
  • if the "DoDecodeBlockEv" function returns without errors, the first 16 bytes of the output are appended to buffer B2, while the latter 16 bytes are discarded.
  • the content of buffer B1 is cleared.
The good news was that, even if we had no idea on what the "decoding" step and the obscure checks were actually doing, we started to see the light: in fact, by quickly reversing the remaining part of the executable, we realized that if the program is able to process without errors all the 1920 bits in input, the content of the buffer B2 is later directly executed!

Reed-Solomon error correction codes

It turned out that the function "DoDecodeBlockEv" was actually implementing the decoding step of a Reed-Solomon error correction code. From wikipedia:
"Reed–Solomon (RS) codes are non-binary cyclic error-correcting codes. They described a systematic way of building codes that could detect and correct multiple random symbol errors. By adding t check symbols to the data, an RS code can detect any combination of up to t erroneous symbols, or correct up to floor(t/2) symbols" 
When we understood what that meant, we realized how miserable our life was going to be: the function "DoDecodeBlockEv" applies a RS decoding step on each 32 bytes of buffer B1, by considering the first 16 bytes as the actual data and the remaining 16 bytes as "error correction code". In other words, the function uses the latter 16 bytes to self-correct (!!!) the first 16 bytes, our soon-to-be shellcode!! #wtf²

Clearly, we needed to make sure that each set of 32 bytes in B1 were the RS encoded version of 16 bytes of our shellcode. We first found some python implementation of this encoding step, but they were somehow wrong, in the sense that the decoding of the encoding was not returning the original input (and as we don't know sh*t about this stuff, we were not able to "debug it").

Eventually, we found a Matlab help page (yep, we were that desperate..) talking about a RS-encode function rsenc. Even if we were naturally adverse to rely on Matlab for our exploit, we decided to give it a try. We quickly got a Matlab instance running, we "encoded" the 16 bytes "AAAAAAAAAAAAAAAA" string, and we got as output the 32 bytes string "AAAAAAAAAAAAAAAA<other 16 bytes>". We then took this string, and with some gdb-leetnees we made the "lena" executable decode it. With our great surprise, the decoding step returned exactly the original 16 As. That is, the "<other 16 bytes>" were a valid error correcting code for the first 16 bytes! It worked™.

Generating the exploit

After a long session of "hurrah" and high-fives all over the place, we started to smell pwnage in the air. We began our journey to victory by summing up what was happening to our poor BMP in input:
  • the BMP's header is checked against several constraints.
  • the BMP's payload is retrieved and decomposed in matrices of 8x8 (three for each 8x8 pixels subblock).
  • for each of the subblock, a new 8x8 matrix is computed by calculating its DCT.
  • for each new 8x8 subblock, the function "DCTCoefficient::DecodeDCT" returns a bit, whose value depends on the original 8x8 pixels entropy.
  • each of these bits is appended to the buffer B1.
  • when B1 contains 32bytes, these 32bytes are decoded (function "DoDecodeBlockEv") following the Reed-Solomon error correction code scheme.
  • if everything is fine, the first 16 bytes are appended to B2.
  • When all the 8x8 subblocks are processed, the content of B2 is executed.
At this point, we just needed to generate a BMP such that the final content of buffer B2 was actually executable. Clearly, our best shot was go backward: we took our standard open-read-write shellcode, and we then wrote a script to invert all those crazy transformation steps (as our Matlab skills are ridiculously poor, we also had Python scripts to generate few Matlab scripts: leetness++). We started with a ~90 bytes shellcode, padded to 112 bytes; we then obtained 224 bytes (1792 bits) after the RS encoding step with Matlab's rsenc, that we needed to pad to 1920 bits. Finally, as the width of the BMP needed to be 512 (remember their parser's assumption?), we used the 1920 bits to generate a 512x80 BMP (3 bits per pixel), that lead to our exploit:


That was not straightforward, but the result we obtained probably represents the highest form of artistic production that a bunch of computer scientists could ever hope for. Of course, the low-level details on how we actually generated that are left as an exercise for the reader ;)

That's all folks! We really (really!) liked this challenge, and we renew our big kudos to the @LegitBS_CTF crew for this great CTF!

Yanick / @reyammer

Wednesday, June 12, 2013

What The Fork: how to immediately block *any* Android device

What if an unprivileged Android app could lock, instantaneously, any Android device out there? What if such an app exists and is also really simple to implement?

A few months ago, Antonio and I stumbled upon a paper titled Would You Mind Forking This Process? A Denial of Service attack on Android. In this paper, the authors describe a vulnerability they discovered related to Android's Zygote that could be exploited by mounting a DoS (Denial-of-Service) attack: this resulted in the target device becoming completely unresponsive after a minute or so. Without going too much into the details, the vulnerability was due to the world-writable permission access to the Zygote's socket, from which Zygote itself listens for "requests to fork new processes". With this technique, they were able to make Zygote fork a high number of processes, and that was enough to block the device.

Their work got a lot of coverage and the vulnerability they found (CVE-2011-3918) has been considered as critical (CVSS score of 7.8). Eventually, even Google acknowledged the bug by committing a fix.

At that point, Antonio asked me "how about a fork-bomb? Would that work?". To our surprise, the answer to that question is: yes, definitively yes!

We found that the most simple implementation of a fork bomb is able to instantaneously block any Android device we have tested it on, regardless of the phone/OS version.

Up to now, we verified that our exploit works on Google's flagship device (Nexus 4 with Android 4.2.2) and on the following other ones:
  • Nexus 4 - Android 4.2.1
  • Nexus S - Android 4.2.1 (Cyanogen mod 10.1)
  • Galaxy Nexus - Android 4.2.2 (Cyanogen mod 10.1.0-RC5)
  • Galaxy Nexus - Android 4.1.1
  • Galaxy Nexus - Android 4.0.4
  • Samsung Galaxy Tab 10.1 - Android 3.1
  • Motorola Backflip - Android 2.3.7 (Cyanogen mod 7)
  • Emulator running Android - Android 4.1.2

We immediately contacted the Android security team (on Feb 7th, 2013), and received a response shortly after: in their opinion, this does not constitute a security issue, because it is just a local DoS and the user can somehow regain control of his device (by rebooting the phone).

We found their answer quite interesting. In fact, we believe that our exploit is strictly more powerful that the previously disclosed one, as in this case the device blocks immediately, and not after a minute or so. Also, the impact that such an issue might have is much higher, as there does not seem to be an easy patch to fix this problem. This naturally raised some questions: did Google patch that vulnerability only because it was a simple fix? Or maybe the patch was supposed to fix a different more serious (undisclosed) bug? As we have not heard anything back (we sent our last email to them on Feb 10th, 2013), we have no idea. In any case, we are surely not the first ones to come up with a local DoS attack (for example, check this nice trick from DexLab), but it seems that Google just does not care that much about this.

We think that local DoS is really bad (at least bad enough to commit a fix, especially when it's simple). First, this violates one of the core principle on the Android design: no application, by default, should have the permission to perform any operation that would adversely impact other applications, the operating system, or the user. Second, what if the attack starts during the night, when the user is relying on his phone for the alarm clock? That would be quite unfortunate. And what if our nasty app starts at every boot, making the device completely unusable from the beginning? Now, if you are one of those few guys who reads a blog post like this, you probably know your way around to delete the annoying app, but what if you are not?

That being said, our uber simple forkbomb app is available on github at: https://github.com/reyammer/android-forkbomb. Feel free to try it out, but be ready to pull out your battery :-)

For any questions, please feel free to drop us an email at yanick@cs.ucsb.edu and antoniob@cs.ucsb.edu.

Monday, April 29, 2013

ShellNoob 1.0 - a shellcode writing toolkit

Today I'm really happy to publicly release ShellNoob (and to publish my first blog post :-))

During the many CTFs I played, there always has been the need to manually write some shellcode (yep, most of time Metasploit is not enough, even if you are lucky and you get a working shellcode...)

Now, writing shellcode is always super fun, but some parts are extremely boring and error prone. And after googling for the n-th time "how to <put-anything-you-like-here>", I just got tired and I wrote shellnoob.py, a simple Python script that makes some boring steps less boring.

I bet that there are tons of similar scripts around the web, but I never found anything that had all the features I wanted. If you have suggestions, please ping me!

Alright, let's go to the meat. This is the list of features:
  • convert shellcode between different formats (currently supported: asm, bin, hex, obj, exe, C, python, ruby, pretty)
  • interactive opcode-to-binary conversion (and viceversa) mode. This is useful when you cannot use specific bytes in the shellcode.
  • resolve syscall numbers and constants (not exactly implemented yet :-))
  • portable: it only relies on gcc/as/objdump and python.
  • easily deployable: it's just one python file!
  • in-place development: you run ShellNoob directly on the target architecture!
  • other options: prepend breakpoint, 32bit/64bit switch.
  • read from stdin / write to stdout support (use "-" as filename)

Everything (i.e., the code, use cases, etc) is on github: https://github.com/reyammer/shellnoob

Check it out! Feedback, comments, and contributions are more than welcome!