Tuesday, March 15, 2016

From Android ART (binary-only) to DEX? Yes, we can!™ (kinda)

This is a write-up for the 0ctf 2016 quals "State of the ART" mobile/Android challenge worth 5 points. We (Shellphish) were one of the only three teams that solved it, and since I haven't seen any write-up on this, here is mine! Major props to @_antonio_bc_ and @subwire who heavily worked on this with me :)

Alright, here is the challenge. We were given one tar containing three files:

1) mmaps of a process running an Android app
2) output of dex2oat command run over the Android app's Dalvik bytecode
3) boot.oat

In recent Android versions, an app's Dalvik bytecode is converted into an OAT file (an ELF binary file) when the app is first installed. dex2oat is the program in charge of this process. More information on the OAT format can be found here. The boot.oat file is the OAT related to the main components of the Android framework.

We checked the output of the dex2oat command, and we found two classes related to the Android app: MainActivity.onCreate() and MainActivity.check(String s).

We then guessed that to solve the challenge we needed to reverse the check() function and that the flag would have been the string s that makes the check() function return 1.

Normally, the OAT format (and the output of the dex2oat command) includes the generated binary as well as the Dalvik bytecode of the app. However, the authors of the challenge removed the Dalvik bytecode part, making this challenge very interesting.

Thus the question: from the binary-only part of an OAT, can we reconstruct the Dalvik bytecode?

We reassambled the binary from the log file and, at a first look, it was clear that the app defines and manipulates a series of arrays and does some operation over their elements.

We encountered three main challenges to fully reconstruct what was going on:

1) It's low-level ARM assembly code. For example, a new-array v0, v1, byte[] Dalvik bytecode instruction looks like:

      0x00371eb6: f8d9e11c ldr.w   lr, [r9, #284]  ; pAllocArrayResolved
      0x00371eba: 9900     ldr     r1, [sp, #0]
      0x00371ebc: 2606     movs    r6, #6
      0x00371ebe: 1c32     mov     r2, r6
      0x00371ec0: f64e0020 movw    r0, #59424
      0x00371ec4: f2c7005b movt    r0, #28763
      0x00371ec8: 47f0     blx     lr

Register R0 contains a reference to the type of the array (i.e., byte[]), while register R2 contains the size of the array.

As another example, a fill-array-data v0, +6 Dalvik bytecode instruction (which loads into an already-created array a series of bytes at a given offset) looks like:

      0x00371ea8: f8d9e190 ldr.w   lr, [r9, #400]  ; pHandleFillArrayData
      0x00371eac: 4682     mov     r10, r0
      0x00371eae: 4650     mov     r0, r10
      0x00371eb0: f20f6144 adr     r1, +216
      0x00371eb4: 47f0     blx     lr

2) There is a lot of automatically-generated code that is not directly related to the Dalvik bytecode, which makes reversing the binary harder. For example, since this app was playing with arrays, there are "bound checks" all over the place:

      0x00372410: f8d9e238 ldr.w   lr, [r9, #568]  ; pThrowArrayBounds
      0x00372414: 1c01     mov     r1, r0
      0x00372416: 1c28     mov     r0, r5
      0x00372418: 47f0     blx     lr

3) When one method invokes a Java method in the Android framework, we only see a jump to an address in memory...how can we know where the app is jumping to?

For example:

      0x00372294: f6497e11 movw    lr, #40721
      0x00372298: f2c72ea0 movt    lr, #29344
      0x0037229c: f24520c0 movw    r0, #21184
      0x003722a0: f2c70044 movt    r0, #28740
      0x003722a4: 4641     mov     r1, r8
      0x003722a6: 2253     movs    r2, #83
      0x003722a8: 2365     movs    r3, #101
      0x003722aa: 47f0     blx     lr

Here the control flow jumps to address 0x72a09f11. The memory map information clearly tells us that we are jumping in boot.oat (the Android framework): 

$ cat mmap.txt

703d3000-70eee000 rw-p 00000000 b3:17 185108     /data/dalvik-cache/arm/system@framework@boot.art
70eee000-7298b000 r--p 00000000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat
7298b000-73e43000 r-xp 01a9d000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat
73e43000-73e44000 rw-p 02f55000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat

and we can compute the offset in the boot.oat file with the following formula:

offset_in_boot_oat = offset_in_memory - 1 - 0x7298b000 + 0x1a9d000

(Note: the "-1" is for fixing the ARM thumb-related "bit")

For our example: 0x72a09f11 ~> 0x1b1bf10.

But how can we know which method in the framework is invoked?

It turns out that an OAT file contains the "Java method" -- "offset in binary" mapping we are looking for. To dump this information, we used oatdump, which is a tool that comes with AOSP. Note that oatdump and the OAT format are very target-specific, and an oatdump binary compiled for Android M will not work for an OAT generated for Android L.

Once we compiled the right version of oatdump (in our case, for Android L), we could extract the information we needed:

$ oatdump boot.oat
  74: java.lang.String java.lang.String.replace(char, char) (dex_method_idx=3238)
      0x0000: const/4 v9, #+0
      0x0001: iget-object v2, v10, [C java.lang.String.value // field@2147
      0x0003: iget v1, v10, I java.lang.String.offset // field@2145
      0x0005: iget v0, v10, I java.lang.String.count // field@2143
      0x0007: move v4, v1
      0x0008: add-int v5, v1, v0
    OatMethodOffsets (offset=0x0150b2b8)
      code_offset: 0x01b1af11
      gc_map: (offset=0x015fc583)
    OatQuickMethodHeader (offset=0x01b1aef8)
      mapping_table: (offset=0x018e19e2)
      vmap_table: (offset=0x01a7daff)
      v4/r5, v2/r6, v5/r7, v3/r8, v10/r10, v1/r11, v65535/r15
      frame_size_in_bytes: 96
      core_spill_mask: 0x00008de0 (r5, r6, r7, r8, r10, r11, r15)
      fp_spill_mask: 0x00000000
    CODE: (code_offset=0x01b1af11 size_offset=0x01b1af0c size=400)...
      0x01b1af10: f5bd5c00  subs    r12, sp, #8192
      0x01b1af14: f8dcc000  ldr.w   r12, [r12, #0]
      suspend point dex PC: 0x0000
      GC map objects:  v10 (r10)
      0x01b1af18: e92d4de0  push    {r5, r6, r7, r8, r10, r11, lr}
      0x01b1af1c: b091      sub     sp, sp, #68
      0x01b1af1e: 9000      str     r0, [sp, #0]
      0x01b1af20: 468a      mov     r10, r1
      0x01b1af22: 921a      str     r2, [sp, #104]
      0x01b1af24: 931b      str     r3, [sp, #108]

We eventually noticed that the offset we computed with our previous formula and the offsets outputted by oatdump are "off" by 0x1000 (we are still not sure why exactly), thus making our final formula:

offset_in_boot_oat = offset_in_memory - 1 - 0x7298b000 + 0x1a9d000 - 0x1000

This allowed us to resolve all targets references in the app we had:

0x72a061f9 ~> 0x1b171f8 ~> void java.lang.String.<init>(byte[])
0x72a0ad91 ~> 0x1b1bd90 ~> void java.lang.StringBuffer.<init>(java.lang.String)
0x72a0e5d9 ~> 0x1b1f5d8 ~> java.lang.StringBuilder java.lang.StringBuilder.reverse()
0x72a0e809 ~> 0x1b1f808 ~> java.lang.String java.lang.StringBuilder.toString()
0x72a061f9 ~> 0x1b171f8 ~> void java.lang.String.<init>(byte[])
0x72a09f11 ~> 0x1b1af10 ~> java.lang.String java.lang.String.replace(char, char)
0x72a0a781 ~> 0x1b1b780 ~> java.lang.String java.lang.String.substring(int, int)
0x72a0d919 ~> 0x1b1e918 ~> java.lang.StringBuilder java.lang.StringBuilder.append(java.lang.String)
0x72a0ac71 ~> 0x1b1bc70 ~> void java.lang.StringBuffer.<init>()
0x72a0ab49 ~> 0x1b1bb48 ~> java.lang.String java.lang.String.trim()
0x72a08971 ~> 0x1b19970 ~> boolean java.lang.String.equals(java.lang.Object)

With this info, it was then trivial to re-implement the check() function in python, which spit out the flag. Note that the first part of the binary does many simple (xor-like) operations on the arrays defined in the code, but this last part was definitively the most challenging one.

$ python check.py 
FLAG: 0ctf{1ea5n_2_rE_ART}

Long story short: reversing OAT is somehow possible, but additional info is required (mmap + boot.oat) if you don't want to guess "too much." Also, this challenge might have been much much harder (if not impossible) if some of the methods would have been called through vtables (hence several levels of indirection).

All relevant files can be found at this link. Hope you enjoyed it!

Sunday, September 20, 2015

CSAWCTF 2015 -- pcapin (forensic 150) write-up

This is the write-up for solving "pcapin", a challenge from CSAW CTF 2015. It was in the "forensic" category, and it was worth it 150 points....may I say, 150 points my ass!?! This felt like a 1337 points challenge...at least :D

So, we have a pcap (links to all files at the end of the post), and we know that it contains the dump of some sort of file transfer protocol, and that a "not so sophisticated" encryption was used.

Several hours (and people) later, we extracted the following info:
- The data that the server sent us is split in two "sessions," each terminated by the "END" string.
- The first session contains 8 packets of 68 bytes each.
- The second session contains 28 packets of 212 bytes each.
- The first 12 bytes of each packet identify the total length of the session, some control codes (we still don't understand most of them...), the number of packets, and a packet counter. As it turns out, all these bytes are useless.
- The remaining bytes of each packet contain the actual "data".
- The first session is a file listing (once decrypted, one of the listed file is "flag.png"...)
- The second session is the content of the flag.png file.
- Each packet is encrypted with a 2-byte xor key.

So, we focused on the second session, as it contains the flag.png file. To recap, we have 28 encrypted packets that, once decrypted and concatenated, will give us the content of the "flag.png".

As a first step, we guessed the keys for the first and last packet. Since some PNG-related keywords need to appear at the beginning of a PNG file (e.g., "PNG" or "IDAT" strings), it was trivial to get the first 2-byte key: 20543. Same goes for the last packet: in this case there were many repeated 2-byte string at the end of the packet, so we guessed there was some padding (likely \x00) at the end. This gave us the key for the last packet: 19564.

But what about the keys for the other 26 packets? Well....

The general idea was to bruteforce the key for each packet. But how can we guess what's the correct key? After all, the IDAT payload is zlib-copressed stuff, so everything, once decrypted, will look like random bytes...

...and after several hours, we came up with this trick: for each packet, we decrypt it with a key K, try to uncompress it, and look at the result. If it's impossible to uncompress it, then for sure K is not the right key. If we can uncompress it, we then check the payload and see if it looks like RGBA (RGB + alpha channel!). We were able to confirm that this trick might have worked given that the IDAT payload contained in the first packet followed the pattern, and that alpha was always set to zero!

We wrote an heuristic to check for this pattern for each of the "attempt", and we sort the candidate keys according to another heuristics. Few hours later, we got the PNG, which contained the flag :-)

These are the 28 keys: 20543 44829 21138 23618 15062 59478 13198 54610 4633 46710 41810 38097 56123 58392 52387 12251 26106 43868 15618 57633 1053 53731 53447 30269 24329 17183 6131 19564.

Here there are a bunch of links: github repooriginal pcap, and the code. And this is the final, decrypted PNG:


Tuesday, December 10, 2013

The "behind the scene" of DexWare, a DalvikVM based service for the iCTF 2013.

This write-up will describe the "behind the scene" of DexWare, a service I wrote for the iCTF 2013. To the best of my knowledge, this is the first service in the history of CTFs to be based on Dalvik-bytecode!! I hope this write-up will be a useful starting point for those who will attempt something similar!

You can find the source code and the compiled binaries on github (link). Also, feel free to ping me on twitter (@reyammer) for any questions.


It seems that many of the teams found this service particularly challenging to hack, even after discovering a possible attack vector. The fact that heavy usage of Java reflection was required to hack the service might be related to that :D. In any case, let me tell you, this was a pretty complex service to be set up as well, especially as it needed to run for 8+ hours, on 100+ VMs, at a risk of compromising the entire game if unstable.

These are the points I will touch in this write-up:

  • How to compile a DalvikVM for Linux/x86
  • How to actually run DEX bytecode on Linux/x86
  • Discussion about Dexware, the service
  • Quick overview on how to exploit the service
  • How to make the service reliable (bug in DexClassLoader?)

How to compile a DalvikVM for Linux/x86

The vulnerable box for the iCTF needed to be a classic Linux/x86 VM. Hence, to run a service like this, you first need a DalvikVM runnable on Linux/x86. Once you know how to do it, it's pretty straightforward.

After you download the entire AOSP repo, just fire the following commands:

$ cd <aospdir>
$ source build/envsetup.sh
$ lunch # and then select "aosp_x86-eng"
$ make -j8

You could actually compile only the parts related to the DalvikVM, but compiling everything on a moder machine doesn't take that much in any case. Once you have that, you will find two different directories in the "out" folder. The "host" directory, that contains everything related to the host (your current machine), and the "target" directory, that contains a bunch of files for the target. As we want to run the VM on the host itself, the file interesting to us are those ones in the "host" directory.

After the compilation is over, you will have your nice dalvikvm x86 executable!

$ file <aospdir>/out/host/linux-x86/bin/dalvikvm
dalvikvm: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

Note that not all the files in the "host" directory are actually needed to run the VM. The files that I included on github (link) should be the smallest subset of files that is required.

How to run DEX bytecode on Linux/x86

Once you have the binary, it is not entirely straightforward to run pure Dalvik bytecode. Indeed, the DalvikVM executable expects to be run within the context of the Android OS, where many files are in specific directories and specific environment variables are set. Of course, the goal here is to run dalvik bytecode, not Android applications! A quick clarification to the readers not familiar with the Android ecosystem: Android applications are APK files, that, among many other components, contain the classes.dex file, the Dalvik bytecode. Here we are going to run only the actual code, while we'll forget about the remaining Android-related components. In fact, to run a full-fledge Android app, we would need a full-fledge Android OS with all its components running, something that is not going to happen given the many space/memory constraints we have in this setting (iCTF).

In any case, long story short, this is how I managed to run the VM. First, you need to create a "dalvik" bash script (full version here)

$ cat dalvik

CURRDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
export ANDROID_ROOT=$CURRDIR/linux-x86

export ANDROID_DATA=/tmp/dalvik-data
mkdir -p $ANDROID_DATA/dalvik-cache

$CURRDIR/linux-x86/bin/dalvikvm \
:$ANDROID_ROOT/framework/apache-xml-hostdex.jar \

Once you have this script, you can actually run your piece of Dalvik-bytecode by first wrapping the .dex file in a .jar, and by then calling the script with "./dalvik -cp <jar_fp> <MainClass>", where <jar_fp> is the path to the JAR file, and MainClass is the name of the class that defines the main() method.

How to do that is pretty simple: you first compile your Java application into .class files. Then, you create a DEX wrapper in a JAR with dx:

$ dx --dex --output=dexware.jar *.class

DexWare, the service

The DexWare service (source code) was meant to be an innovative firwmare to manage super-secret chemical formulas related to uranium :-). Each of the formula was identified by a formula ID, and each of them was protected by a different password. Such formulas were the CTF's flags the teams needed to steal!

The service was maintaining a DB of the formulas in a hashmap field, and, in theory, a user could access to a formula only by knowing the associated password.

Of course, a vulnerability was added to the service, such that an attacker was able to steal the formula (the flag!) even without knowing the password.

Which vulnerability? Well, I wanted it to be a nice service where the teams need to exploit a real-world vulnerability. The decision was really simple: as our most recent paper (that is going to be presented at NDSS 2014, stay tuned!) is about code injection vulnerabilities in Android apps, that was a no-brainer. Gotta be a code injection vulnerability :-)

The service has a self-upgrade functionality. When invoked, the service goes through the list of files in a specific temporary directory, check for files with the .jar extension, load them, and execute the upgrade() method of the DexWareUpgrader class, if implemented. And how were the teams supposed to write a jar in the temp dir? Well, by a pure coincidence, a hidden functionality was added to the service so that it was possible to dump arbitrary files to that specific temporary directory. Easy peasy, isn't it?

The Exploit

Actually, even after knowing the attack vector, it is not that easy. In fact, a lot of Java reflective calls were necessary to get a hold on the DB formula field, and, to do that, the teams needed to access the (undocumented?) "this$0" field to first access the outer class (that had the formula DB field). As some nice write-ups on how to exploit this service already popped out around the net, I will stop with the details here, and I invite the interested readers to check those links out: link1, link2 :)

How to make the service reliable. Bug in DexClassLoader?

After developing my own exploit, I tried to stress-test my service to check whether the memory requirements were reasonable. And something interesting happened: after I run my exploit (together with benign traffic) for ~5K times, the DalvikVM was nicely and reliably segfaulting in my face. Awesome, isn't it? :/

As it turned out after a quick investigation, everything was crashing due to "Too many opened files" errors. Wait, what?

After a deeper investigation, I started to blame the DexClassLoader. In fact, at each run of the exploit, the service was creating a DexClassLoader object, that in turn would open the jar file, to then execute the upgrade method. What's the issue? It seems that the DexClassLoader does *never* close the loaded jar file, and that's why my service was going out of file handles!

I tried my best to force the DexClassLoader to close the associate handle, but nothing worked out (even after deleting the JAR, setting to null all the Java objects involved, and explicitly calling the garbage collector). I also checked in the actual implementation of DexClassLoader, but I failed to find any reference on how to close the loaded JAR.

As a workaround, I implemented a routine in the service that every once in a while was going through the JAR files in that temporary directory and, through JNI calls, manually closing all the handles. Dirty as sh*t, but pretty effective :) (source code)

I do understand that an innocuous DexClassLoader is not supposed to be abused the way I did, but, still, I think it's a bug :-)


All in all, that was a really interesting service to write, and it seems the teams enjoyed it as well. For the future, I'm planning to have a service where actual Dalvik bytecode-related skills are required: in fact, the key technical challenges were related to perform tons of Java reflective calls, while no specific Dalvik technical skill was required. I'll try to pull something out for the next iCTF :-)

As always, any sort of feedback is welcome! Let's keep in touch and feel free to ping me on twitter at @reyammer. Finally, if you played the CTF and you solved this challenge: remote high-five man, awesome job :-)

Monday, July 29, 2013

ShellNoob 2.0 is out!

ShellNoob 2.0 is out!! You might now ask with a mix of suspicion and astonishment:

what whaaat?? -- http://www.youtube.com/watch?v=a1Y73sPHKxw

Yep, you got it right! A new version is out!

For those who haven't read the first blog post, ShellNoob is a shellcode writing toolkit that helps you dealing with the boring, error-prone, and painful steps, leaving only the fun part to you! At least that's the goal :)

From when I published the first version (exactly three months ago!) a lot of stuff happened. First, I'll be lucky enough to have a chance to give a demo and to present ShellNoob at Black Hat USA Arsenal! Here is the announcement/abstract: http://www.blackhat.com/us-13/arsenal.html#Fratantonio. Second, a shitton of new features got added!

Here it is the new updated feature list:
  • convert shellcode between different formats and sources. Formats currently supported: asm, bin, hex, obj, exe, C, python, ruby, pretty, safeasm, completec, shellstorm. (All the details are in the README.)
  • interactive asm-to-opcode conversion (and viceversa) mode. This is useful when you cannot use specific bytes in the shellcode and you want to figure out if a specific assembly instruction will cause problems.
  • support for both ATT & Intel syntax. Check the --intel switch.
  • support for 32 and 64 bits (when playing on x86_64 machine). Check the --64 switch.
  • resolve syscall numbers, constants, and error numbers (now implemented for real! :-)).
  • portable and easily deployable (it only relies on gcc/as/objdump and python). And it just one self-contained python script!
  • in-place development: you run ShellNoob directly on the target architecture!
  • built-in support for Linux/x86, Linux/x86_64, Linux/ARM, FreeBSD/x86, FreeBSD/x86_64.
  • "prepend breakpoint" option. Check the -c switch.
  • read from stdin / write to stdout support (use "-" as filename)
  • uber cheap debugging: check the --to-strace and --to-gdb option!
  • Use ShellNoob as a Python module in your scripts!
  • Verbose mode shows the low-level steps of the conversion: useful to debug / understand / learn!
  • Extra plugins: binary patching made easy with the --file-patch, --vm-patch, --fork-nopper options!

The source code and a (hopefully enough) informative README (with all the details, use cases, etc) is on github: https://github.com/reyammer/shellnoob. Check it out! Please send all the bug reports and swears to yanick [AT] cs.ucsb.edu / @reyammer. All feedback is welcome :-)

Saturday, June 22, 2013

Defcon CTF Quals 2013 – \xff\xe4\xcc 5 (lena) write-up

This is my write-up for the Defcon CTF Quals 2013 - \xff\xe4\xcc 5 (lena). I partecipated to the quals with the Shellphish team (we ended up in 7th place!), and I needed to spend an entire night with the great @cavedon (one of the Shellphish's secret weapon) to solve this challenge. Also, we probably wouldn't have made it without @adamdoupe, that monitored our health conditions when we were trying to finalize our exploit, during the following morning.  Needless to say, solving this was not easy (only 8 teams solved it!), but it was definitively super-fun and one of the best CTF challenges I've played with so far.

Alright, let's get it started. The scenario is the usual one: we have to exploit a binary (Linux, x86, 32bits) called "lena_server" that is running on a remote machine controlled by the organizers. As this challenge is belong [to us and] to the "shellcode" category, we already knew that the binary was supposed to get "something" in input, that was later processed and "somehow" executed.

This write-up is split in two parts: 1) our journey reversing the sh*t out of lena; 2) how we wrote our exploit.

Reversing the Madness

We soon figured out that the program expects a BMP as input. The first thing it does is to perform these simple steps on the BMP's header:
  • check that the two first bytes are "BM"
  • read the following four bytes and interpret them as the BMP's size
  • sanity checks against some bytes in the header (for example, bytes 0x1a-0x1b must be 0x0001)
  • checks that the "bits per pixel" is set to 24 (i.e., standard RGB)
  • checks that the width and the height is less than 0x200 and multiple of 8 (actually, we later figured out that their BMP data's parser was also assuming a width of exactly 0x200...).

From pixels to bits

The real madness is now about to begin. As we are going to deal with some numbers, I'll directly use as references the values from our final exploit: a BMP of size 512x80.

So, after checking the constraints against the BMP header, the executable starts to "process" the actual BMP data. It first extracts 8x8 pixels subblocks starting from the top-left corner of the BMP. Actually, for each 8x8 pixels, the executable "extracts" three different 8x8 matrices, one per channel.  That means that in our case, the binary extracts a total of (512/8)*(80/8)* 3 = 1920 8x8 subblocks.

For each of these subblocks, the binary then computes (what it seems to be) its discrete cosine transformation (by calling the function SimpleDCT::DCT2Calc8x8). That is, the output of this step is still a 8x8 matrix, but we are now in the frequency domain!! #wtf

This new matrix is then passed as argument to the function DCTCoefficient::DecodeDCT that computes some sort of statistics related to the entropy of the original 8x8 pixels. This eventually returns a single bit as output, 0 for low entropy, 1 for high entropy (at least, this is what we understood :)).

This bit is then passed to the DecodeDataStream::AddDataBit function that updates an internal data structure. As I mentioned, these steps are performed for each 8x8 subblock (1920 times in our case).  From a high-level point of view, such data structure contains two buffers, that I will refer to as B1 and B2. At each iteration, the current bit is appended to B1. When the buffer B1 happens to contain 256 bits (32 bytes), the following additional steps are performed:
  • the 32 bytes in B1 are passed as argument to a function called DecodeDataStream::DoDecodeBlockEv, that performs a "decoding" step (at that point, we had no idea on what that function was doing). Its output are 32 bytes, that in the general case differ from the 32 bytes in input.
  • some obscure checks are performed on the output. If these "checks" are not satisfied, those 32 bytes are somehow modified or, in the worst case, the program quits.
  • if the "DoDecodeBlockEv" function returns without errors, the first 16 bytes of the output are appended to buffer B2, while the latter 16 bytes are discarded.
  • the content of buffer B1 is cleared.
The good news was that, even if we had no idea on what the "decoding" step and the obscure checks were actually doing, we started to see the light: in fact, by quickly reversing the remaining part of the executable, we realized that if the program is able to process without errors all the 1920 bits in input, the content of the buffer B2 is later directly executed!

Reed-Solomon error correction codes

It turned out that the function "DoDecodeBlockEv" was actually implementing the decoding step of a Reed-Solomon error correction code. From wikipedia:
"Reed–Solomon (RS) codes are non-binary cyclic error-correcting codes. They described a systematic way of building codes that could detect and correct multiple random symbol errors. By adding t check symbols to the data, an RS code can detect any combination of up to t erroneous symbols, or correct up to floor(t/2) symbols" 
When we understood what that meant, we realized how miserable our life was going to be: the function "DoDecodeBlockEv" applies a RS decoding step on each 32 bytes of buffer B1, by considering the first 16 bytes as the actual data and the remaining 16 bytes as "error correction code". In other words, the function uses the latter 16 bytes to self-correct (!!!) the first 16 bytes, our soon-to-be shellcode!! #wtf²

Clearly, we needed to make sure that each set of 32 bytes in B1 were the RS encoded version of 16 bytes of our shellcode. We first found some python implementation of this encoding step, but they were somehow wrong, in the sense that the decoding of the encoding was not returning the original input (and as we don't know sh*t about this stuff, we were not able to "debug it").

Eventually, we found a Matlab help page (yep, we were that desperate..) talking about a RS-encode function rsenc. Even if we were naturally adverse to rely on Matlab for our exploit, we decided to give it a try. We quickly got a Matlab instance running, we "encoded" the 16 bytes "AAAAAAAAAAAAAAAA" string, and we got as output the 32 bytes string "AAAAAAAAAAAAAAAA<other 16 bytes>". We then took this string, and with some gdb-leetnees we made the "lena" executable decode it. With our great surprise, the decoding step returned exactly the original 16 As. That is, the "<other 16 bytes>" were a valid error correcting code for the first 16 bytes! It worked™.

Generating the exploit

After a long session of "hurrah" and high-fives all over the place, we started to smell pwnage in the air. We began our journey to victory by summing up what was happening to our poor BMP in input:
  • the BMP's header is checked against several constraints.
  • the BMP's payload is retrieved and decomposed in matrices of 8x8 (three for each 8x8 pixels subblock).
  • for each of the subblock, a new 8x8 matrix is computed by calculating its DCT.
  • for each new 8x8 subblock, the function "DCTCoefficient::DecodeDCT" returns a bit, whose value depends on the original 8x8 pixels entropy.
  • each of these bits is appended to the buffer B1.
  • when B1 contains 32bytes, these 32bytes are decoded (function "DoDecodeBlockEv") following the Reed-Solomon error correction code scheme.
  • if everything is fine, the first 16 bytes are appended to B2.
  • When all the 8x8 subblocks are processed, the content of B2 is executed.
At this point, we just needed to generate a BMP such that the final content of buffer B2 was actually executable. Clearly, our best shot was go backward: we took our standard open-read-write shellcode, and we then wrote a script to invert all those crazy transformation steps (as our Matlab skills are ridiculously poor, we also had Python scripts to generate few Matlab scripts: leetness++). We started with a ~90 bytes shellcode, padded to 112 bytes; we then obtained 224 bytes (1792 bits) after the RS encoding step with Matlab's rsenc, that we needed to pad to 1920 bits. Finally, as the width of the BMP needed to be 512 (remember their parser's assumption?), we used the 1920 bits to generate a 512x80 BMP (3 bits per pixel), that lead to our exploit:

That was not straightforward, but the result we obtained probably represents the highest form of artistic production that a bunch of computer scientists could ever hope for. Of course, the low-level details on how we actually generated that are left as an exercise for the reader ;)

That's all folks! We really (really!) liked this challenge, and we renew our big kudos to the @LegitBS_CTF crew for this great CTF!

Yanick / @reyammer

Wednesday, June 12, 2013

What The Fork: how to immediately block *any* Android device

What if an unprivileged Android app could lock, instantaneously, any Android device out there? What if such an app exists and is also really simple to implement?

A few months ago, Antonio and I stumbled upon a paper titled Would You Mind Forking This Process? A Denial of Service attack on Android. In this paper, the authors describe a vulnerability they discovered related to Android's Zygote that could be exploited by mounting a DoS (Denial-of-Service) attack: this resulted in the target device becoming completely unresponsive after a minute or so. Without going too much into the details, the vulnerability was due to the world-writable permission access to the Zygote's socket, from which Zygote itself listens for "requests to fork new processes". With this technique, they were able to make Zygote fork a high number of processes, and that was enough to block the device.

Their work got a lot of coverage and the vulnerability they found (CVE-2011-3918) has been considered as critical (CVSS score of 7.8). Eventually, even Google acknowledged the bug by committing a fix.

At that point, Antonio asked me "how about a fork-bomb? Would that work?". To our surprise, the answer to that question is: yes, definitively yes!

We found that the most simple implementation of a fork bomb is able to instantaneously block any Android device we have tested it on, regardless of the phone/OS version.

Up to now, we verified that our exploit works on Google's flagship device (Nexus 4 with Android 4.2.2) and on the following other ones:
  • Nexus 4 - Android 4.2.1
  • Nexus S - Android 4.2.1 (Cyanogen mod 10.1)
  • Galaxy Nexus - Android 4.2.2 (Cyanogen mod 10.1.0-RC5)
  • Galaxy Nexus - Android 4.1.1
  • Galaxy Nexus - Android 4.0.4
  • Samsung Galaxy Tab 10.1 - Android 3.1
  • Motorola Backflip - Android 2.3.7 (Cyanogen mod 7)
  • Emulator running Android - Android 4.1.2

We immediately contacted the Android security team (on Feb 7th, 2013), and received a response shortly after: in their opinion, this does not constitute a security issue, because it is just a local DoS and the user can somehow regain control of his device (by rebooting the phone).

We found their answer quite interesting. In fact, we believe that our exploit is strictly more powerful that the previously disclosed one, as in this case the device blocks immediately, and not after a minute or so. Also, the impact that such an issue might have is much higher, as there does not seem to be an easy patch to fix this problem. This naturally raised some questions: did Google patch that vulnerability only because it was a simple fix? Or maybe the patch was supposed to fix a different more serious (undisclosed) bug? As we have not heard anything back (we sent our last email to them on Feb 10th, 2013), we have no idea. In any case, we are surely not the first ones to come up with a local DoS attack (for example, check this nice trick from DexLab), but it seems that Google just does not care that much about this.

We think that local DoS is really bad (at least bad enough to commit a fix, especially when it's simple). First, this violates one of the core principle on the Android design: no application, by default, should have the permission to perform any operation that would adversely impact other applications, the operating system, or the user. Second, what if the attack starts during the night, when the user is relying on his phone for the alarm clock? That would be quite unfortunate. And what if our nasty app starts at every boot, making the device completely unusable from the beginning? Now, if you are one of those few guys who reads a blog post like this, you probably know your way around to delete the annoying app, but what if you are not?

That being said, our uber simple forkbomb app is available on github at: https://github.com/reyammer/android-forkbomb. Feel free to try it out, but be ready to pull out your battery :-)

For any questions, please feel free to drop us an email at yanick@cs.ucsb.edu and antoniob@cs.ucsb.edu.

Monday, April 29, 2013

ShellNoob 1.0 - a shellcode writing toolkit

Today I'm really happy to publicly release ShellNoob (and to publish my first blog post :-))

During the many CTFs I played, there always has been the need to manually write some shellcode (yep, most of time Metasploit is not enough, even if you are lucky and you get a working shellcode...)

Now, writing shellcode is always super fun, but some parts are extremely boring and error prone. And after googling for the n-th time "how to <put-anything-you-like-here>", I just got tired and I wrote shellnoob.py, a simple Python script that makes some boring steps less boring.

I bet that there are tons of similar scripts around the web, but I never found anything that had all the features I wanted. If you have suggestions, please ping me!

Alright, let's go to the meat. This is the list of features:
  • convert shellcode between different formats (currently supported: asm, bin, hex, obj, exe, C, python, ruby, pretty)
  • interactive opcode-to-binary conversion (and viceversa) mode. This is useful when you cannot use specific bytes in the shellcode.
  • resolve syscall numbers and constants (not exactly implemented yet :-))
  • portable: it only relies on gcc/as/objdump and python.
  • easily deployable: it's just one python file!
  • in-place development: you run ShellNoob directly on the target architecture!
  • other options: prepend breakpoint, 32bit/64bit switch.
  • read from stdin / write to stdout support (use "-" as filename)

Everything (i.e., the code, use cases, etc) is on github: https://github.com/reyammer/shellnoob

Check it out! Feedback, comments, and contributions are more than welcome!