Tuesday, March 15, 2016

From Android ART (binary-only) to DEX? Yes, we can!™ (kinda)

This is a write-up for the 0ctf 2016 quals "State of the ART" mobile/Android challenge worth 5 points. We (Shellphish) were one of the only three teams that solved it, and since I haven't seen any write-up on this, here is mine! Major props to @_antonio_bc_ and @subwire who heavily worked on this with me :)

Alright, here is the challenge. We were given one tar containing three files:

1) mmaps of a process running an Android app
2) output of dex2oat command run over the Android app's Dalvik bytecode
3) boot.oat

In recent Android versions, an app's Dalvik bytecode is converted into an OAT file (an ELF binary file) when the app is first installed. dex2oat is the program in charge of this process. More information on the OAT format can be found here. The boot.oat file is the OAT related to the main components of the Android framework.

We checked the output of the dex2oat command, and we found two methods related to the Android app: MainActivity.onCreate() and MainActivity.check(String s).

We then guessed that to solve the challenge we needed to reverse the check() function, and that the flag would be the string s that makes check() return 1.

Normally, the OAT format (and the output of the dex2oat command) includes the generated binary as well as the Dalvik bytecode of the app. However, the authors of the challenge removed the Dalvik bytecode part, making this challenge very interesting.

Thus the question: from the binary-only part of an OAT, can we reconstruct the Dalvik bytecode?

We reassembled the binary from the log file and, at first look, it was clear that the app defines and manipulates a series of arrays and performs some operations on their elements.

We encountered three main challenges to fully reconstruct what was going on:

1) It's low-level ARM assembly code. For example, a new-array v0, v1, byte[] Dalvik bytecode instruction looks like:

      0x00371eb6: f8d9e11c ldr.w   lr, [r9, #284]  ; pAllocArrayResolved
      0x00371eba: 9900     ldr     r1, [sp, #0]
      0x00371ebc: 2606     movs    r6, #6
      0x00371ebe: 1c32     mov     r2, r6
      0x00371ec0: f64e0020 movw    r0, #59424
      0x00371ec4: f2c7005b movt    r0, #28763
      0x00371ec8: 47f0     blx     lr

Register R0 contains a reference to the type of the array (i.e., byte[]), while register R2 contains the size of the array.

As another example, a fill-array-data v0, +6 Dalvik bytecode instruction (which fills an already-created array with a table of bytes stored at a given offset from the instruction) looks like:

      0x00371ea8: f8d9e190 ldr.w   lr, [r9, #400]  ; pHandleFillArrayData
      0x00371eac: 4682     mov     r10, r0
      0x00371eae: 4650     mov     r0, r10
      0x00371eb0: f20f6144 adr     r1, +216
      0x00371eb4: 47f0     blx     lr

2) There is a lot of automatically generated code that is not directly related to the Dalvik bytecode, which makes reversing the binary harder. For example, since this app was playing with arrays, there are "bounds checks" all over the place:

      0x00372410: f8d9e238 ldr.w   lr, [r9, #568]  ; pThrowArrayBounds
      0x00372414: 1c01     mov     r1, r0
      0x00372416: 1c28     mov     r0, r5
      0x00372418: 47f0     blx     lr
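
The listings above share one pattern: a runtime entrypoint is loaded into lr from an offset off r9, and the disassembly comment names it. Here is a minimal Python sketch that picks those loads out of a listing and labels them. The function and table names are made up for illustration, the labels are our own annotations (not oatdump output), and only the three entrypoints shown in this post are covered.

```python
import re

# Entrypoint names come from the listings in this post; the labels on the
# right are our own annotations, not something oatdump prints.
ENTRYPOINT_MEANING = {
    "pAllocArrayResolved": "new-array",
    "pHandleFillArrayData": "fill-array-data",
    "pThrowArrayBounds": "bounds-check slow path (compiler-generated)",
}

# Matches e.g. "ldr.w   lr, [r9, #284]  ; pAllocArrayResolved"
ENTRYPOINT_RE = re.compile(r"ldr\.w\s+lr,\s*\[r9,\s*#(\d+)\]\s*;\s*(p\w+)")


def annotate_entrypoints(listing):
    """Return (offset_from_r9, entrypoint, label) for each entrypoint load."""
    hits = []
    for line in listing.splitlines():
        match = ENTRYPOINT_RE.search(line)
        if match:
            name = match.group(2)
            label = ENTRYPOINT_MEANING.get(name, "unknown")
            hits.append((int(match.group(1)), name, label))
    return hits


SAMPLE = """\
0x00371eb6: f8d9e11c ldr.w   lr, [r9, #284]  ; pAllocArrayResolved
0x00371eba: 9900     ldr     r1, [sp, #0]
0x00372410: f8d9e238 ldr.w   lr, [r9, #568]  ; pThrowArrayBounds
"""

for offset, name, label in annotate_entrypoints(SAMPLE):
    print(f"[r9, #{offset}] {name}: {label}")
```

Labelling the bounds-check slow paths this way makes it easier to skip over them and focus on the instructions that correspond to actual Dalvik bytecode.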

3) When one method invokes a Java method in the Android framework, we only see a jump to an address in memory. How can we know where the app is jumping to?

For example:

      0x00372294: f6497e11 movw    lr, #40721
      0x00372298: f2c72ea0 movt    lr, #29344
      0x0037229c: f24520c0 movw    r0, #21184
      0x003722a0: f2c70044 movt    r0, #28740
      0x003722a4: 4641     mov     r1, r8
      0x003722a6: 2253     movs    r2, #83
      0x003722a8: 2365     movs    r3, #101
      0x003722aa: 47f0     blx     lr

Here the control flow jumps to address 0x72a09f11. The memory map information clearly tells us that we are jumping into the executable mapping of boot.oat (the Android framework):

$ cat mmap.txt

703d3000-70eee000 rw-p 00000000 b3:17 185108     /data/dalvik-cache/arm/system@framework@boot.art
70eee000-7298b000 r--p 00000000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat
7298b000-73e43000 r-xp 01a9d000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat
73e43000-73e44000 rw-p 02f55000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat

and we can compute the offset in the boot.oat file with the following formula:

offset_in_boot_oat = offset_in_memory - 1 - 0x7298b000 + 0x1a9d000

(Note: the "-1" is for fixing the ARM thumb-related "bit")

For our example: 0x72a09f11 ~> 0x1b1bf10.
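
The computation can be sketched in a few lines of Python. This is an illustrative sketch rather than the original tooling: the function names are invented here, and the parser assumes the usual /proc/<pid>/maps column layout shown in mmap.txt. It rebuilds the target address from the movw/movt immediates, finds the mapping that contains it, and applies the formula above.

```python
def decode_movw_movt(low16, high16):
    """Combine the immediates of a movw/movt pair into one 32-bit value."""
    return ((high16 & 0xFFFF) << 16) | (low16 & 0xFFFF)


def parse_maps(text):
    """Parse maps-style lines into (start, end, perms, file_offset, path)."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank lines and anonymous mappings
        start, end = (int(x, 16) for x in fields[0].split("-"))
        regions.append((start, end, fields[1], int(fields[2], 16), fields[5]))
    return regions


def to_file_offset(address, regions):
    """Map a code address to (path, file offset), clearing the Thumb bit."""
    address &= ~1  # same effect as the "- 1" in the formula when the bit is set
    for start, end, _perms, file_offset, path in regions:
        if start <= address < end:
            return path, address - start + file_offset
    raise ValueError(f"address {address:#x} is not in any mapped region")


MAPS = """\
703d3000-70eee000 rw-p 00000000 b3:17 185108     /data/dalvik-cache/arm/system@framework@boot.art
70eee000-7298b000 r--p 00000000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat
7298b000-73e43000 r-xp 01a9d000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat
73e43000-73e44000 rw-p 02f55000 b3:17 185109     /data/dalvik-cache/arm/system@framework@boot.oat
"""

# movw lr, #40721 / movt lr, #29344 from the listing above
target = decode_movw_movt(40721, 29344)
path, offset = to_file_offset(target, parse_maps(MAPS))
print(hex(target), path, hex(offset))
```

For the example call this prints the address 0x72a09f11, the boot.oat path, and the offset 0x1b1bf10.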

But how can we know which method in the framework is invoked?

It turns out that an OAT file contains the "Java method" -- "offset in binary" mapping we are looking for. To dump this information, we used oatdump, which is a tool that comes with AOSP. Note that oatdump and the OAT format are very target-specific, and an oatdump binary compiled for Android M will not work for an OAT generated for Android L.

Once we compiled the right version of oatdump (in our case, for Android L), we could extract the information we needed:

$ oatdump --oat-file=boot.oat
  74: java.lang.String java.lang.String.replace(char, char) (dex_method_idx=3238)
      0x0000: const/4 v9, #+0
      0x0001: iget-object v2, v10, [C java.lang.String.value // field@2147
      0x0003: iget v1, v10, I java.lang.String.offset // field@2145
      0x0005: iget v0, v10, I java.lang.String.count // field@2143
      0x0007: move v4, v1
      0x0008: add-int v5, v1, v0
    OatMethodOffsets (offset=0x0150b2b8)
      code_offset: 0x01b1af11
      gc_map: (offset=0x015fc583)
    OatQuickMethodHeader (offset=0x01b1aef8)
      mapping_table: (offset=0x018e19e2)
      vmap_table: (offset=0x01a7daff)
      v4/r5, v2/r6, v5/r7, v3/r8, v10/r10, v1/r11, v65535/r15
      frame_size_in_bytes: 96
      core_spill_mask: 0x00008de0 (r5, r6, r7, r8, r10, r11, r15)
      fp_spill_mask: 0x00000000
    CODE: (code_offset=0x01b1af11 size_offset=0x01b1af0c size=400)...
      0x01b1af10: f5bd5c00  subs    r12, sp, #8192
      0x01b1af14: f8dcc000  ldr.w   r12, [r12, #0]
      suspend point dex PC: 0x0000
      GC map objects:  v10 (r10)
      0x01b1af18: e92d4de0  push    {r5, r6, r7, r8, r10, r11, lr}
      0x01b1af1c: b091      sub     sp, sp, #68
      0x01b1af1e: 9000      str     r0, [sp, #0]
      0x01b1af20: 468a      mov     r10, r1
      0x01b1af22: 921a      str     r2, [sp, #104]
      0x01b1af24: 931b      str     r3, [sp, #108]

We eventually noticed that the offsets computed with our previous formula and the offsets reported by oatdump are "off" by 0x1000 (we are still not sure exactly why), which gives our final formula:

offset_in_boot_oat = offset_in_memory - 1 - 0x7298b000 + 0x1a9d000 - 0x1000

This allowed us to resolve all the call targets referenced in the app:

0x72a061f9 ~> 0x1b171f8 ~> void java.lang.String.<init>(byte[])
0x72a0ad91 ~> 0x1b1bd90 ~> void java.lang.StringBuffer.<init>(java.lang.String)
0x72a0e5d9 ~> 0x1b1f5d8 ~> java.lang.StringBuilder java.lang.StringBuilder.reverse()
0x72a0e809 ~> 0x1b1f808 ~> java.lang.String java.lang.StringBuilder.toString()
0x72a061f9 ~> 0x1b171f8 ~> void java.lang.String.<init>(byte[])
0x72a09f11 ~> 0x1b1af10 ~> java.lang.String java.lang.String.replace(char, char)
0x72a0a781 ~> 0x1b1b780 ~> java.lang.String java.lang.String.substring(int, int)
0x72a0d919 ~> 0x1b1e918 ~> java.lang.StringBuilder java.lang.StringBuilder.append(java.lang.String)
0x72a0ac71 ~> 0x1b1bc70 ~> void java.lang.StringBuffer.<init>()
0x72a0ab49 ~> 0x1b1bb48 ~> java.lang.String java.lang.String.trim()
0x72a08971 ~> 0x1b19970 ~> boolean java.lang.String.equals(java.lang.Object)
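
Resolving a table like this by hand gets tedious, so here is a rough Python sketch of how the lookup could be automated. It is an illustration under assumptions, not the original tooling: the regular expressions are modelled only on the oatdump excerpt shown earlier (other oatdump versions may format things differently), and names such as index_oatdump, resolve, and skew are invented here. The default constants are the mapping values and the 0x1000 correction from the final formula.

```python
import re

# Line shapes assumed from the oatdump excerpt in this post.
METHOD_RE = re.compile(r"^\s*\d+:\s+(.+?)\s+\(dex_method_idx=\d+\)\s*$")
CODE_OFFSET_RE = re.compile(r"^\s*code_offset:\s+0x([0-9a-fA-F]+)\s*$")


def index_oatdump(text):
    """Build {code offset with Thumb bit cleared: method signature}."""
    index = {}
    current = None
    for line in text.splitlines():
        match = METHOD_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = CODE_OFFSET_RE.match(line)
        if match and current is not None:
            index[int(match.group(1), 16) & ~1] = current
    return index


def resolve(address, index, seg_start=0x7298B000,
            seg_file_offset=0x1A9D000, skew=0x1000):
    """Apply the final formula and look the offset up in the index."""
    offset = (address & ~1) - seg_start + seg_file_offset - skew
    return offset, index.get(offset, "<unknown>")


OATDUMP_SAMPLE = """\
  74: java.lang.String java.lang.String.replace(char, char) (dex_method_idx=3238)
      0x0000: const/4 v9, #+0
    OatMethodOffsets (offset=0x0150b2b8)
      code_offset: 0x01b1af11
    CODE: (code_offset=0x01b1af11 size_offset=0x01b1af0c size=400)...
"""

index = index_oatdump(OATDUMP_SAMPLE)
offset, method = resolve(0x72A09F11, index)
print(f"0x72a09f11 ~> {offset:#x} ~> {method}")
```

With the full oatdump output as input instead of the short sample, the same lookup would cover every target in the list above.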

With this info, it was then trivial to re-implement the check() function in Python, which spit out the flag. Note that the first part of the binary does many simple (xor-like) operations on the arrays defined in the code, but this last part was definitely the most challenging one.

$ python check.py 
FLAG: 0ctf{1ea5n_2_rE_ART}
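
The check.py script itself is not reproduced in this post. As a hedged illustration only, the Java string methods resolved above map onto Python roughly as follows. The helper names are invented, the UTF-8 charset for String(byte[]) is an assumption, and none of this encodes the actual logic of check().

```python
def java_string_from_bytes(data):
    """String(byte[]): Java bytes are signed, so mask them first.
    UTF-8 is assumed as the default charset."""
    return bytes(b & 0xFF for b in data).decode("utf-8")


def java_replace(s, old_char, new_char):
    """String.replace(char, char): replace every occurrence."""
    return s.replace(old_char, new_char)


def java_substring(s, begin, end):
    """String.substring(int, int): Java throws on bad indices,
    whereas Python slicing would silently clamp them."""
    if not 0 <= begin <= end <= len(s):
        raise IndexError(f"substring({begin}, {end}) out of range")
    return s[begin:end]


def java_reverse(s):
    """StringBuilder.reverse()."""
    return s[::-1]


def java_trim(s):
    """String.trim(): strips characters <= U+0020 from both ends,
    which is not the same set as Python's default str.strip()."""
    return s.strip("".join(chr(c) for c in range(0x21)))


def java_equals(a, b):
    """String.equals(Object)."""
    return isinstance(b, str) and a == b


print(java_reverse(java_substring(java_trim("  hello "), 1, 4)))
```

Small differences like the trim() character set or the index checks in substring() are easy to miss when porting, which is why they are spelled out here.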

Long story short: reversing OAT is somewhat possible, but additional info is required (mmap + boot.oat) if you don't want to guess "too much." Also, this challenge might have been much, much harder (if not impossible) if some of the methods had been called through vtables (hence several levels of indirection).

All relevant files can be found at this link. Hope you enjoyed it!