Alright, here is the challenge. We were given a single tar archive containing three files:
1) the memory map (mmaps) of a process running an Android app
2) the output of the dex2oat command run over the Android app's Dalvik bytecode
3) boot.oat
In recent Android versions, an app's Dalvik bytecode is converted into an OAT file (an ELF binary file) when the app is first installed. dex2oat is the program in charge of this process. More information on the OAT format can be found here. The boot.oat file is the OAT file for the main components of the Android framework.
We checked the output of the dex2oat command, and we found two methods related to the Android app: MainActivity.onCreate() and MainActivity.check(String s).
We then guessed that to solve the challenge we needed to reverse the check() function, and that the flag would be the string s that makes check() return 1.
Normally, the OAT format (and the output of the dex2oat command) includes the generated binary as well as the Dalvik bytecode of the app. However, the authors of the challenge removed the Dalvik bytecode part, making this challenge very interesting.
Thus the question: from the binary-only part of an OAT, can we reconstruct the Dalvik bytecode?
We reassembled the binary from the log file and, at first look, it was clear that the app defines and manipulates a series of arrays and performs some operations on their elements.
We encountered three main challenges to fully reconstruct what was going on:
1) It's low-level ARM assembly code. For example, a new-array v0, v1, byte[] Dalvik bytecode instruction looks like:
0x00371eb6: f8d9e11c ldr.w lr, [r9, #284] ; pAllocArrayResolved
0x00371eba: 9900 ldr r1, [sp, #0]
0x00371ebc: 2606 movs r6, #6
0x00371ebe: 1c32 mov r2, r6
0x00371ec0: f64e0020 movw r0, #59424
0x00371ec4: f2c7005b movt r0, #28763
0x00371ec8: 47f0 blx lr
Register R0 contains a reference to the type of the array (i.e., byte[]), while register R2 contains the size of the array.
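The movw/movt pair in the listing above loads a 32-bit constant in two halves: movw sets the low 16 bits and movt the high 16 bits. Here is a minimal Python sketch for recombining the two immediates; the helper name combine_movw_movt is ours and not part of any tool:

```python
def combine_movw_movt(movw_imm: int, movt_imm: int) -> int:
    """Rebuild the 32-bit constant loaded by a movw/movt instruction pair."""
    if not (0 <= movw_imm <= 0xFFFF and 0 <= movt_imm <= 0xFFFF):
        raise ValueError("both immediates must fit in 16 bits")
    # movt writes the upper halfword, movw the lower one.
    return (movt_imm << 16) | movw_imm


# Immediates taken from the new-array listing (movw r0, #59424 / movt r0, #28763).
type_ref = combine_movw_movt(59424, 28763)
print(hex(type_ref))
```

For the new-array example this gives 0x705be820, which falls inside the boot.art mapping listed in the memory map further down. That appears consistent with R0 pointing at a class object in the boot image.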
As another example, a fill-array-data v0, +6 Dalvik bytecode instruction (which loads into an already-created array a series of bytes at a given offset) looks like:
0x00371ea8: f8d9e190 ldr.w lr, [r9, #400] ; pHandleFillArrayData
0x00371eac: 4682 mov r10, r0
0x00371eae: 4650 mov r0, r10
0x00371eb0: f20f6144 adr r1, +216
0x00371eb4: 47f0 blx lr
2) There is a lot of automatically generated code that is not directly related to the Dalvik bytecode, which makes reversing the binary harder. For example, since this app was playing with arrays, there are "bounds checks" all over the place:
0x00372410: f8d9e238 ldr.w lr, [r9, #568] ; pThrowArrayBounds
0x00372414: 1c01 mov r1, r0
0x00372416: 1c28 mov r0, r5
0x00372418: 47f0 blx lr
3) When one method invokes a Java method in the Android framework, we only see a jump to an address in memory. How can we know where the app is jumping to?
For example:
0x00372294: f6497e11 movw lr, #40721
0x00372298: f2c72ea0 movt lr, #29344
0x0037229c: f24520c0 movw r0, #21184
0x003722a0: f2c70044 movt r0, #28740
0x003722a4: 4641 mov r1, r8
0x003722a6: 2253 movs r2, #83
0x003722a8: 2365 movs r3, #101
0x003722aa: 47f0 blx lr
Here the control flow jumps to address 0x72a09f11. The memory map information clearly tells us that we are jumping into the executable mapping of boot.oat (the Android framework):
$ cat mmap.txt
703d3000-70eee000 rw-p 00000000 b3:17 185108 /data/dalvik-cache/arm/system@framework@boot.art
70eee000-7298b000 r--p 00000000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat
7298b000-73e43000 r-xp 01a9d000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat
73e43000-73e44000 rw-p 02f55000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat
offset_in_boot_oat = offset_in_memory - 1 - 0x7298b000 + 0x1a9d000
(Note: the "-1" clears the lowest bit of the branch target. On ARM, that bit only selects Thumb mode for the blx and is not part of the actual code address.)
For our example: 0x72a09f11 ~> 0x1b1bf10.
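The same conversion can be done mechanically from the memory map. The sketch below assumes the map is in the usual /proc/<pid>/maps layout shown above; parse_maps and to_file_offset are illustrative names of our own:

```python
MMAP_TEXT = """\
703d3000-70eee000 rw-p 00000000 b3:17 185108 /data/dalvik-cache/arm/system@framework@boot.art
70eee000-7298b000 r--p 00000000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat
7298b000-73e43000 r-xp 01a9d000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat
73e43000-73e44000 rw-p 02f55000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat
"""


def parse_maps(text):
    """Return (start, end, perms, file_offset, path) tuples for file-backed mappings."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank lines and anonymous mappings
        start, end = (int(part, 16) for part in fields[0].split("-"))
        regions.append((start, end, fields[1], int(fields[2], 16), fields[5]))
    return regions


def to_file_offset(address, regions):
    """Map a branch target to (path, perms, offset inside the backing file)."""
    address &= ~1  # clear the Thumb bit
    for start, end, perms, file_offset, path in regions:
        if start <= address < end:
            return path, perms, address - start + file_offset
    raise LookupError(f"{address:#x} is not inside any known mapping")


path, perms, offset = to_file_offset(0x72a09f11, parse_maps(MMAP_TEXT))
print(path, perms, hex(offset))
```

For the example target this selects the r-xp mapping of boot.oat and yields 0x1b1bf10, matching the manual computation.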
But how can we know which method in the framework is invoked?
It turns out that an OAT file contains the "Java method" -- "offset in binary" mapping we are looking for. To dump this information, we used oatdump, which is a tool that comes with AOSP. Note that oatdump and the OAT format are very version-specific: an oatdump binary compiled for Android M will not work on an OAT file generated for Android L.
Once we compiled the right version of oatdump (in our case, for Android L), we could extract the information we needed:
$ oatdump boot.oat
[...]
74: java.lang.String java.lang.String.replace(char, char) (dex_method_idx=3238)
DEX CODE:
0x0000: const/4 v9, #+0
0x0001: iget-object v2, v10, [C java.lang.String.value // field@2147
0x0003: iget v1, v10, I java.lang.String.offset // field@2145
0x0005: iget v0, v10, I java.lang.String.count // field@2143
0x0007: move v4, v1
0x0008: add-int v5, v1, v0
[...]
OatMethodOffsets (offset=0x0150b2b8)
code_offset: 0x01b1af11
gc_map: (offset=0x015fc583)
OatQuickMethodHeader (offset=0x01b1aef8)
mapping_table: (offset=0x018e19e2)
vmap_table: (offset=0x01a7daff)
v4/r5, v2/r6, v5/r7, v3/r8, v10/r10, v1/r11, v65535/r15
QuickMethodFrameInfo
frame_size_in_bytes: 96
core_spill_mask: 0x00008de0 (r5, r6, r7, r8, r10, r11, r15)
fp_spill_mask: 0x00000000
CODE: (code_offset=0x01b1af11 size_offset=0x01b1af0c size=400)...
0x01b1af10: f5bd5c00 subs r12, sp, #8192
0x01b1af14: f8dcc000 ldr.w r12, [r12, #0]
suspend point dex PC: 0x0000
GC map objects: v10 (r10)
0x01b1af18: e92d4de0 push {r5, r6, r7, r8, r10, r11, lr}
0x01b1af1c: b091 sub sp, sp, #68
0x01b1af1e: 9000 str r0, [sp, #0]
0x01b1af20: 468a mov r10, r1
0x01b1af22: 921a str r2, [sp, #104]
0x01b1af24: 931b str r3, [sp, #108]
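With a dump like the one above, the lookup can be turned into a table from code offset to method signature. The sketch below is based only on the two line shapes visible in this excerpt (the "N: signature (dex_method_idx=...)" header and the "code_offset:" line); other oatdump versions may print things differently, and index_methods is a name of our own:

```python
import re

# Line shapes assumed from the excerpt above; leading indentation is tolerated.
METHOD_RE = re.compile(r"^\s*\d+:\s+(?P<sig>.+?)\s+\(dex_method_idx=\d+\)\s*$")
OFFSET_RE = re.compile(r"^\s*code_offset:\s+0x(?P<off>[0-9a-fA-F]+)\s*$")


def index_methods(dump_lines):
    """Build {code offset with Thumb bit cleared: method signature}."""
    table = {}
    current = None
    for line in dump_lines:
        header = METHOD_RE.match(line)
        if header:
            current = header.group("sig")
            continue
        offset_line = OFFSET_RE.match(line)
        if offset_line and current is not None:
            offset = int(offset_line.group("off"), 16) & ~1
            if offset:  # skip methods without compiled code
                table[offset] = current
            current = None
    return table


sample = """\
74: java.lang.String java.lang.String.replace(char, char) (dex_method_idx=3238)
OatMethodOffsets (offset=0x0150b2b8)
code_offset: 0x01b1af11
"""
methods = index_methods(sample.splitlines())
print(methods[0x1b1af10])
```

For a full dump, the same function can be fed an open file object, since it only iterates over lines.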
We eventually noticed that the offsets we computed with our previous formula and the offsets reported by oatdump differ by 0x1000 (we are still not sure exactly why), which gives our final formula:
offset_in_boot_oat = offset_in_memory - 1 - 0x7298b000 + 0x1a9d000 - 0x1000
This allowed us to resolve all the call targets referenced in the app:
0x72a061f9 ~> 0x1b171f8 ~> void java.lang.String.<init>(byte[])
0x72a0ad91 ~> 0x1b1bd90 ~> void java.lang.StringBuffer.<init>(java.lang.String)
0x72a0e5d9 ~> 0x1b1f5d8 ~> java.lang.StringBuilder java.lang.StringBuilder.reverse()
0x72a0e809 ~> 0x1b1f808 ~> java.lang.String java.lang.StringBuilder.toString()
0x72a061f9 ~> 0x1b171f8 ~> void java.lang.String.<init>(byte[])
0x72a09f11 ~> 0x1b1af10 ~> java.lang.String java.lang.String.replace(char, char)
0x72a0a781 ~> 0x1b1b780 ~> java.lang.String java.lang.String.substring(int, int)
0x72a0d919 ~> 0x1b1e918 ~> java.lang.StringBuilder java.lang.StringBuilder.append(java.lang.String)
0x72a0ac71 ~> 0x1b1bc70 ~> void java.lang.StringBuffer.<init>()
0x72a0ab49 ~> 0x1b1bb48 ~> java.lang.String java.lang.String.trim()
0x72a08971 ~> 0x1b19970 ~> boolean java.lang.String.equals(java.lang.Object)
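As a sanity check, the final formula can be replayed over every target in the list. This is a small sketch: the constants come from the memory map and from the 0x1000 difference observed empirically, and oatdump_offset is just an illustrative helper name:

```python
TEXT_START = 0x7298b000        # start of the r-xp mapping of boot.oat
TEXT_FILE_OFFSET = 0x1a9d000   # file offset of that mapping
OATDUMP_SKEW = 0x1000          # empirical difference with oatdump's offsets


def oatdump_offset(target: int) -> int:
    """Convert an in-memory branch target to the offset printed by oatdump."""
    return (target & ~1) - TEXT_START + TEXT_FILE_OFFSET - OATDUMP_SKEW


# Branch target -> offset, copied from the list above.
RESOLVED = {
    0x72a061f9: 0x1b171f8,
    0x72a0ad91: 0x1b1bd90,
    0x72a0e5d9: 0x1b1f5d8,
    0x72a0e809: 0x1b1f808,
    0x72a09f11: 0x1b1af10,
    0x72a0a781: 0x1b1b780,
    0x72a0d919: 0x1b1e918,
    0x72a0ac71: 0x1b1bc70,
    0x72a0ab49: 0x1b1bb48,
    0x72a08971: 0x1b19970,
}

for target, expected in RESOLVED.items():
    assert oatdump_offset(target) == expected
    print(f"{target:#x} ~> {oatdump_offset(target):#x}")
```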
With this information, it was then trivial to re-implement the check() function in Python, which printed the flag. Note that the first part of the binary performs many simple (xor-like) operations on the arrays defined in the code, but the last part, with its calls into the framework, was definitely the most challenging one.
$ python check.py
FLAG: 0ctf{1ea5n_2_rE_ART}
Long story short: reversing OAT is possible to some extent, but additional information (the memory map and boot.oat) is required if you don't want to guess "too much." Also, this challenge would likely have been much harder (if not impossible) if some of the methods had been called through vtables (hence through several levels of indirection).
All relevant files can be found at this link. Hope you enjoyed it!