Over The Line

Wednesday, November 15, 2017

Some thoughts on Google's move to restrict accessibility service

In the last couple of days, many reports (see XDA developers, The Hacker News) appeared about Google emailing apps developers about new restriction affecting the Android accessibility service (a11y, in short). The gist: in 30 days, apps should use a11y only to implement features to assist users with disabilities; Apps misusing a11y (even for benign purposes) will get kicked out from the store.

I am one of the many researchers that showed how a11y poses severe issues for Android security. I did it by showing how you could mount a number of devastating attacks, and other researchers recently found real-world malware abusing them (see Zimperium's post on Clicking Bot Apps, or TrendMicro's post on a11y abuse).

It has been a community effort, and Google is finally taking actions. But I don't understand how this Google's move is going to help the ecosystem. Here there are my thoughts.

The good guys and the ecosystem are negatively affected

Many popular apps use a11y. Apps like LastPass, antivirus apps, app lockers, they all use a11y to (hackishly) implement features that it would be otherwise impossible to implement. Among the apps we looked at, none use a11y to assist people with disabilities.
The good guys *need* to comply with the new policy: there are companies behind these products, and they can't afford to mess with Google. Either they adapt, or their apps are out of the store.
If the policy will actually be enforced, some apps are damaged but are still somehow usable (like LastPass: people can always copy/paste their passwords 'manually'), but other apps will need to shutdown.
As a reaction, I fear that the good guys may start offering "if you want all our a11y-backed features, please download the APK from our website." This direction is bad. It will push users to unsafe behaviors: if people get used to "install an APK from a website," many more people will have 'side loading' enabled, and phishing websites will have more chances to lure users to install malicious repackaged apps ("which website should I visit to get the latest version of XYZ?") -- after all, it is "OK" to install random APKs from a website. The lack of a centralized 'store' is how tons of Windows users end up with crap on their machines. The Play Store is far from being malware-free, but it's much better than having users Googling for 'the right website' to get app XYZ.

Bad guys probably don't care much

The bad guys hosting apps on the Play Store will just need to find an excuse to claim "we use a11y for assisting people with disabilities." How can Google check if they are lying?
Given the latest news on WhatsApp/Facebook clones (see Motherboard article), it's clear that Google is currently lagging behind effective malware detection tools: a single researcher was able to find all these clones, without having access to Google's infrastructure. Now, this 'clone detection task' may be difficult, but I argue it is much simpler than determining the intent of a11y features from a behavioral point of view. So how Google will be able to determine, at scale, whether a a11y feature is ok or not, is a mystery to me.
Many apps are already installed from third-party markets (especially outside the US). Bad guys hosting their malware there are not affected.

Can't LastPass use the new AutoFill API?

Yes. But, unfortunately, the AutoFill API is only available in Android's latest version, Android O, which, according to Google's dashboard, only 0.3% of devices currently run. Yesterday LastPass published a short blog post, where they state that "they are working with Google." The post says that there is "no immediate impact to our Android users". Well, we knew already that the kickout party will be in ~30 days, so this doesn't say much. I hope they will find a deal.

Can't Google add a backward-compatibility layer for older versions?

Without modifying the framework, I have no idea how they could do it. Tricks you can do with a11y clearly bypass many security barriers. *If* there is a way to do this without framework modifications, then we'll see more papers on how to abuse this ;-) If there is no way to do it, then we need framework modifications. Which millions of devices will never see anyways. Auch.

What's Google's plan?

Honestly, I have no idea. I somehow hope that Google will actually be quite lenient with well known apps. But how can they select which apps are OK? Do they need to have 1M+ installs? How do you set the threshold? I don't know. Don't get me wrong, there are tons of super smart folks working at Google. I'm sure they have a plan. Here I'm just saying, I have no idea what's going on :-)

Update

It turns out that the Android folks changed their minds. According to this report on ArsTechnica, Google paused its decision to ban a11y-related apps for another 30 days while it considers responsible uses of accessibility services.

Tuesday, March 15, 2016

From Android ART (binary-only) to DEX? Yes, we can!™ (kinda)

This is a write-up for the 0ctf 2016 quals "State of the ART" mobile/Android challenge worth 5 points. We (Shellphish) were one of the only three teams that solved it, and since I haven't seen any write-up on this, here is mine! Major props to @_antonio_bc_ and @subwire who heavily worked on this with me :)

Alright, here is the challenge. We were given one tar containing three files:

1) mmaps of a process running an Android app
2) output of dex2oat command run over the Android app's Dalvik bytecode
3) boot.oat

In recent Android versions, an app's Dalvik bytecode is converted into an OAT file (an ELF binary file) when the app is first installed. dex2oat is the program in charge of this process. More information on the OAT format can be found here. The boot.oat file is the OAT related to the main components of the Android framework.

We checked the output of the dex2oat command, and we found two classes related to the Android app: MainActivity.onCreate() and MainActivity.check(String s).

We then guessed that to solve the challenge we needed to reverse the check() function and that the flag would have been the string s that makes the check() function return 1.

Normally, the OAT format (and the output of the dex2oat command) includes the generated binary as well as the Dalvik bytecode of the app. However, the authors of the challenge removed the Dalvik bytecode part, making this challenge very interesting.

Thus the question: from the binary-only part of an OAT, can we reconstruct the Dalvik bytecode?

We reassambled the binary from the log file and, at a first look, it was clear that the app defines and manipulates a series of arrays and does some operation over their elements.

We encountered three main challenges to fully reconstruct what was going on:

1) It's low-level ARM assembly code. For example, a new-array v0, v1, byte[] Dalvik bytecode instruction looks like:

0x00371eb6: f8d9e11c ldr.w lr, [r9, #284] ; pAllocArrayResolved

0x00371eba: 9900 ldr r1, [sp, #0]

0x00371ebc: 2606 movs r6, #6

0x00371ebe: 1c32 mov r2, r6

0x00371ec0: f64e0020 movw r0, #59424

0x00371ec4: f2c7005b movt r0, #28763

0x00371ec8: 47f0 blx lr

Register R0 contains a reference to the type of the array (i.e., byte[]), while register R2 contains the size of the array.

As another example, a fill-array-data v0, +6 Dalvik bytecode instruction (which loads into an already-created array a series of bytes at a given offset) looks like:

0x00371ea8: f8d9e190 ldr.w lr, [r9, #400] ; pHandleFillArrayData

0x00371eac: 4682 mov r10, r0

0x00371eae: 4650 mov r0, r10

0x00371eb0: f20f6144 adr r1, +216

0x00371eb4: 47f0 blx lr

2) There is a lot of automatically-generated code that is not directly related to the Dalvik bytecode, which makes reversing the binary harder. For example, since this app was playing with arrays, there are "bound checks" all over the place:

0x00372410: f8d9e238 ldr.w lr, [r9, #568] ; pThrowArrayBounds

0x00372414: 1c01 mov r1, r0

0x00372416: 1c28 mov r0, r5

0x00372418: 47f0 blx lr

3) When one method invokes a Java method in the Android framework, we only see a jump to an address in memory...how can we know where the app is jumping to?

For example:

0x00372294: f6497e11 movw lr, #40721

0x00372298: f2c72ea0 movt lr, #29344

0x0037229c: f24520c0 movw r0, #21184

0x003722a0: f2c70044 movt r0, #28740

0x003722a4: 4641 mov r1, r8

0x003722a6: 2253 movs r2, #83

0x003722a8: 2365 movs r3, #101

0x003722aa: 47f0 blx lr

Here the control flow jumps to address 0x72a09f11. The memory map information clearly tells us that we are jumping in boot.oat (the Android framework):

$ cat mmap.txt

703d3000-70eee000 rw-p 00000000 b3:17 185108 /data/dalvik-cache/arm/system@framework@boot.art

70eee000-7298b000 r--p 00000000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat

7298b000-73e43000 r-xp 01a9d000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat

73e43000-73e44000 rw-p 02f55000 b3:17 185109 /data/dalvik-cache/arm/system@framework@boot.oat

and we can compute the offset in the boot.oat file with the following formula:

offset_in_boot_oat = offset_in_memory - 1 - 0x7298b000 + 0x1a9d000

(Note: the "-1" is for fixing the ARM thumb-related "bit")

For our example: 0x72a09f11 ~> 0x1b1bf10.

But how can we know which method in the framework is invoked?

It turns out that an OAT file contains the "Java method" -- "offset in binary" mapping we are looking for. To dump this information, we used oatdump, which is a tool that comes with AOSP. Note that oatdump and the OAT format are very target-specific, and an oatdump binary compiled for Android M will not work for an OAT generated for Android L.

Once we compiled the right version of oatdump (in our case, for Android L), we could extract the information we needed:

$ oatdump boot.oat

[...]

74: java.lang.String java.lang.String.replace(char, char) (dex_method_idx=3238)

DEX CODE:

0x0000: const/4 v9, #+0

0x0001: iget-object v2, v10, [C java.lang.String.value // field@2147

0x0003: iget v1, v10, I java.lang.String.offset // field@2145

0x0005: iget v0, v10, I java.lang.String.count // field@2143

0x0007: move v4, v1

0x0008: add-int v5, v1, v0

[...]

OatMethodOffsets (offset=0x0150b2b8)

code_offset: 0x01b1af11

gc_map: (offset=0x015fc583)

OatQuickMethodHeader (offset=0x01b1aef8)

mapping_table: (offset=0x018e19e2)

vmap_table: (offset=0x01a7daff)

v4/r5, v2/r6, v5/r7, v3/r8, v10/r10, v1/r11, v65535/r15

QuickMethodFrameInfo

frame_size_in_bytes: 96

core_spill_mask: 0x00008de0 (r5, r6, r7, r8, r10, r11, r15)

fp_spill_mask: 0x00000000

CODE: (code_offset=0x01b1af11 size_offset=0x01b1af0c size=400)...

0x01b1af10: f5bd5c00 subs r12, sp, #8192

0x01b1af14: f8dcc000 ldr.w r12, [r12, #0]

suspend point dex PC: 0x0000

GC map objects: v10 (r10)

0x01b1af18: e92d4de0 push {r5, r6, r7, r8, r10, r11, lr}

0x01b1af1c: b091 sub sp, sp, #68

0x01b1af1e: 9000 str r0, [sp, #0]

0x01b1af20: 468a mov r10, r1

0x01b1af22: 921a str r2, [sp, #104]

0x01b1af24: 931b str r3, [sp, #108]

We eventually noticed that the offset we computed with our previous formula and the offsets outputted by oatdump are "off" by 0x1000 (we are still not sure why exactly), thus making our final formula:

offset_in_boot_oat = offset_in_memory - 1 - 0x7298b000 + 0x1a9d000 - 0x1000

This allowed us to resolve all targets references in the app we had:

0x72a061f9 ~> 0x1b171f8 ~> void java.lang.String.<init>(byte[])

0x72a0ad91 ~> 0x1b1bd90 ~> void java.lang.StringBuffer.<init>(java.lang.String)

0x72a0e5d9 ~> 0x1b1f5d8 ~> java.lang.StringBuilder java.lang.StringBuilder.reverse()

0x72a0e809 ~> 0x1b1f808 ~> java.lang.String java.lang.StringBuilder.toString()

0x72a061f9 ~> 0x1b171f8 ~> void java.lang.String.<init>(byte[])

0x72a09f11 ~> 0x1b1af10 ~> java.lang.String java.lang.String.replace(char, char)

0x72a0a781 ~> 0x1b1b780 ~> java.lang.String java.lang.String.substring(int, int)

0x72a0d919 ~> 0x1b1e918 ~> java.lang.StringBuilder java.lang.StringBuilder.append(java.lang.String)

0x72a0ac71 ~> 0x1b1bc70 ~> void java.lang.StringBuffer.<init>()

0x72a0ab49 ~> 0x1b1bb48 ~> java.lang.String java.lang.String.trim()

0x72a08971 ~> 0x1b19970 ~> boolean java.lang.String.equals(java.lang.Object)

With this info, it was then trivial to re-implement the check() function in python, which spit out the flag. Note that the first part of the binary does many simple (xor-like) operations on the arrays defined in the code, but this last part was definitively the most challenging one.

$ python check.py

FLAG: 0ctf{1ea5n_2_rE_ART}

Long story short: reversing OAT is somehow possible, but additional info is required (mmap + boot.oat) if you don't want to guess "too much." Also, this challenge might have been much much harder (if not impossible) if some of the methods would have been called through vtables (hence several levels of indirection).

All relevant files can be found at this link. Hope you enjoyed it!

Sunday, September 20, 2015

CSAWCTF 2015 -- pcapin (forensic 150) write-up

This is the write-up for solving "pcapin", a challenge from CSAW CTF 2015. It was in the "forensic" category, and it was worth it 150 points....may I say, 150 points my ass!?! This felt like a 1337 points challenge...at least :D

So, we have a pcap (links to all files at the end of the post), and we know that it contains the dump of some sort of file transfer protocol, and that a "not so sophisticated" encryption was used.

Several hours (and people) later, we extracted the following info:

- The data that the server sent us is split in two "sessions," each terminated by the "END" string.

- The first session contains 8 packets of 68 bytes each.

- The second session contains 28 packets of 212 bytes each.

- The first 12 bytes of each packet identify the total length of the session, some control codes (we still don't understand most of them...), the number of packets, and a packet counter. As it turns out, all these bytes are useless.

- The remaining bytes of each packet contain the actual "data".

- The first session is a file listing (once decrypted, one of the listed file is "flag.png"...)

- The second session is the content of the flag.png file.

- Each packet is encrypted with a 2-byte xor key.

So, we focused on the second session, as it contains the flag.png file. To recap, we have 28 encrypted packets that, once decrypted and concatenated, will give us the content of the "flag.png".

As a first step, we guessed the keys for the first and last packet. Since some PNG-related keywords need to appear at the beginning of a PNG file (e.g., "PNG" or "IDAT" strings), it was trivial to get the first 2-byte key: 20543. Same goes for the last packet: in this case there were many repeated 2-byte string at the end of the packet, so we guessed there was some padding (likely \x00) at the end. This gave us the key for the last packet: 19564.

But what about the keys for the other 26 packets? Well....

The general idea was to bruteforce the key for each packet. But how can we guess what's the correct key? After all, the IDAT payload is zlib-copressed stuff, so everything, once decrypted, will look like random bytes...

...and after several hours, we came up with this trick: for each packet, we decrypt it with a key K, try to uncompress it, and look at the result. If it's impossible to uncompress it, then for sure K is not the right key. If we can uncompress it, we then check the payload and see if it looks like RGBA (RGB + alpha channel!). We were able to confirm that this trick might have worked given that the IDAT payload contained in the first packet followed the pattern, and that alpha was always set to zero!

We wrote an heuristic to check for this pattern for each of the "attempt", and we sort the candidate keys according to another heuristics. Few hours later, we got the PNG, which contained the flag :-)

These are the 28 keys: 20543 44829 21138 23618 15062 59478 13198 54610 4633 46710 41810 38097 56123 58392 52387 12251 26106 43868 15618 57633 1053 53731 53447 30269 24329 17183 6131 19564.

Here there are a bunch of links: github repo, original pcap, and the code. And this is the final, decrypted PNG:

Enjoy!

Tuesday, December 10, 2013

The "behind the scene" of DexWare, a DalvikVM based service for the iCTF 2013.

This write-up will describe the "behind the scene" of DexWare, a service I wrote for the iCTF 2013. To the best of my knowledge, this is the first service in the history of CTFs to be based on Dalvik-bytecode!! I hope this write-up will be a useful starting point for those who will attempt something similar!

You can find the source code and the compiled binaries on github (link). Also, feel free to ping me on twitter (@reyammer) for any questions.

Introduction

It seems that many of the teams found this service particularly challenging to hack, even after discovering a possible attack vector. The fact that heavy usage of Java reflection was required to hack the service might be related to that :D. In any case, let me tell you, this was a pretty complex service to be set up as well, especially as it needed to run for 8+ hours, on 100+ VMs, at a risk of compromising the entire game if unstable.

These are the points I will touch in this write-up:

How to compile a DalvikVM for Linux/x86
How to actually run DEX bytecode on Linux/x86
Discussion about Dexware, the service
Quick overview on how to exploit the service
How to make the service reliable (bug in DexClassLoader?)

How to compile a DalvikVM for Linux/x86

The vulnerable box for the iCTF needed to be a classic Linux/x86 VM. Hence, to run a service like this, you first need a DalvikVM runnable on Linux/x86. Once you know how to do it, it's pretty straightforward.

After you download the entire AOSP repo, just fire the following commands:

$ cd <aospdir>

$ source build/envsetup.sh

$ lunch # and then select "aosp_x86-eng"

$ make -j8

You could actually compile only the parts related to the DalvikVM, but compiling everything on a moder machine doesn't take that much in any case. Once you have that, you will find two different directories in the "out" folder. The "host" directory, that contains everything related to the host (your current machine), and the "target" directory, that contains a bunch of files for the target. As we want to run the VM on the host itself, the file interesting to us are those ones in the "host" directory.

After the compilation is over, you will have your nice dalvikvm x86 executable!

$ file <aospdir>/out/host/linux-x86/bin/dalvikvm

dalvikvm: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

Note that not all the files in the "host" directory are actually needed to run the VM. The files that I included on github (link) should be the smallest subset of files that is required.

How to run DEX bytecode on Linux/x86

Once you have the binary, it is not entirely straightforward to run pure Dalvik bytecode. Indeed, the DalvikVM executable expects to be run within the context of the Android OS, where many files are in specific directories and specific environment variables are set. Of course, the goal here is to run dalvik bytecode, not Android applications! A quick clarification to the readers not familiar with the Android ecosystem: Android applications are APK files, that, among many other components, contain the classes.dex file, the Dalvik bytecode. Here we are going to run only the actual code, while we'll forget about the remaining Android-related components. In fact, to run a full-fledge Android app, we would need a full-fledge Android OS with all its components running, something that is not going to happen given the many space/memory constraints we have in this setting (iCTF).

In any case, long story short, this is how I managed to run the VM. First, you need to create a "dalvik" bash script (full version here)

$ cat dalvik

#!/bin/bash

CURRDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

export ANDROID_ROOT=$CURRDIR/linux-x86

export ANDROID_DATA=/tmp/dalvik-data

mkdir -p $ANDROID_DATA/dalvik-cache

$CURRDIR/linux-x86/bin/dalvikvm \

-Xbootclasspath\

:$ANDROID_ROOT/framework/core-hostdex.jar\

:$ANDROID_ROOT/framework/conscrypt-hostdex.jar\

:$ANDROID_ROOT/framework/okhttp-hostdex.jar\

:$ANDROID_ROOT/framework/bouncycastle-hostdex.jar\

:$ANDROID_ROOT/framework/apache-xml-hostdex.jar \

Once you have this script, you can actually run your piece of Dalvik-bytecode by first wrapping the .dex file in a .jar, and by then calling the script with "./dalvik -cp <jar_fp> <MainClass>", where <jar_fp> is the path to the JAR file, and MainClass is the name of the class that defines the main() method.

How to do that is pretty simple: you first compile your Java application into .class files. Then, you create a DEX wrapper in a JAR with dx:

$ dx --dex --output=dexware.jar *.class

DexWare, the service

The DexWare service (source code) was meant to be an innovative firwmare to manage super-secret chemical formulas related to uranium :-). Each of the formula was identified by a formula ID, and each of them was protected by a different password. Such formulas were the CTF's flags the teams needed to steal!

The service was maintaining a DB of the formulas in a hashmap field, and, in theory, a user could access to a formula only by knowing the associated password.

Of course, a vulnerability was added to the service, such that an attacker was able to steal the formula (the flag!) even without knowing the password.

Which vulnerability? Well, I wanted it to be a nice service where the teams need to exploit a real-world vulnerability. The decision was really simple: as our most recent paper (that is going to be presented at NDSS 2014, stay tuned!) is about code injection vulnerabilities in Android apps, that was a no-brainer. Gotta be a code injection vulnerability :-)

The service has a self-upgrade functionality. When invoked, the service goes through the list of files in a specific temporary directory, check for files with the .jar extension, load them, and execute the upgrade() method of the DexWareUpgrader class, if implemented. And how were the teams supposed to write a jar in the temp dir? Well, by a pure coincidence, a hidden functionality was added to the service so that it was possible to dump arbitrary files to that specific temporary directory. Easy peasy, isn't it?

The Exploit

Actually, even after knowing the attack vector, it is not that easy. In fact, a lot of Java reflective calls were necessary to get a hold on the DB formula field, and, to do that, the teams needed to access the (undocumented?) "this$0" field to first access the outer class (that had the formula DB field). As some nice write-ups on how to exploit this service already popped out around the net, I will stop with the details here, and I invite the interested readers to check those links out: link1, link2 :)

How to make the service reliable. Bug in DexClassLoader?

After developing my own exploit, I tried to stress-test my service to check whether the memory requirements were reasonable. And something interesting happened: after I run my exploit (together with benign traffic) for ~5K times, the DalvikVM was nicely and reliably segfaulting in my face. Awesome, isn't it? :/

As it turned out after a quick investigation, everything was crashing due to "Too many opened files" errors. Wait, what?

After a deeper investigation, I started to blame the DexClassLoader. In fact, at each run of the exploit, the service was creating a DexClassLoader object, that in turn would open the jar file, to then execute the upgrade method. What's the issue? It seems that the DexClassLoader does *never* close the loaded jar file, and that's why my service was going out of file handles!

I tried my best to force the DexClassLoader to close the associate handle, but nothing worked out (even after deleting the JAR, setting to null all the Java objects involved, and explicitly calling the garbage collector). I also checked in the actual implementation of DexClassLoader, but I failed to find any reference on how to close the loaded JAR.

As a workaround, I implemented a routine in the service that every once in a while was going through the JAR files in that temporary directory and, through JNI calls, manually closing all the handles. Dirty as sh*t, but pretty effective :) (source code)

I do understand that an innocuous DexClassLoader is not supposed to be abused the way I did, but, still, I think it's a bug :-)

Conclusion

All in all, that was a really interesting service to write, and it seems the teams enjoyed it as well. For the future, I'm planning to have a service where actual Dalvik bytecode-related skills are required: in fact, the key technical challenges were related to perform tons of Java reflective calls, while no specific Dalvik technical skill was required. I'll try to pull something out for the next iCTF :-)

As always, any sort of feedback is welcome! Let's keep in touch and feel free to ping me on twitter at @reyammer. Finally, if you played the CTF and you solved this challenge: remote high-five man, awesome job :-)

Monday, July 29, 2013

ShellNoob 2.0 is out!

ShellNoob 2.0 is out!! You might now ask with a mix of suspicion and astonishment:

what whaaat?? -- http://www.youtube.com/watch?v=a1Y73sPHKxw

Yep, you got it right! A new version is out!

For those who haven't read the first blog post, ShellNoob is a shellcode writing toolkit that helps you dealing with the boring, error-prone, and painful steps, leaving only the fun part to you! At least that's the goal :)

From when I published the first version (exactly three months ago!) a lot of stuff happened. First, I'll be lucky enough to have a chance to give a demo and to present ShellNoob at Black Hat USA Arsenal! Here is the announcement/abstract: http://www.blackhat.com/us-13/arsenal.html#Fratantonio. Second, a shitton of new features got added!

Here it is the new updated feature list:

convert shellcode between different formats and sources. Formats currently supported: asm, bin, hex, obj, exe, C, python, ruby, pretty, safeasm, completec, shellstorm. (All the details are in the README.)
interactive asm-to-opcode conversion (and viceversa) mode. This is useful when you cannot use specific bytes in the shellcode and you want to figure out if a specific assembly instruction will cause problems.
support for both ATT & Intel syntax. Check the --intel switch.
support for 32 and 64 bits (when playing on x86_64 machine). Check the --64 switch.
resolve syscall numbers, constants, and error numbers (now implemented for real! :-)).
portable and easily deployable (it only relies on gcc/as/objdump and python). And it just one self-contained python script!
in-place development: you run ShellNoob directly on the target architecture!
built-in support for Linux/x86, Linux/x86_64, Linux/ARM, FreeBSD/x86, FreeBSD/x86_64.
"prepend breakpoint" option. Check the -c switch.
read from stdin / write to stdout support (use "-" as filename)
uber cheap debugging: check the --to-strace and --to-gdb option!
Use ShellNoob as a Python module in your scripts!
Verbose mode shows the low-level steps of the conversion: useful to debug / understand / learn!
Extra plugins: binary patching made easy with the --file-patch, --vm-patch, --fork-nopper options!

The source code and a (hopefully enough) informative README (with all the details, use cases, etc) is on github: https://github.com/reyammer/shellnoob. Check it out! Please send all the bug reports and swears to yanick [AT] cs.ucsb.edu / @reyammer. All feedback is welcome :-)

Saturday, June 22, 2013

Defcon CTF Quals 2013 – \xff\xe4\xcc 5 (lena) write-up

This is my write-up for the Defcon CTF Quals 2013 - \xff\xe4\xcc 5 (lena). I partecipated to the quals with the Shellphish team (we ended up in 7th place!), and I needed to spend an entire night with the great @cavedon (one of the Shellphish's secret weapon) to solve this challenge. Also, we probably wouldn't have made it without @adamdoupe, that monitored our health conditions when we were trying to finalize our exploit, during the following morning. Needless to say, solving this was not easy (only 8 teams solved it!), but it was definitively super-fun and one of the best CTF challenges I've played with so far.

Alright, let's get it started. The scenario is the usual one: we have to exploit a binary (Linux, x86, 32bits) called "lena_server" that is running on a remote machine controlled by the organizers. As this challenge is belong [to us and] to the "shellcode" category, we already knew that the binary was supposed to get "something" in input, that was later processed and "somehow" executed.

This write-up is split in two parts: 1) our journey reversing the sh*t out of lena; 2) how we wrote our exploit.

Reversing the Madness

We soon figured out that the program expects a BMP as input. The first thing it does is to perform these simple steps on the BMP's header:

check that the two first bytes are "BM"
read the following four bytes and interpret them as the BMP's size
sanity checks against some bytes in the header (for example, bytes 0x1a-0x1b must be 0x0001)
checks that the "bits per pixel" is set to 24 (i.e., standard RGB)
checks that the width and the height is less than 0x200 and multiple of 8 (actually, we later figured out that their BMP data's parser was also assuming a width of exactly 0x200...).

From pixels to bits

The real madness is now about to begin. As we are going to deal with some numbers, I'll directly use as references the values from our final exploit: a BMP of size 512x80.

So, after checking the constraints against the BMP header, the executable starts to "process" the actual BMP data. It first extracts 8x8 pixels subblocks starting from the top-left corner of the BMP. Actually, for each 8x8 pixels, the executable "extracts" three different 8x8 matrices, one per channel. That means that in our case, the binary extracts a total of (512/8)*(80/8)* 3 = 1920 8x8 subblocks.

For each of these subblocks, the binary then computes (what it seems to be) its discrete cosine transformation (by calling the function SimpleDCT::DCT2Calc8x8). That is, the output of this step is still a 8x8 matrix, but we are now in the frequency domain!! #wtf

This new matrix is then passed as argument to the function DCTCoefficient::DecodeDCT that computes some sort of statistics related to the entropy of the original 8x8 pixels. This eventually returns a single bit as output, 0 for low entropy, 1 for high entropy (at least, this is what we understood :)).

This bit is then passed to the DecodeDataStream::AddDataBit function that updates an internal data structure. As I mentioned, these steps are performed for each 8x8 subblock (1920 times in our case). From a high-level point of view, such data structure contains two buffers, that I will refer to as B1 and B2. At each iteration, the current bit is appended to B1. When the buffer B1 happens to contain 256 bits (32 bytes), the following additional steps are performed:

the 32 bytes in B1 are passed as argument to a function called DecodeDataStream::DoDecodeBlockEv, that performs a "decoding" step (at that point, we had no idea on what that function was doing). Its output are 32 bytes, that in the general case differ from the 32 bytes in input.
some obscure checks are performed on the output. If these "checks" are not satisfied, those 32 bytes are somehow modified or, in the worst case, the program quits.
if the "DoDecodeBlockEv" function returns without errors, the first 16 bytes of the output are appended to buffer B2, while the latter 16 bytes are discarded.
the content of buffer B1 is cleared.

The good news was that, even if we had no idea on what the "decoding" step and the obscure checks were actually doing, we started to see the light: in fact, by quickly reversing the remaining part of the executable, we realized that if the program is able to process without errors all the 1920 bits in input, the content of the buffer B2 is later directly executed!

Reed-Solomon error correction codes

It turned out that the function "DoDecodeBlockEv" was actually implementing the decoding step of a Reed-Solomon error correction code. From wikipedia:

"Reed–Solomon (RS) codes are non-binary cyclic error-correcting codes. They described a systematic way of building codes that could detect and correct multiple random symbol errors. By adding t check symbols to the data, an RS code can detect any combination of up to t erroneous symbols, or correct up to floor(t/2) symbols"

When we understood what that meant, we realized how miserable our life was going to be: the function "DoDecodeBlockEv" applies a RS decoding step on each 32 bytes of buffer B1, by considering the first 16 bytes as the actual data and the remaining 16 bytes as "error correction code". In other words, the function uses the latter 16 bytes to self-correct (!!!) the first 16 bytes, our soon-to-be shellcode!! #wtf²

Clearly, we needed to make sure that each set of 32 bytes in B1 were the RS encoded version of 16 bytes of our shellcode. We first found some python implementation of this encoding step, but they were somehow wrong, in the sense that the decoding of the encoding was not returning the original input (and as we don't know sh*t about this stuff, we were not able to "debug it").

Eventually, we found a Matlab help page (yep, we were that desperate..) talking about a RS-encode function rsenc. Even if we were naturally adverse to rely on Matlab for our exploit, we decided to give it a try. We quickly got a Matlab instance running, we "encoded" the 16 bytes "AAAAAAAAAAAAAAAA" string, and we got as output the 32 bytes string "AAAAAAAAAAAAAAAA<other 16 bytes>". We then took this string, and with some gdb-leetnees we made the "lena" executable decode it. With our great surprise, the decoding step returned exactly the original 16 As. That is, the "<other 16 bytes>" were a valid error correcting code for the first 16 bytes! It worked™.

Generating the exploit

After a long session of "hurrah" and high-fives all over the place, we started to smell pwnage in the air. We began our journey to victory by summing up what was happening to our poor BMP in input:

the BMP's header is checked against several constraints.
the BMP's payload is retrieved and decomposed in matrices of 8x8 (three for each 8x8 pixels subblock).
for each of the subblock, a new 8x8 matrix is computed by calculating its DCT.
for each new 8x8 subblock, the function "DCTCoefficient::DecodeDCT" returns a bit, whose value depends on the original 8x8 pixels entropy.
each of these bits is appended to the buffer B1.
when B1 contains 32bytes, these 32bytes are decoded (function "DoDecodeBlockEv") following the Reed-Solomon error correction code scheme.
if everything is fine, the first 16 bytes are appended to B2.
When all the 8x8 subblocks are processed, the content of B2 is executed.

At this point, we just needed to generate a BMP such that the final content of buffer B2 was actually executable. Clearly, our best shot was go backward: we took our standard open-read-write shellcode, and we then wrote a script to invert all those crazy transformation steps (as our Matlab skills are ridiculously poor, we also had Python scripts to generate few Matlab scripts: leetness++). We started with a ~90 bytes shellcode, padded to 112 bytes; we then obtained 224 bytes (1792 bits) after the RS encoding step with Matlab's rsenc, that we needed to pad to 1920 bits. Finally, as the width of the BMP needed to be 512 (remember their parser's assumption?), we used the 1920 bits to generate a 512x80 BMP (3 bits per pixel), that lead to our exploit:

That was not straightforward, but the result we obtained probably represents the highest form of artistic production that a bunch of computer scientists could ever hope for. Of course, the low-level details on how we actually generated that are left as an exercise for the reader ;)

That's all folks! We really (really!) liked this challenge, and we renew our big kudos to the @LegitBS_CTF crew for this great CTF!

Yanick / @reyammer

Wednesday, June 12, 2013

What The Fork: how to immediately block any Android device

What if an unprivileged Android app could lock, instantaneously, any Android device out there? What if such an app exists and is also really simple to implement?

A few months ago, Antonio and I stumbled upon a paper titled Would You Mind Forking This Process? A Denial of Service attack on Android. In this paper, the authors describe a vulnerability they discovered related to Android's Zygote that could be exploited by mounting a DoS (Denial-of-Service) attack: this resulted in the target device becoming completely unresponsive after a minute or so. Without going too much into the details, the vulnerability was due to the world-writable permission access to the Zygote's socket, from which Zygote itself listens for "requests to fork new processes". With this technique, they were able to make Zygote fork a high number of processes, and that was enough to block the device.

Their work got a lot of coverage and the vulnerability they found (CVE-2011-3918) has been considered as critical (CVSS score of 7.8). Eventually, even Google acknowledged the bug by committing a fix.

At that point, Antonio asked me "how about a fork-bomb? Would that work?". To our surprise, the answer to that question is: yes, definitively yes!

We found that the most simple implementation of a fork bomb is able to instantaneously block any Android device we have tested it on, regardless of the phone/OS version.

Up to now, we verified that our exploit works on Google's flagship device (Nexus 4 with Android 4.2.2) and on the following other ones:

Nexus 4 - Android 4.2.1
Nexus S - Android 4.2.1 (Cyanogen mod 10.1)
Galaxy Nexus - Android 4.2.2 (Cyanogen mod 10.1.0-RC5)
Galaxy Nexus - Android 4.1.1
Galaxy Nexus - Android 4.0.4
Samsung Galaxy Tab 10.1 - Android 3.1
Motorola Backflip - Android 2.3.7 (Cyanogen mod 7)
Emulator running Android - Android 4.1.2

We immediately contacted the Android security team (on Feb 7th, 2013), and received a response shortly after: in their opinion, this does not constitute a security issue, because it is just a local DoS and the user can somehow regain control of his device (by rebooting the phone).

We found their answer quite interesting. In fact, we believe that our exploit is strictly more powerful that the previously disclosed one, as in this case the device blocks immediately, and not after a minute or so. Also, the impact that such an issue might have is much higher, as there does not seem to be an easy patch to fix this problem. This naturally raised some questions: did Google patch that vulnerability only because it was a simple fix? Or maybe the patch was supposed to fix a different more serious (undisclosed) bug? As we have not heard anything back (we sent our last email to them on Feb 10th, 2013), we have no idea. In any case, we are surely not the first ones to come up with a local DoS attack (for example, check this nice trick from DexLab), but it seems that Google just does not care that much about this.

We think that local DoS is really bad (at least bad enough to commit a fix, especially when it's simple). First, this violates one of the core principle on the Android design: no application, by default, should have the permission to perform any operation that would adversely impact other applications, the operating system, or the user. Second, what if the attack starts during the night, when the user is relying on his phone for the alarm clock? That would be quite unfortunate. And what if our nasty app starts at every boot, making the device completely unusable from the beginning? Now, if you are one of those few guys who reads a blog post like this, you probably know your way around to delete the annoying app, but what if you are not?

That being said, our uber simple forkbomb app is available on github at: https://github.com/reyammer/android-forkbomb. Feel free to try it out, but be ready to pull out your battery :-)

For any questions, please feel free to drop us an email at yanick@cs.ucsb.edu and antoniob@cs.ucsb.edu.