DEF CON 29 - Brian Hong - Sleight of ARM: Demystifying Intel Houdini
Aug 5, 2021 17:39 · 6563 words · 31 minute read
- Hello everyone and welcome to Sleight of ARM: Demystifying Intel Houdini.
00:05 - My name is Brian and I’ll be discussing Intel’s Houdini binary translator.
00:09 - So what it is, where it’s used, how it works, security concerns and recommendations.
00:14 - But before that, a little bit about myself real quick.
00:18 - That’s me, I’m Brian Hong. I studied electrical engineering at Cooper Union, and I’m currently working as a security consultant with NCC Group.
00:26 - Lately I’ve been performing a lot of Android pentests, and sometimes even Android malware analysis.
00:31 - Besides that, I like to build random bits of hardware and also like to reverse engineer low-level stuff that other people built, both software and hardware.
00:40 - So with that, let’s get started with Android.
00:44 - So Android is one of the largest operating systems in the world.
00:49 - And you can write Android applications using Java and Kotlin,
00:55 - as well as C and C++ using the Native Development Kit.
01:02 - Android was originally designed and built for ARM devices, but Google later added support for x86.
01:08 - And just to note, there has been out-of-tree support before that as well, such as the Android x86 project.
01:17 - So since then there have been several Android devices running on x86, but now there are mainly two: x86 Chromebooks and x86 hosts running commercial Android emulators.
01:31 - However, apps generally lack support for x86 still.
01:35 - And that’s because ARM is the primary hardware platform for Android.
01:40 - And in fact, if you have native components in your app, the Play Store only requires the ARM builds.
01:47 - And because of this, many native applications don’t end up containing any x86 binaries, only ARM.
01:53 - So how can x86 devices running Android run these apps that only contain ARM binaries? And this would be a great time for me to introduce you to Houdini.
02:06 - So Houdini, the topic of this talk, is Intel’s proprietary binary translator that allows x86 devices to run ARM binaries.
02:16 - And it was co-created with Google, as it was designed to be run with Android NativeBridge.
02:20 - We’ll get to it in a second. Houdini is this mysterious little black box.
02:27 - We don’t know what it does. We don’t know how it works, and there doesn’t seem to be any documentation on it.
02:35 - And it’s possible some vendors may be trying to hide that they’re using it.
02:41 - We’ll get to that as well. And there are three variants: 32-bit x86 implementing 32-bit ARM, 64-bit x86 implementing 32-bit ARM, and 64-bit x86 implementing 64-bit ARM.
02:55 - Right. So Houdini can be used on physical hardware, as it was on x86-based mobile phones.
03:01 - And it is still used in x86 Chromebooks, which is how we actually got the binaries.
03:08 - So we could take a look at them. It’s also used in some commercial Android emulators, such as BlueStacks and Nox.
03:15 - Though I don’t believe it’s enabled by default.
03:18 - I believe there is an option to enable it in the settings.
03:23 - And it’s also used in the Android x86 project, which can be run on real hardware or on an emulator.
03:31 - Okay. So how does it work? Houdini is basically an interpreted emulator for ARM instructions.
03:40 - What that means is there’s a loop reading ARM instructions, the opcodes, and producing the corresponding behavior on x86.
03:48 - I just want to make clear that it does not do just in time compilation.
03:52 - It doesn’t translate nor output any x86 instruction.
03:56 - It reads it and then does the behavior. And Houdini has two main components.
04:03 - The first, just Houdini, runs executables. And the second, libhoudini, is used to load and link ARM shared objects.
04:12 - So like I said, the first part, Houdini, runs ARM executables, both statically and dynamically linked.
04:19 - When running dynamic binaries, it actually uses its own set of precompiled libraries for ARM Android, in addition to the x86 libraries needed by the rest of Android and Houdini itself.
04:35 - And there’s a screenshot from the Chromebook actually.
04:38 - You could see that we have an x86 machine, denoted by x86 at the end.
04:44 - And the program I’m trying to run, hello_static, is a 32-bit ARM statically linked ELF binary. I run it as ./hello_static, and it just prints “Hello world.” So some of you may have noticed that I just executed the binary directly without invoking Houdini.
05:06 - So you might be wondering, where does Houdini come in? This is actually a kind of cool Linux kernel feature called binfmt_misc.
05:15 - And if you aren’t familiar with it, I’ll give a quick explanation.
05:20 - Miscellaneous binary format is a Linux kernel feature that lets you basically register interpreters for custom binary formats.
05:29 - Kind of similar to how a shebang works in Bash or Python programs.
05:35 - So in our specific case, our custom binary format is an ARM ELF binary.
05:44 - So the two screenshots below show the registered entries for static and dynamic ARM ELF binaries.
05:54 - And you can see that the interpreter is set to /system/bin/houdini.
05:57 - So essentially the resulting effect is that when I type in ./hello in bash, or try to exec this binary, the kernel compares the magic bytes by looking at the first N bytes.
06:09 - If they match, it redirects the exec to the interpreter. So it actually execs /system/bin/houdini and passes my program name as the first argument.
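As a rough illustration of the matching step just described, here is a minimal Python sketch. The ELF constants are standard, but the registered magic and mask are assumptions made for the example (the real entries are only shown in the screenshots), and `elf_header`, `matches`, and `resolve_exec` are hypothetical helper names.

```python
# Sketch of binfmt_misc-style matching: compare the first bytes of a file
# against a registered magic, ignoring any bits that are cleared in the mask.
ELF_MAGIC = b"\x7fELF"
EM_386, EM_ARM = 0x03, 0x28   # e_machine values from the ELF specification
ET_EXEC = 0x02                # e_type for a non-PIE executable

def elf_header(e_type, e_machine):
    """Build the first 20 bytes of a 32-bit little-endian ELF header."""
    e_ident = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9)  # class, data, version, padding
    return e_ident + e_type.to_bytes(2, "little") + e_machine.to_bytes(2, "little")

def matches(data, magic, mask):
    """True if data matches magic at offset 0 under the mask."""
    if len(data) < len(magic):
        return False
    return all((d & m) == (g & m) for d, g, m in zip(data, magic, mask))

# Hypothetical registration: 32-bit ARM executables, every bit significant.
ARM_EXEC_MAGIC = elf_header(ET_EXEC, EM_ARM)
ARM_EXEC_MASK = b"\xff" * len(ARM_EXEC_MAGIC)
INTERPRETER = "/system/bin/houdini"

def resolve_exec(path, data):
    """Return the argv that effectively gets executed for this file."""
    if matches(data, ARM_EXEC_MAGIC, ARM_EXEC_MASK):
        return [INTERPRETER, path]
    return [path]
```

With this registration, an ARM header resolves to `["/system/bin/houdini", "./hello_static"]`, while an x86 header is left alone.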
06:20 - Right. So, now that we know how the Houdini part is being used, let’s look at the more interesting second component, which is libhoudini.so.
06:31 - So libhoudini is itself a shared object for x86, and it’s used to load other shared objects that are built for ARM.
06:40 - It was designed to be used with Android NativeBridge.
06:43 - So let’s talk about that. So Android NativeBridge is the component in Android that allows this binary translation to work.
06:54 - It is part of the Android runtime and is the main interface between our Android side and our Houdini binary.
07:04 - So NativeBridge provides an interface for Android to talk with Houdini.
07:09 - And I want to point out that while it might have been designed specifically with ARM and Houdini in mind, the interface that it provides can be used to implement other processor architectures.
07:21 - For example, running MIPS code on an ARM device.
07:31 - So it’s part of Android runtime. So it’s initialized on boot.
07:36 - And during initialization, it checks a system property, ro.dalvik.vm.native.bridge, and if it’s set to zero, NativeBridge is disabled.
07:44 - If it’s not, the value is used as the name of the library to load, which implements the NativeBridge interface.
07:51 - So in our case, that would be libhoudini.so.
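The property check can be sketched in a few lines of Python. This is only a model of the decision described above: `native_bridge_library` is a hypothetical name, the properties are passed in as a plain dict, and treating a missing or empty property as "disabled" is an assumption of this sketch.

```python
BRIDGE_PROP = "ro.dalvik.vm.native.bridge"

def native_bridge_library(props):
    """Return the bridge library to load, or None if the bridge is disabled."""
    value = props.get(BRIDGE_PROP, "0")   # assumption: missing means disabled
    return None if value in ("", "0") else value
```

For example, `native_bridge_library({"ro.dalvik.vm.native.bridge": "libhoudini.so"})` yields the library name, while a value of `"0"` yields `None`.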
07:55 - Actually a few interesting things about this.
07:58 - According to a DEF CON talk a few years back, it seems like BlueStacks renamed libhoudini to something like lib3btrans.so for some unknown reason.
08:10 - And also it looks like the Android x86 project uses its own implementation, libnb.so.
08:17 - But when you take a look at the source tree, it’s actually just a thin wrapper that loads and uses libhoudini itself.
08:26 - Another interesting thing is that there’s a script in the Android x86 project that enables NativeBridge, and it downloads libhoudini from a couple of obfuscated .cn URL shorteners.
08:41 - And yeah. It also seems like they moved that link and the related code around a couple of times.
08:48 - Not sure what’s going on in there, but back on topic.
08:52 - NativeBridge defines two main callback interfaces.
08:58 - And just to get off topic again, I have to talk about the JNI first before I get into those callbacks.
09:04 - So Java Native Interface is a foreign function interface that enables our JVM code to interact with the native code and vice versa.
09:13 - Yeah. So, this part is actually not specific to Android and it is part of Java.
09:20 - So on the right side, we see a struct, the JNI Native Interface.
09:29 - Its pointer is also typedef’d to JNIEnv.
09:34 - So this structure is basically a bag of function pointers that’s provided to the native code so that our native code can use these functions to perform low-level JVM reflection.
09:49 - And I cut out a lot of it. There are a lot of functions in there, but some of them are, you know, calling methods, getting the method ID, allocating objects, getting a field, finding classes, and so on.
10:04 - So the pointer to this struct is actually passed as the first argument when your Java code calls the native code.
10:16 - And we’ll see how that’s used later. So the first callback interface from NativeBridge is the NativeBridgeRuntimeCallbacks.
10:29 - It’s quite simple, but it’s passed from NativeBridge to our libhoudini binary so that libhoudini can find and call native functions on the Android side, or the NativeBridge side.
10:43 - The second, more interesting callback interface is NativeBridgeCallbacks.
10:48 - And this is kind of the opposite.
10:51 - It provides a way for NativeBridge to call functions in our libhoudini binary.
10:58 - We see some of the functions on the right. The most interesting of these are initialize(), loadLibrary(), and getTrampoline(), the latter two of which are quite similar to how dlopen() and dlsym() work.
11:13 - And I’ll show that in a later slide. Yeah, so this struct is actually exposed via the symbol NativeBridgeItf, which can be seen here.
11:26 - By looking at it in a hex editor or disassembler, you could see all the function pointers in that data structure.
11:35 - So now that I’ve kind of explained all of the components of NativeBridge, I’m going to try my best to put them all together.
11:45 - So here we go. So normally it would look something like this.
11:49 - Here you have an ARM device running ARM Android, and we want to load an ARM native library.
11:55 - So when your application launches, it would call System.loadLibrary(), which would trigger the Android runtime to call dlopen(), and that will load our libnative.so into memory, into the process.
12:10 - And then when our app wants to call a native function, it will first do a dlsym(), which will in turn give it a function pointer to our code.
12:20 - And then it jumps to it with the first argument being the pointer to our JNIEnv structure.
12:28 - And if our native code wants to interact with the Java world, it could do so by looking for the appropriate function in the JNIEnv, you know, function pointers.
12:38 - Now this gets a little more complicated when we talk about NativeBridge.
12:42 - So before anything happens, NativeBridge gets loaded on boot, right? So it checks the system property and sees that it’s pointing to libhoudini.so.
12:53 - And it dlopens the libhoudini library. Our Android and our device are x86, and so is libhoudini, and our goal is to run the code in our libnative.so, which is ARM.
13:13 - So after libhoudini is loaded, it fetches the NativeBridge callbacks using dlsym() and calls initialize(), which isn’t shown in the diagram.
13:25 - So after that, you know, the Android continues to boot up.
13:29 - And then when you launch your app, it would try to load the library again with System.load() or System.loadLibrary(), which triggers NativeBridge plus the Android runtime to call our NativeBridgeCallbacks loadLibrary(), which acts similar to dlopen().
13:46 - So it’ll return a handle. And we could pass that handle to getTrampoline() to get a function pointer, similar to dlsym().
13:56 - Now, we can’t actually just use dlopen() and dlsym() directly because the kernel will complain it’s a different architecture.
14:04 - And especially for dlsym(), because dlsym() will give you a function pointer.
14:08 - So Houdini has their own versions, loadLibrary() and getTrampoline().
14:13 - loadLibrary just opens our native file and maps it into memory.
14:17 - And getTrampoline() should return the function pointer, but it doesn’t.
14:22 - It can’t return the actual function pointer to our native library because our code is written in ARM, or it will contain ARM instructions in there, which our x86 processor probably won’t know how to handle.
14:35 - So instead libhoudini returns a pointer to a little stub inside of our interpreter, inside libhoudini, so that when we call the function returned by getTrampoline(), the interpreter is going to start running, and the interpreter will, in turn, start reading the native code and executing it.
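The trampoline idea can be modeled in Python as a closure. This is a toy sketch, not libhoudini’s implementation: `ToyInterpreter`, `load_library`, and `get_trampoline` are hypothetical stand-ins, guest "addresses" are just dictionary keys, and guest code is represented by Python callables.

```python
class ToyInterpreter:
    """Stand-in for the interpreter inside libhoudini (purely illustrative)."""
    def __init__(self, guest_functions):
        # Maps a guest "address" to a callable standing in for ARM code.
        self.guest_functions = guest_functions
        self.entries = []  # guest addresses the interpreter was started at

    def run(self, guest_addr, *args):
        self.entries.append(guest_addr)
        return self.guest_functions[guest_addr](*args)

def load_library(symbols):
    """dlopen-like: return a handle (here, simply a copy of the symbol table)."""
    return dict(symbols)

def get_trampoline(interp, handle, name):
    """dlsym-like: return a host-callable stub, never the guest address itself."""
    guest_addr = handle[name]
    def stub(*args):
        # Calling the stub enters the interpreter at the guest function.
        return interp.run(guest_addr, *args)
    return stub

interp = ToyInterpreter({0x1000: lambda env, a, b: a + b})
handle = load_library({"Java_demo_add": 0x1000})
add = get_trampoline(interp, handle, "Java_demo_add")
total = add("fake-env", 2, 3)
```

The caller only ever holds `add`, a host-side callable; the guest address `0x1000` stays inside the interpreter.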
14:56 - So the last part I want to bring up is the JNIEnv pointer.
15:02 - I mentioned that when your Java code calls native code, the pointer to JNIEnv is passed as the first argument.
15:13 - We can’t pass that straight through to our native code, because our native code is running in ARM and our JNI functions are in x86.
15:25 - So what libhoudini does is it just remembers where the real one is, and then it creates its own fake version that’s filled with ARM instructions.
15:37 - Actually, the fake JNIEnv function pointers point to a bunch of trap instructions.
15:44 - That way, when the interpreter sees those ARM trap instructions, it knows which proper JNI function to call on the real x86 structure.
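A minimal Python sketch of that trap-table idea follows. The trap numbering, the `build_guest_env` and `on_trap` helpers, and the two fake host functions are all assumptions for illustration; only the overall shape (guest-side entries are traps that map back to host functions) comes from the talk.

```python
TRAP_BASE = 0xEF000000  # ARM SVC encoding with a zero immediate

def build_guest_env(host_env):
    """Return (guest_env, trap_table).

    guest_env maps each JNI function name to a trap word; trap_table maps the
    trap word back to the real host function. The numbering is hypothetical.
    """
    guest_env, trap_table = {}, {}
    for index, (name, func) in enumerate(sorted(host_env.items())):
        trap_word = TRAP_BASE | index
        guest_env[name] = trap_word
        trap_table[trap_word] = func
    return guest_env, trap_table

def on_trap(trap_table, trap_word, *args):
    """What the interpreter does when it executes one of the trap words."""
    return trap_table[trap_word](*args)

# Fake host-side "JNI functions" standing in for the real x86 table.
host_env = {
    "GetVersion": lambda: 0x00010006,
    "FindClass": lambda name: "class:" + name,
}
guest_env, trap_table = build_guest_env(host_env)
```

Guest code that "calls" `guest_env["FindClass"]` executes a trap word, and `on_trap` forwards the call to the host function.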
15:59 - All right, so now that I’ve kind of explained how libhoudini fits together with NativeBridge, let’s start digging deeper into how the interpreter part works, starting with memory.
16:14 - So, the emulation is dual-architecture, so the process contains both ARM and x86 binaries.
16:22 - They share a virtual address space, and they both have a real-world view of memory.
16:28 - So what that means is the x86 parts of the process and the ARM parts of the process view the memory the same way.
16:37 - And they’re in the same address space.
16:40 - So there’s no magic translation between an ARM address versus an x86 address.
16:47 - And the last point is, there is a separate allocation for ARM stack.
16:52 - And just to show, this is a snippet from one of the app’s process memory maps.
17:02 - We see here our native libraries loaded up there.
17:05 - And down there, we have our libhoudini loaded.
17:10 - You could also see a bunch of ARM libraries loaded that are used by our native code.
17:16 - Specifically libc and a couple of others, we could see, are loaded for both ARM and x86.
17:27 - And you also see a bunch of anonymously mapped pages, which are used internally by libhoudini.
17:31 - And our ARM stack lives somewhere in or around there.
17:38 - So moving on, I want to talk about the actual execution loop.
17:43 - So I mentioned earlier, it’s essentially a switch statement inside a while loop.
17:48 - So this screenshot shows the portion of the interpreter where it reads the instruction, partially decodes it, and then jumps to the proper instruction handler.
18:02 - So in this assembly, I have the comments on the right, but I do have equivalent C code on the next slide.
18:12 - So basically I’ll run through this real quick.
18:16 - The snippet of code gets the program counter from the processor state, reads the instruction from memory, and then checks the condition field,
18:28 - the condition code, to determine whether the current instruction should be executed or not.
18:34 - Once it determines that it should, it calculates this offset by concatenating bits 20-27 and the bits 4-7.
18:43 - So that offset is used as the entry offset into this instruction handler table, which is filled with a bunch of function pointers to instruction handlers.
18:55 - And then it jumps to it. So for example, our “mov r0, r1” instruction has the entry offset of 0x1A1.
19:04 - And each entry is a 32-bit address. So multiply by four bytes, we get 0x684.
19:10 - And then the final address where this function pointer lives is 0x4BC044.
19:18 - And we could see that right here. If I look at it through a disassembler.
19:26 - At that address, we see a pointer to our function handler, instr_mov_1.
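The table lookup arithmetic can be checked with a short Python sketch. It assumes bits 20-27 form the high part of the index and bits 4-7 the low nibble, which is consistent with the 0x1A1 quoted above; `handler_index` is a hypothetical name, the example word is chosen only because it has those bit fields, and the table base is derived here rather than quoted in the talk.

```python
def handler_index(instr):
    """Concatenate bits 20-27 (high part) with bits 4-7 (low nibble)."""
    high = (instr >> 20) & 0xFF
    low = (instr >> 4) & 0xF
    return (high << 4) | low

ENTRY_SIZE = 4                          # each table entry is a 32-bit address
ENTRY_ADDR = 0x4BC044                   # entry address quoted in the talk
index = 0x1A1                           # entry offset quoted in the talk
byte_offset = index * ENTRY_SIZE        # 0x684
table_base = ENTRY_ADDR - byte_offset   # implied table base (derived, not quoted)

# Illustrative instruction word with bits 20-27 = 0x1A and bits 4-7 = 0x1.
example_word = 0xE1A00211
example_index = handler_index(example_word)
```

Under these assumptions the implied table base works out to 0x4BB9C0, and `handler_index(0xE1A00211)` gives 0x1A1.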
19:32 - And just note that this decompilation is not entirely correct, but already we could see in around lines 22, 23 and 27, we see some registers being moved around.
19:47 - And even some shifting and masking because mov instruction has the option to do that.
19:56 - The important thing to look at is that all of the instruction handlers have two parameters passed in.
20:05 - The first is the instruction itself, the instruction bytes, so that the handler, the instruction handler could pull out the operands and fully decode it.
20:16 - And the second argument is the processor state.
20:19 - And it is basically what it sounds like. It’s this data structure that keeps track of the emulated ARM processor’s state. It mostly contains the register values such as, you know, r0 and r1, but also registers such as the program counter, stack pointer, link register, and so on.
20:42 - It also contains a byte that tells you whether it’s in thumb mode or not, and there’s a bunch of other fields, but I couldn’t really figure out what all of those do.
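To make the loop and the two handler parameters concrete, here is a toy fetch-decode-dispatch interpreter in Python. It is a sketch under simplifying assumptions, not Houdini’s code: only three ARM encodings are handled, only the "always" condition is honored, reading the PC does not model the ARM pipeline offset, and `CpuState`, `RETURN_SENTINEL`, and the handler names are hypothetical.

```python
from dataclasses import dataclass, field

RETURN_SENTINEL = 0xFFFFFFF0  # hypothetical "return to host" address

@dataclass
class CpuState:
    regs: list = field(default_factory=lambda: [0] * 16)  # r0..r15; r14 = LR, r15 = PC
    thumb: bool = False
    branched: bool = False

def handler_index(instr):
    return (((instr >> 20) & 0xFF) << 4) | ((instr >> 4) & 0xF)

# Every handler takes the raw instruction word and the processor state.
def h_mov(instr, state):
    state.regs[(instr >> 12) & 0xF] = state.regs[instr & 0xF]

def h_add(instr, state):
    rn, rd, rm = (instr >> 16) & 0xF, (instr >> 12) & 0xF, instr & 0xF
    state.regs[rd] = (state.regs[rn] + state.regs[rm]) & 0xFFFFFFFF

def h_bx(instr, state):
    state.regs[15] = state.regs[instr & 0xF]
    state.branched = True

HANDLERS = {0x1A0: h_mov, 0x080: h_add, 0x121: h_bx}

def run(memory, state):
    while state.regs[15] != RETURN_SENTINEL:
        pc = state.regs[15]
        instr = memory[pc]                 # fetch
        state.branched = False
        if instr >> 28 == 0xE:             # condition field: "always" only
            HANDLERS[handler_index(instr)](instr, state)  # decode and dispatch
        if not state.branched:
            state.regs[15] = pc + 4
    return state.regs[0]

# add r0, r0, r1 ; mov r2, r0 ; bx lr
program = {0x1000: 0xE0800001, 0x1004: 0xE1A02000, 0x1008: 0xE12FFF1E}
state = CpuState()
state.regs[0], state.regs[1] = 2, 3
state.regs[14], state.regs[15] = RETURN_SENTINEL, 0x1000
result = run(program, state)
```

Because the state is an ordinary object in memory, anything that can reach it can rewrite `state.regs` and change what the guest sees.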
20:55 - Note that this is just the data structure in memory.
21:01 - And memory addresses are shared between the x86 and the ARM side.
21:06 - So technically, if you find it, you could write values to this struct to change the register values inside of ARM.
21:17 - So the next thing I took a look at was the syscalls.
21:22 - Trying to figure out how syscalls work. Syscalls are just instructions as well.
21:28 - They’re special instructions, but they are instructions.
21:30 - So we could actually find them in the instruction handler table.
21:36 - We could see on the right, it takes the same parameters, the instruction and the processor state.
21:41 - This decompilation is also not entirely correct, but we could see that it actually doesn’t issue any x86 syscalls, but rather it just sets the SVC number field in the processor state and returns.
21:58 - So the actual switch for issuing x86 syscalls is further down that loop in the interpreter, and depending on which syscall number it is, it will do different things.
22:14 - Most of the time, it’s just simple wrappers or passthroughs with some conversion, like moving the ARM register value to the actual x86 register and calling int 0x80 in x86.
22:27 - Or just simple, you know, conversions. But some of them are a little more complicated than the other.
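The two-stage syscall handling can be sketched in Python as follows. The state layout, `svc_handler`, and `dispatch_pending` are hypothetical; Python’s `os` calls stand in for the host syscalls, and only two ARM EABI syscall numbers are wrapped. Reading the number from r7 follows the ARM EABI convention.

```python
import os

ARM_NR_WRITE, ARM_NR_GETPID = 4, 20   # ARM EABI syscall numbers

def svc_handler(instr, state):
    """Instruction handler: only records the request, issues nothing itself."""
    state["pending_svc"] = state["regs"][7]   # EABI: syscall number in r7

def dispatch_pending(state, memory):
    """Later in the loop: turn the recorded request into a host call."""
    nr = state.pop("pending_svc", None)
    if nr is None:
        return
    regs = state["regs"]
    if nr == ARM_NR_GETPID:
        regs[0] = os.getpid()                 # simple passthrough
    elif nr == ARM_NR_WRITE:
        fd, addr, length = regs[0], regs[1], regs[2]
        # Shared address space: the guest buffer is read directly.
        regs[0] = os.write(fd, bytes(memory[addr:addr + length]))
    else:
        raise NotImplementedError(f"syscall {nr} is not wrapped in this sketch")
```

The return value is written back to r0, mirroring how a passthrough wrapper would hand the result to the guest.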
22:34 - One such example that was of interest to us was the clone syscall.
22:40 - And I’ve actually combined fork and clone here because nowadays if you call fork, it’ll go to libc fork, which would actually call clone.
22:49 - So, clone was also very interesting because it has a parameter there to pass in called child_stack.
22:56 - And you pass in a memory region, which will be used as a child_stack.
23:03 - And on top of that stack will be the return address, so that when the child is cloned, it’ll return, and that address becomes the entry point of our child process.
23:18 - Now, we were wondering how that gets handled by Houdini, and it turns out the child_stack we pass in is not passed to the kernel. Instead, libhoudini creates its own empty RWX page, passes that as the child stack, and handles the parent and child logic.
23:37 - So now that we have some ideas on how it works internally, let’s get to the fun stuff.
23:42 - So, detection. Are there ways we could detect whether our app is running inside of libhoudini or not? We came up with a couple of ways, and the first way would be to build an ARM native app.
23:59 - And in that app, we could check the host architecture, either via Java’s os.arch system property or by reading /proc/cpuinfo.
24:11 - But as it turns out, you actually can’t do that because Houdini hides these.
24:16 - So when you read os.arch with System.getProperty(), libhoudini makes it say armv7l from the Java side, when you’re running with NativeBridge.
24:30 - And when you try to cat /proc/cpuinfo, it would actually return ARMv8 processor rev 1 (aarch64).
24:41 - Actually, if you’re careful, you might be able to tell whether you’re running under Houdini because there seems to be some inconsistency: one of them would have returned v7, the other would have returned v8, and the hardware says “placeholder,” funny enough.
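That inconsistency check could look roughly like the Python sketch below. The function name and the exact layout of the sample cpuinfo text are assumptions; the heuristic may also misfire on real 64-bit ARM devices running 32-bit apps, so treat it as a hint rather than proof.

```python
def looks_translated(os_arch, cpuinfo):
    """Heuristic: 32-bit ARM reported to Java, 64-bit ARM or a placeholder in cpuinfo."""
    text = cpuinfo.lower()
    java_is_32bit_arm = os_arch.lower().startswith("armv7")
    cpu_claims_64bit = "aarch64" in text or "armv8" in text
    placeholder_hw = any(
        line.split(":", 1)[-1].strip() == "placeholder"
        for line in text.splitlines()
        if line.startswith("hardware")
    )
    return (java_is_32bit_arm and cpu_claims_64bit) or placeholder_hw

# Sample text modeled on the values quoted in the talk (layout is assumed).
SAMPLE_CPUINFO = (
    "Processor\t: ARMv8 processor rev 1 (aarch64)\n"
    "Hardware\t: placeholder\n"
)
```

Called with `"armv7l"` and `SAMPLE_CPUINFO`, the function flags the environment as likely translated.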
24:56 - So there are some other ways as well. So checking the memory map, you could try to read the memory maps and see if either libhoudini is loaded or both ARM and our x86 libraries are loaded.
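A memory-map check along those lines might be sketched as follows in Python. The parsing follows the usual `/proc/<pid>/maps` column layout; the `/arm/` path convention and the sample paths are assumptions for illustration, and `scan_maps` is a hypothetical helper.

```python
def scan_maps(maps_text):
    """Return (has_houdini, libraries mapped for both ARM and the host)."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6:              # anonymous mappings have no path field
            paths.add(fields[5].strip())
    has_houdini = any("libhoudini" in p for p in paths)
    # Assumption: ARM copies live under a directory named "arm".
    arm = {p.rsplit("/", 1)[-1] for p in paths if "/arm/" in p}
    host = {p.rsplit("/", 1)[-1] for p in paths if "/arm/" not in p}
    return has_houdini, sorted(arm & host)

# Hypothetical excerpt in /proc/<pid>/maps format.
SAMPLE_MAPS = """\
e8a00000-e8b00000 r-xp 00000000 fd:00 1234 /system/lib/libc.so
d0000000-d0100000 r--p 00000000 fd:00 2345 /system/lib/arm/libc.so
c0000000-c0400000 r-xp 00000000 fd:00 3456 /system/lib/libhoudini.so
b0000000-b0010000 rwxp 00000000 00:00 0
"""
```

In a real app the text would come from reading `/proc/self/maps`, which is exactly the kind of file access the next point is concerned about.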
25:14 - So these are okay methods, but we think the best ways are those that are undetectable themselves.
25:21 - So no syscalls issued, no files being opened, nothing that would trigger an analysis tool, you know.
25:32 - So the method I came up with was using the JNIEnv function pointer.
25:37 - So I mentioned earlier that libhoudini creates its own ARM version of the JNIEnv structure’s function pointers.
25:49 - Now, if you’re on a real device, those function pointers would point to real ARM opcodes.
25:56 - But if you’re running under libhoudini, the function pointers will also point to real ARM instructions, but those would be syscall instructions.
26:07 - I’ll have a quick demonstration of that later.
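The check itself reduces to classifying the first instruction word behind each JNIEnv function pointer. Here is a minimal Python sketch of that classification; in a real app the words would be read by native code, whereas here they are passed in as integers, and both function names are hypothetical. The bit pattern used is the A32 SVC encoding (bits 24-27 all set, with an ordinary condition field).

```python
def is_arm_svc(word):
    """A32 SVC: bits 24-27 are 0b1111 and the condition field is not 0b1111."""
    return ((word >> 24) & 0xF) == 0xF and (word >> 28) != 0xF

def jnienv_looks_emulated(first_words):
    """True if every sampled JNIEnv entry starts with an SVC instruction."""
    return bool(first_words) and all(is_arm_svc(w) for w in first_words)
```

For instance, words of the form 0xEF00xxxx are classified as SVC, while a typical function prologue such as 0xE92D4800 (a push) is not.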
26:12 - So the next thing is, once we detect that we’re running inside libhoudini, can we escape to x86 with it? So of course we could call mprotect and write code to memory.
26:25 - But again, this isn’t very subtle. We would need to call mprotect, which would probably trigger most of the analysis tools.
26:34 - And another way we could try to do this is by x86 stack manipulations.
26:38 - We know approximately where the x86 stack is.
26:41 - So we could try to clobber the stack with ROP payloads and have it jump to somewhere.
26:47 - This method is much more annoying, and one of the harder parts is trying to figure out where we could actually run our code.
26:54 - So we need to find a page that has execute permissions, or try to find a lot of ROP gadgets.
27:03 - That brings us to security concerns. It turns out libhoudini creates a bunch of RWX pages that it uses internally.
27:15 - And we saw one of these before, which is used for the clone syscall.
27:20 - And they have read, write, and execute permissions, which means we could write x86 code to them and just jump to it.
27:28 - And they are, again, shared memory. So we could write code from either the x86 side or the ARM side.
27:35 - So just to show you what some of these are used for, the ARM JNIEnv, the one filled with trap instructions, is in there.
27:44 - The ARM stack is in that memory region. So, back to security concerns.
27:51 - We have RWX pages in x86. So what about trying to get code execution on ARM? So it turns out Houdini ignores the NX bit entirely.
28:02 - Yeah. Which just means you could write code anywhere and jump to it.
28:09 - And I don’t think I need to explain why that’s an issue, but yeah, ARM libraries themselves are loaded without the execute bit on their pages.
28:20 - So regarding the behavior of ignoring the no-execute bit: not that this is correct, but if you think about it, it kind of makes sense.
28:30 - Houdini is an interpreter for ARM. The interpreter gets the instructions as data input.
28:37 - And that means, if it can read the data, the instructions, it will run them.
28:42 - So to demonstrate that, I’ve got this little program here, nx_stack.c.
28:49 - And in my main, I allocate some memory on the stack, code[512], and then I write two ARM instructions into it.
29:00 - And then I cast that to a function pointer and jump to it.
29:05 - And normally on a real ARM device, this will cause a segfault.
29:09 - But as we see below, it doesn’t. It just works.
29:13 - Actually, in the first iteration of this code, I accidentally had the memory outside of the function.
29:21 - So it was in the data section or some other region, and it still worked perfectly fine.
29:29 - So this runs fine on devices running with libhoudini.
29:36 - So the next step is a couple of quick demos.
29:44 - So, for the demo, this is on the Chromebook.
29:47 - And for the demo, I wrote this app.
29:49 - And I’ve actually built it, built two separate versions of it.
29:54 - It’s the exact same source. I just have two versions of it.
29:57 - One is built with just the x86 build of our native code, and one only contains the ARM binary.
30:06 - So the top one is the x86 one and the bottom one is the ARM.
30:10 - And the Chromebook itself is x86. So to run the bottom app, it is running through libhoudini.
30:18 - So the first tab is CPU info. It’ll, you know, return the values that I mentioned before, and the top one doesn’t have any libhoudini.
30:28 - It’s running x86 on x86. So all the values are correct.
30:31 - We see GenuineIntel. Everything’s all nice.
30:35 - Whereas on the bottom, we’re running it with libhoudini, and we see the output we saw before: ARMv7, ARMv8.
30:45 - An inconsistency, as well as hardware = placeholder.
30:50 - The second tab demonstrates the detection method I quickly described. Here we see on the x86 side, when we dereference the GetVersion and CallStaticMethod function pointers, those are valid x86 instructions.
31:07 - I think those are a bunch of push instructions, as you’d often see at the beginning of a function.
31:16 - And on the bottom, when we do the same thing, this is running with libhoudini.
31:23 - So we will see ARM instructions, but specifically those instructions, 0xef000, those are our syscall instructions.
31:35 - So you could use that as a method to detect whether libhoudini is running or not.
31:44 - In this case, yeah. So the third tab actually is not for demonstration.
31:49 - It’s just a utility to show you the process’s memory map.
31:54 - So we’re in x86. There’s no libhoudini.
31:57 - And everything should look fine.
32:00 - Right. But when we look at the ARM version’s process map, we see a bunch of anonymously mapped memory, and we see libhoudini right there.
32:14 - And we should also see our ARM libraries loaded in right there.
32:22 - Okay. So the, I think the most interesting tab is the last tab, the exec, which demonstrates the NX bit or the lack of NX bit check on the ARM side with libhoudini.
32:36 - So top one is running without libhoudini. And just to kind of explain to you what this layout is.
32:42 - So on the left side, you write some bytes.
32:45 - Those are going to get written to a buffer. You type in some bytes, and then you click run, and they’ll be passed to native code, where those bytes will actually be written to memory and then jumped to.
32:58 - However, the top one is our x86 version, so obviously we can’t run ARM instructions and there’s no libhoudini loaded.
33:08 - So it’s going to crash. That was the intended behavior.
33:12 - However, in the app that’s running with libhoudini, we could actually just type in valid ARM instructions, click run, and it would run.
33:23 - And I have a couple of different programs written up there because I don’t want to type it up manually.
33:30 - Run. And then multiply-add multiplies r1 and r2, and then adds the result to r0.
33:37 - That’s correct. And getSP actually reads the stack pointer of the ARM processor and returns it.
33:48 - And just to show you that this is dynamic, these bytes are actually being copied, I could change the actual bytes of the instruction.
33:56 - Register 15 will be reading the PC. I could also modify it.
34:02 - So the left side is completely changeable. As long as it’s executable ARM instructions, it will run it.
34:09 - I changed the one to a two or three and would add three.
34:13 - Same thing for adding two integers. I added three times.
34:18 - So 2 times 3 plus 6 is 13. So just for completeness, I have the same app, but now it’s running on a real ARM device.
34:30 - So this device happens to be ARMv8. So it’ll say ARMv8 and aarch64 processor.
34:42 - It’s going to all look correct. There’s no libhoudini running on this because it’s an ARM device running ARM code.
34:51 - So we go to the detect tab.
34:55 - We see just valid ARM instructions that are not syscalls.
35:06 - And in the maps, you know, this should also be fine.
35:10 - Completely fine. No Houdini. No, yeah.
35:15 - And of course, this is not running with libhoudini.
35:19 - This is just running on actual ARM hardware.
35:21 - So when we try to copy these bytes into malloc’d memory, on the stack or the heap, and then jump to it, it would crash.
35:31 - And it does. With demos out of the way, let’s now talk about possibilities of malware that knows about libhoudini.
35:42 - To start, we know that applications are often run in sandbox environments for analysis.
35:46 - This is mainly done in one of three ways. Running them on actual devices would give the most realistic behaviors, but it is hard to do on a large scale and also hard to instrument.
35:58 - The second-best option is fully virtualized environments, like QEMU,
36:02 - but these have some performance overhead, since they have to emulate the entire hardware and the processor.
36:11 - And that brings us to our third option, Android emulators.
36:15 - And those Android emulators on x86 devices can use technology like Houdini to run ARM applications.
36:24 - This has the least overhead, as it would only emulate parts of the application instead of the whole hardware and operating system.
36:34 - And on another point, most of you would agree that inconsistent behaviors are harder to debug.
36:41 - And similarly, apps that may or may not behave maliciously are harder to detect and are also harder to analyze.
36:48 - So let’s combine those points. For example, malware can use one of the detection methods mentioned previously to figure out whether or not it is running with libhoudini.
36:59 - Then it’s possible for it to act benignly when it thinks it is under analysis, by seeing that libhoudini is being used.
37:08 - And in other cases, show malicious behavior when libhoudini is not present.
37:16 - Yeah. So what about the other way around? We could also perform malicious actions only when Houdini is present, abusing the knowledge of its inner workings to further obfuscate itself.
37:30 - And for example, we don’t know what the Play Store uses nowadays, but it seems like their automatic app testing doesn’t run ARM APKs on x86 with libhoudini.
37:44 - In a case like this, malware could detect that it’s not under analysis, and when it is running under libhoudini, for example inside a commercial emulator, it could do some tricks like running code from the stack, which you can’t do on a real device.
38:08 - And trying to analyze that would prove to be difficult because a static analysis tool would see that you write some code onto the stack and it jumps to it.
38:20 - And that should crash. Whereas if you’re running on libhoudini, it works.
38:26 - So we finally come to the recommendations and how not to write an emulator.
38:32 - And we could start by talking about the RWX pages.
38:36 - So we noticed that libhoudini maps a bunch of RWX pages to be used internally, and those should not be there.
38:50 - If it’s really necessary, we recommend performing finer-grained page permission control.
38:56 - And one of those methods would be implementing an efficient NX check.
39:02 - So, we understand that checking page permissions every instruction would incur a very significant overhead, right? Every instruction you want to run, it has to check the page permissions, via software.
39:18 - So instead what we could do is, we keep track of it in a data structure, and we only check if the instruction we’re currently running is on a different page than the previous instruction.
39:32 - So in the case of jumps, or instructions that cross a page boundary.
39:37 - We could check those. So this basically becomes our userland page table implementation.
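The check described above could be sketched roughly as follows in Python. This is only an illustration, assuming 4 KiB pages; the names `UserlandPageTable` and `FetchChecker` are hypothetical and not taken from libhoudini, and the `lookups` counter exists only to show how rarely the slow path runs.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class UserlandPageTable:
    """Hypothetical software page table: a set of executable page numbers."""

    def __init__(self):
        self.exec_pages = set()
        self.lookups = 0  # counts slow-path checks, for illustration only

    def mark_exec(self, start, length):
        first = start >> PAGE_SHIFT
        last = (start + length - 1) >> PAGE_SHIFT
        self.exec_pages.update(range(first, last + 1))

    def is_exec(self, addr):
        self.lookups += 1
        return (addr >> PAGE_SHIFT) in self.exec_pages


class FetchChecker:
    """Consults the table only when execution moves onto a different page."""

    def __init__(self, table):
        self.table = table
        self.last_page = None  # page of the most recently validated fetch

    def invalidate(self):
        # Must be called whenever the table changes (e.g. after mprotect).
        self.last_page = None

    def check(self, pc, size=4):
        """Raise if the instruction at pc touches a non-executable page."""
        first = pc >> PAGE_SHIFT
        last = (pc + size - 1) >> PAGE_SHIFT
        for page in range(first, last + 1):
            if page == self.last_page:
                continue  # same page as the previous fetch: already checked
            if not self.table.is_exec(page << PAGE_SHIFT):
                raise PermissionError(f"fetch from non-executable page {page:#x}")
            self.last_page = page
```

Straight-line code inside one page costs a single lookup; only jumps to another page and instructions that straddle a boundary reach the table. The cached page has to be reset whenever permissions change, otherwise a stale decision could be reused.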
39:44 - Given that, our recommendation is to just use virtualization.
39:53 - Simple enough. But regarding actually implementing the userland page table via software, we can do it a couple of ways, right? We could only trust the text section of the library on load.
40:09 - And the other option is to check the memory map every time a new page is added.
40:17 - And then if a new page is added, we add that to our data structure that we keep track of.
40:22 - And third, we could hook the memory-mapping-related syscalls, so that whenever, for example, mmap or mprotect is called with the execute permission, we update our data structure accordingly.
40:43 - So the ideal solution combines the last two, options two and three.
40:47 - So it will, yeah, every time you do mmap or mprotect, for example, it would add an entry into our data structure that keeps track of the page permissions.
41:00 - And just as a catchall, we could check the memory map for new pages that are not already in there.
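A rough Python sketch of combining the two options, under stated assumptions: the `ExecTracker` class and its `on_*` hook names are hypothetical, the hooks are assumed to be called by whatever intercepts the guest's syscalls, and the catchall parses text in the Linux `/proc/<pid>/maps` format. It is not how libhoudini works today.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB pages
PROT_EXEC = 0x4   # value of PROT_EXEC on Linux


class ExecTracker:
    """Hypothetical tracker of executable pages, fed by syscall hooks."""

    def __init__(self):
        self.exec_pages = set()

    @staticmethod
    def _pages(start, length):
        return range(start >> PAGE_SHIFT, ((start + length - 1) >> PAGE_SHIFT) + 1)

    def on_mprotect(self, start, length, prot):
        # Hook: record or revoke execute permission for the affected pages.
        if prot & PROT_EXEC:
            self.exec_pages.update(self._pages(start, length))
        else:
            self.exec_pages.difference_update(self._pages(start, length))

    def on_mmap(self, start, length, prot):
        # Hook: a new mapping replaces whatever permissions were there before.
        self.on_mprotect(start, length, prot)

    def on_munmap(self, start, length):
        self.exec_pages.difference_update(self._pages(start, length))

    def sync_from_maps(self, maps_text):
        """Catchall: add executable pages the hooks missed; return them."""
        seen = set()
        for line in maps_text.splitlines():
            fields = line.split()
            if len(fields) < 2:
                continue
            start_s, end_s = fields[0].split("-")
            if "x" in fields[1]:  # perms field, e.g. "r-xp"
                start, end = int(start_s, 16), int(end_s, 16)
                seen.update(self._pages(start, end - start))
        missed = seen - self.exec_pages
        self.exec_pages |= missed
        return missed
```

In a real process the catchall would be fed the contents of `/proc/self/maps`; a non-empty return value means some mapping became executable without passing through the hooks, which may itself be worth logging.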
41:08 - And this has some good side effects, such as: since we now have a userland page table, we could support dynamic library loading via dlopen().
41:22 - And we could also do legitimate just-in-time compilation.
41:27 - And of course, the used JIT pages should be cleared, properly cleaned up after usage to prevent page reuse attacks.
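As a small illustration of that cleanup step, here is a hedged Python sketch. The function name `release_jit_region`, the `exec_pages` set, and the `first_page` argument are all hypothetical; the idea is simply to revoke execute permission in the tracking structure first and then scrub the buffer before it can be reused.

```python
import mmap


def release_jit_region(buf, exec_pages, first_page):
    """Drop a used JIT buffer from the tracking set, then zero its contents.

    buf        -- writable buffer that held generated code (mmap or bytearray)
    exec_pages -- hypothetical set of page numbers currently allowed to execute
    first_page -- hypothetical page number assigned to the start of buf
    """
    npages = (len(buf) + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    # Revoke first, so the pages are never executable while holding stale code.
    exec_pages.difference_update(range(first_page, first_page + npages))
    # Scrub the old code so a later mapping cannot jump into leftovers.
    buf[:] = bytes(len(buf))
```

For example, a one-page anonymous mapping created with `mmap.mmap(-1, mmap.PAGESIZE)` could be passed in as `buf` once its generated code is no longer needed, and closed by the caller afterwards.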
41:38 - And another thing is that, of course, this data structure is a critical data structure as it acts as our page table and should be heavily protected.
41:48 - So some of the things we mentioned are: writable only when being updated, surrounded by guard pages, not accessible to ARM, et cetera, et cetera.
41:57 - And another thing we recommend for researchers or vendors doing analysis of Android applications is, when you’re running dynamic analysis, you should also run apps through libhoudini.
42:12 - As we mentioned, it’s possible for malware or any other application to behave differently when it sees that libhoudini is enabled.
42:21 - Also when doing static analysis, we should look for access to Houdini RWX pages, or attempts to execute from non-executable pages, which would work if the app was running under libhoudini.
42:37 - And just to add to that, look for anything scanning for JNIEnv function pointers, as that was one of our detection methods.
42:46 - So to summarize, what I’m trying to say in this presentation is that Houdini introduces a couple of security weaknesses into processes using it.
42:56 - And that would be ARM native applications running on x86 devices.
43:04 - Some of these impact the security of the emulated ARM code, such as the lack of an NX bit check, while some also impact the security of the host x86 code, such as the read-write-execute pages everywhere.
43:20 - Yeah. And I actually think the fact that Houdini is not well-documented publicly, nor easily accessible, has something to do with preventing wider security analysis and research into this, which could have caught these issues earlier.
43:36 - Yeah. Which brings us to our few last slides.
43:40 - I’d like to give big, big, big, special thanks to Jeff for mentoring this project and helping develop the methodology.
43:49 - Also Jennifer for all the support and research and amazing feedback, and Effi for basically bootstrapping this research.
43:55 - And with that, thanks everyone for joining.
43:58 - And I believe you’re at Q and A right now.