DEF CON 29 - Dan Petro - Youre Doing IoT RNG
Aug 5, 2021 17:35 · 7965 words · 38 minute read
- Hi, I’m Allan Cecil. And with me today, I have- - Dan Petro.
00:05 - - We are presenting “You’re Doing IoT RNG”.
00:08 - We’re talking about vulnerabilities that exist in virtually every device out there in the IoT world.
00:14 - That’s a critical issue that we need to talk to you about because random numbers are used everywhere.
00:20 - - Before we get too far though, let’s talk a little bit about random numbers and why they’re so important to security.
00:26 - Numbers of course are kind of how computers work.
00:29 - So a random number can be a stand in for all sorts of things that we don’t normally think of as numerical.
00:34 - So encryption keys, authentication tokens, and our lovely friend, business logic.
00:40 - But one of the things you’re going to notice throughout this presentation is that the vulnerabilities that we’re going to describe here have a lot to do with the specific logic of a particular application in a way that is hard to replicate en masse.
00:52 - So a lot of the vulnerabilities that we’ll discuss here are kind of specific to particular applications or frameworks and aren’t necessarily the kind of canned exploit that you might expect from a widespread CVE.
01:04 - - One of the problems though is that computers are notoriously bad at making truly random numbers.
01:10 - And that’s because computers need to be deterministic.
01:12 - If you did math and every time you got a different value out of pi or you had a Pentium II processor that just kept messing up floating point numbers, you’d have problems.
01:21 - And we have seen this in the past. So we make computers to be very deterministic.
01:26 - However, sometimes you need stuff that isn’t deterministic.
01:29 - You need some entropy. And that’s where our hardware RNG comes in.
01:32 - That’s its job, is to make entropy, to make a source of randomness, to be a seed for some way of getting random numbers.
01:41 - So that solves the problem, right? - Now is actually a good time to bring up that there’s two major kinds of random number generators that you’ll find, a pseudo random number generator and a true random number generator.
01:53 - You might think from the names of them that you want the true random number generator or the TRNG, that that’s the good one and pseudo ones are the bad ones, but it’s not that simple.
02:03 - Really the distinction is only that one is made in software and one is made in hardware.
02:08 - And the names really lead you astray here. I particularly hate the name, true random number generator for the hardware ones.
02:16 - It sort of implies a kind of quality behind them that isn’t necessarily present.
02:21 - I suspect this naming comes from big RNG as a propaganda term.
02:25 - But in any case, the pseudo random number generators also come in two major forms, the cryptographically secure random number generators or CSPRNG for short and regular ones.
02:36 - So the regular ones are things you’ll find in like libc random or the Mersenne Twister that are meant to be fast.
02:42 - They’re really efficient. They’re just pieces of software.
02:44 - In both cases, they work basically the same though.
02:47 - You take an initial seed number that could be any length somewhere between 32 bits all the way up to 128 bits.
02:56 - And then you stretch that entropy out indefinitely into the future.
03:00 - You could produce the stream of seemingly random numbers from that seed, right? So it’s an entirely predictable set of numbers given that initial seed.
03:10 - And so the distinction between a cryptographically secure random number generator and a not secure one is basically that the regular ones like the Mersenne Twister or libc random or any number of other common implementations will leak information about the internal state and, therefore, the seed as you go along.
03:29 - So there’s really no secure way to use those.
03:31 - They’re never safe to use for security-critical pieces of information.
03:37 - Anytime you actually need secure information, like a crypto key or something like that, you should definitely be using a CSPRNG.
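A minimal sketch of the seed point above, using Python's standard library rather than anything shown in the talk. `random.Random` is a Mersenne Twister, so the same seed always yields the same stream; `secrets` draws from the operating system's CSPRNG. The helper name `stream` and the seed value are illustrative assumptions.

```python
import random
import secrets

def stream(seed, count=5):
    """Return the first `count` 32-bit outputs of a Mersenne Twister seeded with `seed`."""
    prng = random.Random(seed)
    return [prng.getrandbits(32) for _ in range(count)]

# The whole stream is a pure function of the seed: same seed, same numbers.
first = stream(0xC0FFEE)
second = stream(0xC0FFEE)
same_stream = first == second

# For keys and tokens, ask the OS-backed CSPRNG instead of a seeded PRNG.
key = secrets.token_bytes(16)
```

Anyone who learns or recovers the seed can rebuild `first` exactly, which is why the seeded generator is only suitable for non-security uses.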
03:45 - And then the security of the hardware RNG is something that we investigated in this research.
03:51 - One unfortunate part about the design of these hardware random number generators is that there’s not a lot of information known about the details of how they actually work; they’re basically black boxes.
04:01 - So if you wanted to find out how the hardware random number generator in your favorite IoT device actually works, you’re kind of out of luck.
04:10 - One notable exception is the STM32. Wanna give them a shout out from STMicroelectronics.
04:16 - They actually have really great information about the details of the inner workings of their RNG as well as proof of correctness, a kind of proof of the good quality random numbers that came out of it.
04:26 - But there’s two kind of basic digital designs that you’ll find for how to produce random numbers in hardware, at least in a low cost way that you’ll find in IoT devices.
04:37 - Analog circuits and clock timings. The first one is an analog circuit.
04:41 - So you’re probably more familiar with digital circuits, that is circuits that are gated by a clocking function.
04:46 - Analog circuits are kind of the opposite. There’s no central gate by a clock.
04:51 - So what you can do is set up an analog circuit that sort of looks like the diagram we have here on the top right.
04:56 - And there’s a bit that flows back and forth between these (indistinct) functions.
05:00 - So it’s either a zero or a one and a zero and a one kind of back and forth again, and sort of spins in this infinite loop, going back and forth between zero and one at a rate that is not exactly random, but sort of arbitrary.
05:13 - So if you were to poll that, if you, say, ask the analog circuit at any given point in time to find out what the bit is, the value will be pretty random.
05:24 - And that’s a pretty good way of designing a hardware random number generator.
05:28 - - There’s also a method where you’re using multiple clocks at the same time and measuring the delta between them.
05:38 - If you had two clocks that were derived from the same source, the result should, in theory, be identical.
05:44 - So you would always get a one or a zero depending on when you sampled it.
05:49 - But if you allow them to run freely and measure the delta between the two, you can get a pretty normal distribution of differences between the two.
05:56 - Sometimes this happens on situations that you wouldn’t expect where the designers didn’t deliberately do that.
06:04 - One of them happens to be the original Super Nintendo.
06:07 - They had a 21 megahertz clock for the central processing unit and a 24.576 megahertz clock for the APU, the audio processing unit.
06:16 - The result is that speed runners playing Super Metroid, they have to deal with random timings due to moving data across the bus when going through door transitions.
06:26 - Sometimes these random situations happen on unrelated devices and in a hardware random number generator, you’re taking advantage of what just sometimes happens accidentally and using that deliberately to get random numbers out of it.
06:40 - There are some issues that come up if you’re calling too often, if you’re running too fast, you’re calling too frequently, for instance that output call from that analog circuit method.
06:51 - If you’re calling it too often, you’re gonna get the same number twice in a row because you didn’t give it enough time to transition.
06:57 - And the same thing could happen with the clock method.
07:00 - If you’re calling it too frequently, you could also end up with accidental syncing.
07:03 - It could just be that both of your clocks happen to align.
07:06 - So they’re both operating in exactly the same offset.
07:10 - There’s no guarantee that you’re going to be perfect, but it’s usually good enough that it’s fine as long as you’re not calling it constantly.
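The polling-rate problem can be illustrated with a toy model. This is only a sketch under strong assumptions: the oscillator below is idealized and has no jitter at all (real designs get their randomness from jitter), and the function names and timing values are made up for illustration. It shows only why sampling faster than the circuit can transition hands back the same bit repeatedly.

```python
def sample_oscillator(half_period_ns, poll_interval_ns, polls):
    """Sample an idealized free-running 0/1 oscillator at a fixed polling interval."""
    bits = []
    for i in range(polls):
        t = i * poll_interval_ns
        # The bit flips once every half period.
        bits.append((t // half_period_ns) % 2)
    return bits

def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# Polling ten times per transition returns the same bit ten times in a row.
too_fast = sample_oscillator(half_period_ns=100, poll_interval_ns=10, polls=40)
# Polling slower than the transition time gives the bit a chance to change.
slower = sample_oscillator(half_period_ns=100, poll_interval_ns=130, polls=40)
```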
07:20 - How IoT does RNG is interesting. Most new IoT systems-on-a-chip have a hardware RNG device built into them as of 2021.
07:28 - That hardware RNG is an entire peripheral devoted to just RNG.
07:34 - So it must be secure, right? - The thing about IoT and programming on IoT devices is that there’s not really any operating systems to kind of smooth over the errors that you might make.
07:46 - Typically you just run C/C++ on bare metal.
07:50 - So if you needed a random number for a security-critical piece of information, like a crypto key, you just call the hardware RNG peripheral directly, usually through a HAL function, a hardware abstraction layer function, and it looks basically like this.
08:06 - This of course is pseudo code, but it looks about the same across basically every SDK and operating system.
08:12 - So there’s the HAL_get_random_number function and there’s kind of two parts that we really care about here.
08:17 - For one, there’s an output variable, the outnumber, that’s the actual random number that we care about.
08:23 - If you’re familiar with C, it’s an output variable, basically you send it a pointer to the number and then the function will overwrite the value at that pointer.
08:32 - And then there’s the actual return code, which carries an error status.
08:35 - That status will tell you in case something went wrong along the way.
08:39 - There’s lots of things that can go wrong when you are talking to a piece of hardware, right? The peripheral might be broken or something went wrong over the bus.
08:46 - Maybe the random number generator peripheral just wasn’t ready yet.
08:50 - Maybe the relative positions of Jupiter and Saturn for all we know are not aligned.
08:55 - In any case, an error can occur in the calling of this random number generator function and it’ll let you know about that in the return code.
09:03 - So we wanna ask the natural question, how many people out there in the world actually check this error code? So as it happens, almost nobody actually checks the return codes of these HAL RNG functions.
09:15 - Basically everybody out there just sort of makes a call to the random number generator peripheral and just uses whatever result that happens to give you.
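The slide's pseudo code is not reproduced in this transcript, so the following is only a Python model of the pattern being described, not any vendor's API. `FakeRngPeripheral`, `HAL_OK`, `HAL_ERROR`, and the sample values are all hypothetical; the method returns a `(status, number)` pair to stand in for a C function that returns a status code and writes the number through an output pointer.

```python
HAL_OK = 0
HAL_ERROR = 1

class FakeRngPeripheral:
    """Hypothetical stand-in for a hardware RNG that sometimes reports an error."""

    def __init__(self, values, fail_on):
        self._values = list(values)
        self._fail_on = set(fail_on)  # indices of calls that should fail
        self._calls = 0

    def hal_get_random_number(self):
        """Return (status, number), mimicking a C HAL call with an output pointer."""
        call = self._calls
        self._calls += 1
        if call in self._fail_on:
            return HAL_ERROR, 0
        return HAL_OK, self._values[call % len(self._values)]

VALUES = [0x8F3A12C4, 0x17D0E95B, 0x5ACD0321]

# What most code does: ignore the status and use whatever came back.
rng = FakeRngPeripheral(VALUES, fail_on={1})
unchecked = [rng.hal_get_random_number()[1] for _ in range(3)]

# What the return code is for: only accept numbers from successful calls.
rng = FakeRngPeripheral(VALUES, fail_on={1})
checked = []
for _ in range(3):
    status, number = rng.hal_get_random_number()
    if status == HAL_OK:
        checked.append(number)
```

In this model the unchecked caller silently ends up with a zero in the middle of its "random" data, while the checked caller at least knows one call failed.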
09:23 - You can see two results here, from FreeRTOS, which is a popular IoT operating system,
09:28 - and the MediaTek 7697. You could see both of these calls here are basically what we looked at earlier.
09:35 - There’s a return code that you’re not seeing being checked and then the output variable that’s put into it.
09:41 - So maybe you’re wondering, all right, so you didn’t check the return code of the HAL RNG function.
09:48 - What’s the worst that could happen? - Undefined behavior.
09:52 - We don’t know what will happen. And that is a pen tester’s favorite phrase.
09:58 - The worst that can happen might sound like it’s the number zero.
10:02 - And in fact, we have seen examples of this.
10:04 - The XKCD joke that you always have to reference when you’re talking about random numbers is that you ask for a random number and it always returns a static value, a constant.
10:13 - That isn’t quite what usually happens, but it’s not far off.
10:17 - We’ve seen large swaths of zeros in a one gigabyte file of supposedly random numbers.
10:23 - And even in our own implementations, when we were trying to do it properly, we ended up with results that had large quantities of zeros in them.
10:31 - But that’s not the most insidious one. The worst is where you have partial entropy.
10:36 - It looks like it’s random, but it isn’t as random as you thought it was.
10:41 - A good example is you make a call, you get a 32-bit unsigned integer back.
10:46 - It has four bytes of random numbers. And the first call looks fine.
10:50 - You call again, it still looks okay. Then you call a third time and you get zeros and you call a fourth time and you still get zeros.
10:57 - But if you’re not looking carefully enough, you might not notice that you got a whole bunch of zeros in your calls.
11:03 - This partial entropy can be really tricky because it substantially reduces the actual randomness that you’re working for, the strength of the random number that you’re getting.
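To put a number on the partial-entropy case described above, here is a small sketch. It assumes a 128-bit key assembled from four 32-bit RNG calls where the last two calls silently returned zero; the word values and helper names are illustrative, and the bit count is an upper bound that treats every non-zero word as fully random.

```python
import struct

def build_key(words):
    """Pack four 32-bit words into a 16-byte key, big-endian."""
    return struct.pack(">4I", *words)

def effective_bits(words):
    """Upper bound on key entropy if every zero word came from a failed RNG call."""
    return 32 * sum(1 for word in words if word != 0)

# Calls one and two succeed; calls three and four return zeros.
words = [0x8F3A12C4, 0x17D0E95B, 0x00000000, 0x00000000]
key = build_key(words)
strength = effective_bits(words)
```

The key still looks plausible in a hex dump at a glance, but an attacker who knows about the failure mode only has to search 64 bits rather than 128.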
11:13 - And it might seem like this is not an issue, but this seems to happen pretty often.
11:21 - - So this is actually how this entire research project got started.
11:24 - We do a lot of IoT engagements at Bishop Fox under what we’d usually call a product security review.
11:29 - We like to say that if it breaks when you drop it, it’s a product security review.
11:33 - And so one time we had a client that was developing an IoT device that does a lot of cryptography in it that was kind of a security device.
11:41 - And I was reviewing the code and saw that it used a lot of random number generation in the process of doing a lot of this cryptography and was curious about how it did random number generation on such a tiny, low power device.
11:56 - And it turns out that there was a hardware random number generator on the SoC that our client was using.
12:02 - And on a lark, I sort of asked to see like, “Hey, what is the quality of the random numbers coming out of this thing?” I was sort of curious, I hadn’t actually seen the output of one of these hardware random number generators previously.
12:18 - And I didn’t really expect it to be bad. I was thinking that the hardware random number generator surely is the gold standard for RNG, right? And so when we got the results back, we ran it through some statistical randomness tests and it failed basically all of them.
12:34 - And then upon further inspection, looking at the actual binary files and a couple of gigabytes of output from the RNG, large swaths of it were just zero.
12:46 - Surely I thought that this was a mistake, that this can’t have been right.
12:50 - So we then embarked on this sort of longer journey to investigate to see, was it just a single buggy chip or was it some crazy buggy code? And this entire thing kind of blew up from there.
13:02 - So you might think that encryption keys of zero are about as bad as it can get, right? Surely it can’t get worse than that.
13:08 - Well, I’d like to introduce you to Petro’s law.
13:11 - If I could have one eponymous law, one law named after myself, help me out here, DEF CON, it would be this, “It can always be worse.” No matter how bad you think it is, it can always be worse.
13:23 - So what could be worse than encryption keys of zero? Uninitialized memory, that’s what.
13:28 - So take these three lines of pseudo code, for example.
13:32 - These are pseudo code of course, but if you just hop on to GitHub and look around, you’ll find lots of examples of this in the real world.
13:38 - So you declare, but not instantiate a number that you’re going to be using as your holder for a random number.
13:45 - This is declared on the stack. You then pass this to the HAL random number function.
13:50 - However, if the HAL function works in such a way that it doesn’t perturb the random number variable when there’s an error condition, like it doesn’t set it to zero say, then what will happen is when you go to use it later on, the value will just be whatever happened to be present in RAM prior to the call.
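Python cannot read truly uninitialized memory, so the sketch below only models the situation: a `bytearray` plays the role of a stack region that still holds bytes from an earlier call, and a hypothetical `hal_get_random_number` leaves its output untouched when it fails. The names, the stale contents, and the "random" value are all assumptions for illustration.

```python
import struct

HAL_OK, HAL_ERROR = 0, 1

def hal_get_random_number(memory, offset, fail):
    """Hypothetical HAL call: write a 32-bit number at `offset`, or leave memory untouched on error."""
    if fail:
        return HAL_ERROR
    struct.pack_into("<I", memory, offset, 0x5ACD0321)
    return HAL_OK

# Model a stack frame that still holds data left behind by an earlier function.
stack = bytearray(b"hunter2!")

# The "random number" variable lives at offset 0 and is never initialized.
hal_get_random_number(stack, 0, fail=True)  # return code ignored

# Whatever was in memory before the call is now treated as the random number.
(leaked,) = struct.unpack_from("<I", stack, 0)
leaked_bytes = bytes(stack[:4])
```

If that value is then sent over the network, as in the key exchange case, the device has disclosed four bytes of whatever was previously on its stack.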
14:09 - So this can actually happen quite regularly in the real world too.
14:13 - For instance, if you’re doing some cryptography with a Diffie-Hellman key exchange, this will involve generating a random number and then sending it over the network to a potential adversary.
14:22 - So this is something that’s quite realistic in practice.
14:25 - - So let’s talk about some real world instances.
14:28 - In 2019, a study against over 75 million publicly available certificates found that over 435,000 of them were vulnerable to attack.
14:37 - In the study, they specifically called out that lightweight IoT devices are particularly prone to being in low entropy states.
14:43 - And I’m not saying that our research here directly points to what they found, but it’s a pretty clear link that when you have this many devices that are producing such low entropy results, it seems pretty likely that this was what they found.
14:59 - - So you might’ve thought this was gonna be a very simple case of just simply blame the users, right? Those pesky users aren’t checking the return codes and we just need to make sure that they’re calling the functions correctly.
15:12 - Well, it’s actually not quite so simple. So take this for instance, this is some pseudo code at the top here.
15:19 - This is from the MediaTek 7697 documentation.
15:24 - You call the random number, you check if the status code is not equal to okay and then you sort of handle the error.
15:31 - Well, the handling of the error comment right there is doing a lot of heavy lifting because when you need a random number for some security critical code, you can’t just simply move forward without that random number.
15:44 - It’s sort of important to the core thing that you’re trying to accomplish there.
15:48 - So generally speaking, you’re kind of given two options.
15:51 - One is to spin loop. So you can just kind of while loop, call the random number generator function again over and over and over again until you get an OK status.
16:00 - Basically, you’re going to use 100% CPU indefinitely, maybe forever if the RNG peripheral was broken, waiting for a result.
16:08 - That’s not very good. But the second option is just to quit out entirely, kill the entire process or if you’re in the networking stack, if you’re trying to make a TLS key for the TCP connection that you’re in, then it’s going to involve killing the entire TCP connection.
16:25 - That’s not a very good option either. - Both are so unacceptable that it really leaves developers with only option three, which is YOLO.
16:34 - You can’t just spin loop because it’ll lead to broken, buggy, useless devices, same thing with quitting and killing the entire process and starting over.
16:44 - This is gonna make a device no one would want to buy.
16:46 - So we force users into this unwinnable situation.
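The two error-handling options can be sketched side by side. This is a model in Python, not code from any SDK: `flaky_reader` is a hypothetical peripheral that fails a fixed number of times, and `RngUnavailable` is an invented exception standing in for "kill the process or the connection".

```python
HAL_OK, HAL_ERROR = 0, 1

class RngUnavailable(Exception):
    """Raised when the caller gives up on the RNG peripheral."""

def get_random_spin(read_rng):
    """Option one: spin until the peripheral reports success (may never return)."""
    while True:
        status, number = read_rng()
        if status == HAL_OK:
            return number

def get_random_or_quit(read_rng):
    """Option two: give up on the first error and let the caller tear everything down."""
    status, number = read_rng()
    if status != HAL_OK:
        raise RngUnavailable("hardware RNG returned an error")
    return number

def flaky_reader(failures, value):
    """Hypothetical peripheral that fails `failures` times before producing `value`."""
    remaining = [failures]

    def read():
        if remaining[0] > 0:
            remaining[0] -= 1
            return HAL_ERROR, 0
        return HAL_OK, value

    return read

# Spinning eventually succeeds here only because this fake peripheral recovers.
spun = get_random_spin(flaky_reader(failures=3, value=0x5ACD0321))
```

With a peripheral that never recovers, the first function burns CPU forever and the second aborts whatever operation needed the number, which is the unwinnable choice described above.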
16:52 - RNG in IoT is fundamentally broken and it’s not the fault of the user.
16:58 - So let’s talk about the right way to RNG. And the right way to do this is to use a cryptographically secure pseudo random number generator.
17:06 - It’s a mouthful, but a CSPRNG has some distinct advantages.
17:11 - It never blocks execution. It has API calls that don’t fail.
17:16 - It pulls from multiple entropy sources, more on that in a second, it always returns crypto-quality results due to stretching out the amount of randomness that you have.
17:24 - In short, it’s a much stronger system than just relying on a single source of entropy.
17:32 - - The way that the CSPRNG subsystem works is you start with a number of entropy sources.
17:37 - So these can include the hardware random number generator, but also lots of other things that an operating system might have access to such as interrupt timing from various devices, networking receive times, like tiny nanosecond receive times; there’s quite a bit of entropy in the network.
17:54 - You XOR them all together into this big entropy pool.
17:58 - So the important thing to know about this is that all of them are XORed together, so that it’s not sufficient to break just a single one of these entropy sources in order to predict the output of the random number generator; an attacker would need to simultaneously predict all of them.
18:12 - So that’s very strong. Additionally, what you can do is then read from the entropy pool by a cryptographically secure random number generator.
18:20 - These are typically just a hashing function; like, the Linux kernel just MD5s the entropy pool.
18:26 - And then in order to produce more numbers afterward, MD5 the last output with the entropy pool itself, so you can kind of chain, basically key stretching your way out to produce a functionally infinite amount of entropy from a static amount.
18:41 - Also because we can study these hashing functions offline, we’re very sure about the strength of the results.
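The pool-and-hash idea described above can be sketched in a few lines. This is a toy model under stated assumptions, not a usable CSPRNG: it swaps SHA-256 in for the MD5 mentioned in the talk, the three entropy samples are made-up constants, and real subsystems also handle reseeding, entropy estimation, and state protection, none of which appears here.

```python
import hashlib

def mix_pool(sources):
    """XOR equal-length entropy samples together into one pool."""
    pool = bytes(len(sources[0]))
    for sample in sources:
        pool = bytes(a ^ b for a, b in zip(pool, sample))
    return pool

def csprng_stream(pool, blocks):
    """Hash the pool, then chain each previous output with the pool for further blocks."""
    outputs = []
    last = b""
    for _ in range(blocks):
        last = hashlib.sha256(last + pool).digest()
        outputs.append(last)
    return outputs

# Hypothetical 16-byte samples from three independent sources.
hw_rng = bytes.fromhex("8f3a12c417d0e95b5acd03219e4471aa")
irq_timing = bytes.fromhex("00112233445566778899aabbccddeeff")
net_timing = bytes.fromhex("0123456789abcdeffedcba9876543210")

pool = mix_pool([hw_rng, irq_timing, net_timing])
blocks = csprng_stream(pool, 4)

# If the hardware RNG fails and returns zeros, the other sources still feed the pool.
degraded_pool = mix_pool([bytes(16), irq_timing, net_timing])
```

Because the sources are XORed, predicting the pool requires predicting every source at once, and each output block depends on both the pool and the previous block.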
18:49 - So this also makes remediation tricky here. You see, it’s not a simple case of you zigged when you should have zagged, where there’s a bug in a piece of software somewhere and whoever made the software can just kind of fix that bug, we patch it and move on with our lives.
19:04 - This is a case of a missing feature and one across the very heterogeneous landscape of devices and pieces of software.
19:11 - The most likely place that you’d implement a feature like this, a CSPRNG subsystem, is in one of the emerging IoT operating systems, something like FreeRTOS, Contiki-NG, (indistinct) or any of many others.
19:24 - And we’d highly recommend using one of those if you’re making a new IoT device from scratch because the individual device SDKs seem unlikely to get this kind of a feature.
19:35 - Those SDKs are usually very thin filled with example code that’s mostly around hardware enablement and not so much around making a full device end to end.
19:46 - So yeah, this is gonna be a tricky remediation process.
19:49 - Now this vulnerability might be with us for some time.
19:52 - - This is how RNG in IoT devices should work too.
19:56 - But right now it doesn’t. And the rest of this talk is all about convincing you why you really do need a CSPRNG subsystem.
20:06 - It’s absolutely critical to have this. Now, let’s talk a little bit more about using hardware RNG to seed an insecure PRNG because this’ll matter a lot when we start moving into the rest of this talk.
20:18 - Nobody codes from scratch. We were talking earlier about blaming the users for doing it wrong.
20:22 - Well, if you’re a user and you’re grabbing reference or example code and that reference library or that code that you’re working with has vulnerabilities, it propagates those vulnerabilities out.
20:33 - And one of the places this shows up is some IoT devices and operating systems use the hardware, but only to seed an insecure libc pseudo-random number generator.
20:44 - What that looks like is you’re getting a nice, potentially random number from the hardware random number generator, you’re using that to seed libc, but everything after that is not necessarily cryptographically secure.
20:57 - You’re not really using the hardware even though you might think you are.
21:00 - This shows up in the MediaTek LinkIt 7697 and specifically in Contiki-NG.
21:06 - I’m gonna let Dan describe this in a demo we have for you.
21:10 - - So for this demo, we built an IoT security camera device.
21:13 - It takes pictures every few minutes, just like a real security camera and post them to a publicly accessible website.
21:19 - So the only thing keeping an attacker from being able to view your photos is that each file is named in this long random file name here chosen by the camera.
21:28 - Now before you go thinking that this is unrealistically vulnerable, this is how Discord works and lots of other applications like it.
21:34 - Anytime you take a photo and send it to a friend over Discord, it’s publicly accessible.
21:40 - The long random file name is the only thing that’s keeping people from seeing your photos.
21:44 - Our device, however, is built using Contiki-NG, a popular IoT operating system.
21:49 - When you call the operating system to get a random number, it will use the hardware RNG on board, but only to seed the insecure libc rand function.
21:59 - So we don’t know what the seed is, but we don’t have to because we can derive it.
22:03 - So suppose one day you take a photo with your camera and post it on social media.
22:08 - Wow, what a cool camera you bought, how fun? But what you didn’t know is that an attacker can use this file name to derive what the original seed was that generated it.
22:17 - That’s because that’s how the libc rand function works.
22:20 - Our attacker here uses Untwister to find the seed.
22:23 - Once they have, they can use that seed to determine every past and future value from the RNG.
22:29 - So our attacker can just plug in some of those numbers back into the camera website and view every photo that the camera has ever taken, even if they’ve never been shared before.
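The seed-recovery step can be sketched with a brute-force search. This is not the tool used in the demo. It assumes the generator is the sample `rand` implementation given in the C standard (a 32-bit linear congruential generator returning 15 bits per call), and it restricts the seed to 16 bits so the search finishes quickly in Python; a real 32-bit seed space is the same idea with more work. All names here are illustrative.

```python
def make_rand(seed):
    """The example rand() from the C standard, modelled with a 32-bit state."""
    state = seed & 0xFFFFFFFF

    def rand():
        nonlocal state
        state = (state * 1103515245 + 12345) & 0xFFFFFFFF
        return (state >> 16) & 0x7FFF

    return rand

def outputs(seed, count):
    """First `count` outputs produced from `seed`."""
    rand = make_rand(seed)
    return [rand() for _ in range(count)]

def recover_seeds(observed, seed_bits=16):
    """Try every seed in a small space and keep those that reproduce `observed`."""
    return [s for s in range(1 << seed_bits) if outputs(s, len(observed)) == observed]

# The device seeds once (say, from the hardware RNG) and then leaks a few outputs.
secret_seed = 0xBEEF
observed = outputs(secret_seed, 4)

# The attacker never sees the seed, only the outputs, and searches for it.
candidates = recover_seeds(observed)

# With a matching seed in hand, the rest of the stream can be regenerated.
predicted = outputs(candidates[0], 8)[4:]
```

The hardware RNG only contributed the seed, so once the seed is found every past and future value follows from it.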
22:42 - All right, so let’s have a word about exploitability.
22:44 - This comes up a lot anytime you give a talk about random number generation or crypto stuff, is how exploitable is this really? And the answer here is very exploitable, but it’s not going to be a canned exploit.
22:56 - You’re not going to see a you’re-doing-it-wrong.py that just sort of exploits things in the wild.
23:03 - It’s gonna have to do with the particular business logic of the device that you’re speaking to.
23:08 - It’s going to be very particular to individual Internet of Things devices.
23:13 - So there’s not gonna be just like a simple CVE that you can kind of apply universally across the board to every device.
23:20 - That comes with one asterisk, one possible exception here, and that’s with asymmetric keys.
23:26 - You see one of the things that often causes the hardware RNG functions to fail is calling them very rapidly in succession.
23:34 - So you call them too quickly and then it runs out of entropy and just starts giving you zeros.
23:39 - So one really common way to make that happen in the real world is to make a 2048 or 4096-bit RSA Key, right? In order to get that many bits from the hardware RNG, you’re going to need to kind of call it in a loop in succession very quickly.
23:54 - And those sorts of keys are very sensitive to low entropy.
23:58 - It’s not like an AES key where if you’re missing 32-bits off of 128-bit key, you’re still probably fine, whereas with RSA keys, that’s not the case.
24:09 - In fact, there’s a different talk at this DEF CON going on right now called The Mechanics of Compromising Low Entropy RSA Keys.
24:16 - We did not plan for this. That was just a thing that sort of happened.
24:20 - So this is actually a thing that you can check for empirically.
24:23 - You can look at the RSA keys coming out of IoT devices and see if they are of poor quality or not.
24:31 - And that’s the sort of thing you can do from the outside, black box, as well.
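One well-known empirical check on collected RSA public keys is to look for moduli that share a prime factor, which can happen when devices generate primes from too little entropy. The sketch below is an illustration of that general technique, not the method from the talk, and it uses tiny made-up factors so the arithmetic is easy to follow.

```python
from math import gcd

# Toy factors for illustration only; real RSA primes are 1024 bits or longer.
p_shared = 1000003
q1 = 1000033
q2 = 1000037

# Two devices with poor entropy happen to pick the same first factor.
n1 = p_shared * q1
n2 = p_shared * q2

# An outside observer only needs the two public moduli.
common = gcd(n1, n2)
recovered_q1 = n1 // common
recovered_q2 = n2 // common
```

A common factor greater than one breaks both keys at once, and the check needs nothing but the public keys.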
24:36 - So if you’re a pen tester and you have an IoT engagement coming up, how you’re going to actually exploit this is going to depend greatly on whether it’s a black box approach or whether you have the source code to the application or the device itself.
24:48 - As a black box approach, it’s gonna be much trickier.
24:51 - You’re gonna want to look at the output of the RNG from however the application is using it.
24:56 - The easiest way as we mentioned is using the asymmetric keys, the device produces an RSA key or a certificate of some form, then you can look at those cryptographically to see if any known attacks work against it.
25:09 - That’s probably gonna be your best bet. Second though, look for any opportunities to tax the RNG, so any opportunity for the attacker to influence how often the RNG is being called.
25:20 - So for instance, if the device is producing some ID value, and that is done at the request of a user, then try requesting that very, very quickly and see if the numbers start becoming zero or if they’re lower entropy.
25:35 - Other than that, trying to actually measure the entropy of values that come out of an application can be very hard because very often it’s gonna be permuted in some way, you’re not going to get the raw hex values that come out of the RNG.
25:48 - Typically it’s gonna be produced as a six-digit pin or something like that, and in order to perform statistical analysis on the kinds of output of the RNG, you’re going to need a very large sample size, like a gigabyte or so.
26:01 - And in order to get a gigabyte of six-digit pins, you’re gonna have to make a lot of calls.
26:07 - So that might actually be very difficult in practice.
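A rough back-of-the-envelope figure for that, assuming each six-digit pin is uniformly distributed and therefore carries at most log2(10^6) bits, and taking a gigabyte as 10^9 bytes:

```python
import math

# A uniformly random six-digit pin has one million possible values.
bits_per_pin = math.log2(10**6)

# One gigabyte of raw RNG output, expressed in bits.
target_bits = 8 * 10**9

# Number of pins an observer would need to collect to see that much output.
pins_needed = math.ceil(target_bits / bits_per_pin)
```

That works out to roughly four hundred million requests, which is why black box statistical testing through an application interface is rarely practical.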
26:09 - With source code, however, things become a lot more visible.
26:12 - You can look in to see how the hardware RNG is called and see if the return code is being ignored.
26:18 - I hesitate to recommend actually implementing a CSPRNG subsystem on your own at this stage, since there’s a lot that can go wrong there.
26:27 - There’s a lot of moving pieces that go on there.
26:29 - There’s certainly a lot that you can mess up, but at least consider it if it’s critical enough.
26:35 - - Okay, you’ve done everything right. You’re spin looping, you’re blocking until the hardware random number generator gives you valid non-zero results, you’ve validated that you’ve used every library correctly,
26:49 - and that your libraries aren’t written in a bad way, perhaps just seeding libc.
26:53 - You validated all of the code you’re using, surely you’ve got it right now, right? Nope, there’s still some usage quirks.
27:01 - You’re still likely going to do it wrong. In fact, you will do it wrong.
27:05 - This is the same level of difficulty as trying to write crypto code.
27:08 - And for the most part, we know not to try to write our own crypto code.
27:11 - It’s kind of a well-known law. Well, this is my law.
27:15 - My law is, don’t write your own RNG code. You will do it wrong.
27:20 - Now, you might potentially have some documentation.
27:24 - Maybe, if you can find it. On some usage quirks you’re gonna run into, for instance, the LPC54628 has a warning on page 1,106 out of 1,152 that says, “When you’re calling it, you have to throw out 32 results, use one, and then throw out the next 32 results and repeat the process.” It is the only way to guarantee that you’re getting proper random numbers out of it.
27:49 - But how would you know if you didn’t sort through a 1,000-page document? And if you saw this code written down and you went through the comments, unless they specifically called out why they were doing it, you would think that this was buggy code.
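The discard pattern quoted from the LPC54628 manual might look like the following sketch. It follows the wording as given in the talk (discard 32, keep one, discard 32), not the vendor's reference code, and `read_raw` is a hypothetical function standing in for one read of the RNG register. A simple counter replaces the hardware so the effect is visible.

```python
import itertools

def read_with_discards(read_raw, discard=32):
    """Discard `discard` reads, keep one, then discard `discard` more."""
    for _ in range(discard):
        read_raw()
    value = read_raw()
    for _ in range(discard):
        read_raw()
    return value

# Stand-in for the hardware: each "read" returns the index of that read.
counter = itertools.count()

def read_raw():
    return next(counter)

first = read_with_discards(read_raw)   # uses read number 32
second = read_with_discards(read_raw)  # uses read number 97
```

Each usable number costs 65 reads here, and without a comment pointing at the manual the two discard loops look like a bug waiting to be "fixed".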
28:03 - This doesn’t seem sane. You can’t try writing RNG code on your own.
28:09 - It’s as bad as trying to write crypto code and even worse at…
28:13 - This is absolutely a situation that emphasizes you can’t blame the user for this.
28:17 - There’s no one that’s gonna get this right.
28:19 - One of the things we’ve touched on in this talk are hardware dev kits that SoC, system-on-a-chip, vendors release to allow developers to debug and flash their devices.
28:30 - An IoT developer is going to build their device around an SoC.
28:35 - And they’ll be writing their code in C/C++ or something similar on a PC and then flashing the device and testing it.
28:43 - The dev kits provide a variety of different features for testing things and each vendor does it a completely different way.
28:49 - For instance, this is a SparkFun board that is using an nRF-based SoC and it has a USB port for debugging and flashing as well as an SD card.
29:01 - Each vendor implements it completely differently.
29:03 - And for instance, this is an older version of an STM32.
29:09 - This particular design has a debugger at the top that can flash the device.
29:14 - The actual device you’d be flashing is this SoC down here.
29:18 - This portion of the board is still a dev kit.
29:20 - It has all of the functionality in that chip exposed on the pins on the sides.
29:26 - But you can do some interesting things. You could flash this chip and debug it with this portion.
29:33 - And when you’re satisfied with the device, you can actually snap off this entire upper portion.
29:38 - Sometimes you’ll do this because you’ll develop your device and then use the dev kit itself.
29:43 - One of these designs that we did on the side on my TASBot project was with this Cypress PSoC 5 based board.
29:51 - To flash this device, you plug this in into a computer and to actually use it, you’re using a completely separate USB port on the other side.
29:58 - In our case, we connected the actual dev kit to a board we manufactured.
30:02 - This was made by total in the TASBot community.
30:06 - This allows us to connect to a video game console and pretend to be a controller, for instance.
30:12 - You can also just make your own hardware that doesn’t have any of this extra debugging functionality at all.
30:17 - And that’s what we did with this TAStm32 board made by Ownasaurus.
30:21 - Same concept, this allows us to connect directly to a video game console and pretend to be a controller.
30:26 - We just have cables that go from ethernet to, say, a Nintendo or Super Nintendo.
30:29 - But in this case, we’ve taken the SoC right here and incorporated it into a board of our own design.
30:35 - And that’s generally what an IoT device developer is going to do.
30:39 - Now, there are a variety of different advantages of doing that.
30:43 - One of the boards we worked with was this newer version of an STM32 that incorporated everything over just a single USB port.
30:51 - Rather than splitting it out like the PSoC 5 did, this has a USB serial interface, it has debugging functionality, there’s various different ways that you can talk just over USB.
31:01 - This was really interesting to work with. And ultimately the bulk of our research time was working with devices like these to try to access the hardware RNG that was on the SoC.
31:15 - This particular device had really good documentation.
31:18 - So a lot of credit to STMicro for that. They even went so far as to provide a proof of their randomness.
31:26 - And when we started trying to reproduce the results, we actually ran into several problems.
31:31 - One of the issues we ran into was that even with reference library code and documentation, we still ran into problems with not properly spin locking and blocking execution of the program.
31:42 - For instance, in our early tests, we were using something called byte circle, just an analysis tool that allowed us to throw the data at this program.
31:52 - And it would show the byte distribution in a nice circle.
31:55 - So out of 256 possible byte values, what we found is that all of the values we were getting were very low numbers for some reason.
32:05 - When you looked at the data in a file, it looked like it was properly random.
32:09 - But what we found was that we were actually calling it too frequently.
32:14 - We weren’t blocking properly, so we were calling the hardware RNG too frequently and starving it, which gave us bad results.
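The failure described here can be illustrated with a toy model. This is only a sketch under stated assumptions: `SimulatedHardwareRng`, its `data_ready` flag, and the "returns zero when read too early" behavior are all hypothetical stand-ins, not any vendor's actual register layout or API. It shows why code has to spin until the peripheral reports fresh data before reading it.

```python
import random


class SimulatedHardwareRng:
    """Toy model of a hardware RNG peripheral (hypothetical, not a vendor API).

    The peripheral needs `cycles_per_word` polls to gather a fresh word.
    Until then its data-ready flag is clear and, in this model, the data
    register reads as zero.
    """

    def __init__(self, cycles_per_word=8, seed=1234):
        self._source = random.Random(seed)  # stand-in for the physical noise source
        self._cycles_per_word = cycles_per_word
        self._countdown = cycles_per_word

    def data_ready(self):
        # Each poll of the status flag lets one simulated cycle elapse.
        if self._countdown > 0:
            self._countdown -= 1
        return self._countdown == 0

    def read_data(self):
        # Reading before the flag is set returns zero in this model.
        if self._countdown > 0:
            return 0
        self._countdown = self._cycles_per_word
        return self._source.getrandbits(32)


def read_without_waiting(rng):
    """Buggy pattern: grab the data register without checking the flag."""
    return rng.read_data()


def read_with_spin_wait(rng, max_polls=1000):
    """Spin until the peripheral reports a fresh word, then read it."""
    for _ in range(max_polls):
        if rng.data_ready():
            return rng.read_data()
    raise TimeoutError("hardware RNG never became ready")
```

In this model, calling `read_without_waiting` in a tight loop only ever returns zeros, while `read_with_spin_wait` returns varied 32-bit words. Real chips misbehave in their own ways when starved, so the vendor's reference manual remains the authority on the correct wait condition.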
32:26 - One of the other challenges we ran into was that some of the devices made it very difficult to even get the data off of them.
32:33 - MediaTek could be flashed over USB, but to actually exfiltrate the data, we had to use this crazy method over wireless and do various GET calls.
32:43 - It was pretty complicated, and really the bulk of our research was just spending time trying to get accurate numbers out of these devices.
32:52 - I bring up all this complexity because I wanna highlight dwangoAC’s rule as it were, don’t try to make your own hardware RNG code.
33:02 - It’s as bad as writing crypto code. I’ve said it before, but I can’t emphasize this point enough.
33:07 - We spent a long time trying to do this and still managed to mess it up several times in the process.
33:14 - IoT vendors really have it rough because if they do release a device that is standalone like this, how do you flash it in the field? Once you’ve released this, you’ve broken off the debugger, there’s possibly no debugger on it at all.
33:27 - There’s a lot of hardware out there that possibly has badly implemented hardware RNG that vendors simply can’t fix.
33:34 - So to pen testers, this is gonna be a perennial finding that will pop up for years to come because there’s already hardware out there that’s potentially using insecure RNG methods and it’s not easy to fix.
33:46 - - Okay, so let’s take a look at what the quality of the entropy looks like coming out of these hardware random number generators raw: no more bugs, no more silly usage quirks and library shenanigans.
33:58 - Let’s look at the actual numbers coming from the hardware RNGs themselves.
34:01 - Are they good, are they bad? So we ran a bunch of the RNGs through some statistical analysis and we’ve got some cool results to show you.
34:09 - What you’re looking at here is a histogram for the MediaTek 7697.
34:14 - That is a diagram of all bytes and how often they occur zero to 255 from left to right.
34:20 - So what we should see is basically a flat graph.
34:23 - Every byte should occur just as often as every other byte, maybe with a little bit of fuzziness at the top.
34:29 - But that’s not what we see. What we see is this obvious sawtooth sort of pattern that occurs down the line.
34:35 - And if there’s two things that don’t go well together, it’s obvious repeating patterns and crypto keys.
34:41 - Do you feel comfortable using this for your encryption key? I for sure don’t.
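The histogram check described here is easy to sketch. The following is a minimal illustration, not the code used in the talk: the function names are made up for this example, and the chi-squared statistic is one common way to put a number on how far a byte distribution is from flat.

```python
from collections import Counter


def byte_histogram(data):
    """Return a list of 256 counts, one per possible byte value."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def chi_squared_uniform(data):
    """Chi-squared statistic against a perfectly flat byte distribution."""
    expected = len(data) / 256
    return sum((seen - expected) ** 2 / expected for seen in byte_histogram(data))


flat = bytes(range(256)) * 4      # every byte value appears exactly 4 times
lopsided = bytes(range(128)) * 8  # only the low half of the range ever appears

print(chi_squared_uniform(flat))      # 0.0
print(chi_squared_uniform(lopsided))  # 1024.0
```

For genuinely random bytes the statistic should land in the neighborhood of 255 (the degrees of freedom), with some natural wobble. A value that is much larger points at structure such as the sawtooth shown on the slide.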
34:47 - Next up is the Nordic nRF52840. For this, there’s this obvious repeating 12-bit pattern of just zeros, there’s a 0, 0, 0.
34:57 - It’s a little bit hard to see in the picture there.
34:59 - The highlighting doesn’t capture the third zero.
35:01 - But this happens every 0x50 bytes, and that’s super bad.
35:06 - Obviously repeating patterns of any kind are bad, but especially fully zeroed bytes coming from the Nordic board was something that we saw that kind of stood out and made it fail all of these statistical randomness tests from there forward.
35:22 - This example was so egregious and so peculiar that we thought for sure for a really long time that it was just our instrumentation that was at fault, but we don’t think it is for three reasons.
35:31 - One, we spent a really long time investigating our code, trying to figure out how this could possibly be happening from it and never figured out how.
35:39 - But two, it’s a 12-bit pattern, which is very weird. If it was exactly one byte or no byte, then perhaps you’d think that it was just a null-terminated string kind of getting copied around incorrectly somewhere.
35:52 - But exactly 12 bits is really curious. And three, there’s the amount that it jumps by, 0x50 bytes.
35:59 - Sometimes it’ll actually jump by a little bit more than 50 bytes which kind of offsets things a little bit.
36:04 - So there’ll be maybe 80 bytes or something like that in hex.
36:07 - And then it’ll kind of continue on the pattern from there, which also wouldn’t really make sense as an inconsistency in our instrumentation.
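A quick way to look for this kind of artifact in a capture is to scan for runs of three zero hex digits (12 bits) and print the spacing between them. This is a hedged sketch, not the speakers' tooling: the function names are hypothetical, and it only reports runs. Random data will contain the occasional chance run too, so it is the regular spacing that matters.

```python
def find_zero_nibble_runs(data, nibbles=3):
    """Return byte offsets where a run of `nibbles` zero hex digits starts."""
    hex_text = data.hex()
    pattern = "0" * nibbles
    offsets = []
    position = hex_text.find(pattern)
    while position != -1:
        offsets.append(position // 2)  # two hex digits per byte
        position = hex_text.find(pattern, position + nibbles)
    return offsets


def gaps_between(offsets):
    """Distance in bytes from each run to the next one."""
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


# Synthetic capture: filler bytes with a 12-bit zero run planted every 0x50 bytes.
capture = bytearray(b"\xab" * 0x200)
for start in (0x10, 0x60, 0xB0):
    capture[start] = 0x00      # two zero nibbles
    capture[start + 1] = 0x0B  # third zero nibble, then a nonzero one

runs = find_zero_nibble_runs(bytes(capture))
print([hex(offset) for offset in runs])
print([hex(gap) for gap in gaps_between(runs)])
```

On a real capture, a column of identical gaps (or gaps that mostly agree, with the occasional larger jump) would match what is described here.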
36:18 - - So aside from just a distribution of values and aside from just taking all of the bytes and putting them on a graph and seeing how frequently they are, you can do a lot more statistical analysis.
36:29 - There’s a lot of tools out there including dieharder that you can use on large datasets to see if there are repeating patterns.
36:35 - And we relied on this a lot when we were doing our own implementation on the STM32.
36:40 - When we first started, we were failing all of the tests and it took us a while to get the code right.
36:46 - We thought we were spin looping properly, we weren’t.
36:49 - It turns out it’s very difficult to do this properly even when you think you know what you’re doing.
36:54 - Once we got through all of the other tests, we still had one of the dieharder tests that continued to fail, which was the RGB minimum distance test.
37:03 - - So the minimum distance test, what it’s doing is it takes a bunch of random numbers, interprets them as integers, and plots them in n-dimensional space, and then calculates using a simple algorithm what the minimum distance between any two of those points is.
37:16 - And it should fall within expected parameters.
37:19 - What this is doing really is it’s checking for repeats or nearly repeated values since any repeats would cause the minimum distance to become smaller than it normally should be.
37:31 - So what you’re seeing here in terms of a failure is a p-value of exactly zero, which means that it’s very confident that this should not have occurred randomly, or the chances of it having occurred naturally are very small.
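The core quantity behind the minimum distance test can be sketched in a few lines. This is a simplified illustration and not dieharder's implementation: the function names are made up, it uses a brute-force pairwise comparison, and it stops at computing the distance rather than turning it into a p-value.

```python
import math
import struct


def points_from_bytes(data, dimensions=2):
    """Interpret consecutive little-endian 32-bit words as coordinates in [0, 1)."""
    word_count = len(data) // 4
    words = struct.unpack(f"<{word_count}I", data[: word_count * 4])
    scaled = [word / 2**32 for word in words]
    usable = len(scaled) - len(scaled) % dimensions
    return [tuple(scaled[i : i + dimensions]) for i in range(0, usable, dimensions)]


def minimum_distance(points):
    """Smallest Euclidean distance between any two points (brute force)."""
    best = math.inf
    for index, first in enumerate(points):
        for second in points[index + 1 :]:
            best = min(best, math.dist(first, second))
    return best


# A repeated 8-byte block produces two identical 2D points, so the distance collapses.
repeated = struct.pack("<4I", 7, 9, 7, 9)
print(minimum_distance(points_from_bytes(repeated)))  # 0.0
```

A generator that repeats or nearly repeats values keeps producing a minimum distance smaller than a random source plausibly would, which is what the full test turns into that p-value of zero.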
37:44 - At the same time, you’re watching this, we’ll be releasing our code that gathered and analyzed the hardware RNG entropy itself.
37:50 - It’s nothing terribly special, but it took a long time to get right and working across a bunch of IoT dev boards.
37:56 - And while we don’t think that there’s any errors that would have impacted the results, consider that even if it turns out that there’s a bug in our code, what does it say that two computer security experts spent months studying and analyzing hardware RNGs and still couldn’t get them to work properly? And what does that say about the state of IoT security more broadly? - Okay, so what are the conclusions we’ve come to after talking through all of this? The first is that this affects the entire IoT industry.
38:24 - It’s not a single vendor, it’s not a single device.
38:26 - It’s not a single particular quirk, this is widespread.
38:32 - And if you take nothing else out of this talk, the point we really wanna make is that the IoT world needs a CSPRNG subsystem.
38:41 - This can’t be fixed by just changing documentation and blaming users.
38:45 - You really need RNG code that is well-vetted and you should consider it dangerous to write it on your own, it’s just like crypto code.
38:53 - And also you should never use entropy directly from the hardware.
38:56 - You don’t know how strong or weak it might be as we just showed you.
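To make the idea concrete, here is a toy sketch of what "don't use the hardware entropy directly" means: raw samples from several sources get hashed into a pool, and output is derived from the pool rather than handed out raw. Everything here is an assumption for illustration (the `EntropyPool` class, the source names, the construction itself). It is not a vetted design, and in line with the point of this talk, a real device should rely on the well-reviewed CSPRNG in its operating system instead of code like this.

```python
import hashlib
import hmac


class EntropyPool:
    """Illustrative entropy pool (hypothetical; NOT a vetted CSPRNG)."""

    def __init__(self):
        self._state = bytes(32)
        self._counter = 0

    def add_entropy(self, source_name, sample):
        # Fold the source label and raw sample into the pool state, so a weak
        # source cannot erase what other sources already contributed.
        digest = hashlib.sha256()
        digest.update(self._state)
        digest.update(source_name.encode("utf-8"))
        digest.update(len(sample).to_bytes(4, "big"))
        digest.update(sample)
        self._state = digest.digest()

    def random_bytes(self, count):
        # Derive output blocks from the pool state and a running counter.
        output = b""
        while len(output) < count:
            self._counter += 1
            block = hmac.new(
                self._state, self._counter.to_bytes(8, "big"), hashlib.sha256
            )
            output += block.digest()
        # Ratchet the state forward so later state does not reveal this output.
        self._state = hmac.new(self._state, b"ratchet", hashlib.sha256).digest()
        return output[:count]


pool = EntropyPool()
pool.add_entropy("hw_rng", bytes(16))               # a worst-case all-zero hardware sample
pool.add_entropy("timer_jitter", b"\x17\x2a\x91")   # a second, independent source
key_material = pool.random_bytes(32)
```

The design point is that callers never see the raw hardware bytes, so a flaw like the zero runs or the sawtooth shown earlier gets diluted by the other sources instead of landing directly in a key.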
39:00 - - Okay, so what can you actually do about this? Well, it depends on what camp you’re in.
39:04 - If you’re a device owner, keep an eye out for updates. IoT devices aren’t the best in general with software updates.
39:10 - And you’re gonna wanna make sure that yours is ready to take them when they’re available.
39:14 - IoT device developers, we highly recommend using one of the emerging IoT operating systems.
39:19 - We don’t have a strong preference for one versus the other at this point.
39:23 - But if you’re baking a new device from scratch or maybe even just updating a current one, we highly recommend using those rather than writing raw C on the bare metal yourself.
39:32 - If you’re an IoT device manufacturer or an IoT OS developer yourself, implement one of these CSPRNG subsystems. That’s the only secure way to do it, there isn’t a way to get around it.
39:43 - You might even consider deprecating or straight up disallowing users from using the hardware RNG raw itself.
39:49 - And if you’re a pen tester, keep an eye out for this because it’s gonna be a perennial finding for years to come.
39:56 - - I am Allan Cecil (dwangoAC). - I’m Dan Petro, and thanks a lot.