DEF CON 29 - Zabrocki, Matrosov - Glitching RISC-V chips: MTVEC corruption for hardening ISA
Aug 5, 2021 17:35 · 9241 words · 44 minute read
- Hello everyone, my name is Adam Zabrocki, and together with Alex Matrosov, we would like to give you a talk about Glitching RISC-V Chips: MTVEC Corruption For Hardening ISA.
00:09 - However, that is the end goal, what we ended up doing.
00:12 - Before we got there, we had to go through several different pieces of research, which we would like to cover today.
00:22 - Two sentences about ourselves; we don’t want to focus too much on this.
00:25 - Both of us did this research during our work at Nvidia. Here you can find some private contact information for me and Alex, and we just want to mention that we have been doing security research for a couple of years already.
00:38 - So what is this talk about exactly? Let’s speak first about the execution environment.
00:43 - To be able to speak about an execution environment, we need to have some kind of hardware.
00:48 - And when we speak about the hardware, we especially mean the CPU architecture.
00:53 - There are various CPU architectures, like RISC-V, x86, ARM, et cetera.
01:00 - And when we write software, the software targets a specific CPU architecture.
01:05 - So any software we write is in the end designed to run on specific CPU architecture hardware. Even if you’re writing in a high-level language, in the end the interpreter still runs on and targets a specific CPU architecture.
01:19 - In this kind of scenario, if we would like to break such an execution environment, we can focus on various types of attack.
01:27 - We can focus on pure software attacks or pure hardware attacks.
01:30 - Examples of pure software attacks could be any kind of memory safety issue, like overflow, use after free, et cetera;
01:36 - injection, which is very popular in higher-level languages, like command injection, SQL injection or cross-site scripting; or logical issues, where you essentially target specific software that is badly designed in a way that has security implications.
01:51 - There are also many other attacks. For pure hardware attacks, we usually think about glitching attacks, any kind of side-channel attack, or physical probing of the hardware; there are also many more.
02:03 - From a high-level perspective, pure software attacks target a specific implementation. An example could be a specific programming language that allows some kind of undefined behavior, which we try to hunt, or you can target specific compilers, specific software, firmware, et cetera.
02:21 - And pure hardware attacks are no different.
02:23 - In pure hardware attacks, you also focus on targeting a specific implementation.
02:28 - An example could be a specific CPU family, or a specific implementation of the architecture, the ISA, et cetera.
02:34 - Recently there has also been very nice research on mixing these two types of attack, pure hardware and pure software.
02:42 - Examples of such mixed attacks are Meltdown and Spectre.
02:48 - They essentially mix hardware and software.
02:50 - However, if you think about it a bit more, what if you found a bug in the reference code of the hardware ISA itself? Not in an implementation of the ISA, but in the ISA itself, in the reference code for the ISA.
03:03 - That has a pretty interesting implication, because then the problem affects all silicon implementations, not just a specific implementation or a specific family, because everything that is realized from the ISA reference will be affected.
03:19 - And this is also interesting because in such a case software cannot trust the hardware at all.
03:24 - So this is exactly what we’ll try to speak about today.
03:28 - This is the type of problem we discovered during our research.
03:31 - So first, how did we find it, and why did we even focus on this kind of problem? Essentially we wanted to analyze specific boot software, like a Boot ROM, or the specific microcode that runs.
03:44 - However, the problem was that it was running on a specific RISC-V chip, and we had essentially zero experience with the RISC-V architecture. Moreover, it was not just a simple RISC-V chip implementing the base ISA.
03:56 - It also carried custom extensions and custom functionality in the specific environment we wanted to analyze.
04:04 - Even more, the boot software was written in the AdaCore/SPARK language, which we also had zero experience with at the time of the research.
04:14 - So we quickly started to look into that a bit more: is there any public offensive security research about that language?
04:22 - Did anyone even hear about that language before? Because we did not.
04:25 - So at that time we also needed to be able to analyze the binary, specifically a binary compiled from the SPARK language targeting this custom RISC-V chip, and there were not even any tools that natively supported plain RISC-V, including IDA Pro and Ghidra.
04:43 - This was around 2019, and none of these tools natively supported RISC-V.
04:48 - And this is exactly what we will try to speak about today.
04:51 - During this talk, we will try to describe our journey through all of the problems we met during this research, which in the end resulted in discovering an ambiguity in the RISC-V specification, and also one additional problem.
05:05 - However, first things first: let’s speak about RISC-V in a nutshell.
05:09 - RISC-V is essentially an open standard instruction set architecture, known as an ISA, based on RISC principles.
05:16 - Unlike most other ISAs, RISC-V is provided under open source licenses that do not require any fees; it’s essentially open source and free.
05:27 - But there is a side effect: because it’s open source and free, the same RISC-V chip might have tons of different implementations, even though the ISA is the same.
05:36 - So you think you have exactly the same chip, but the implementation could be completely different. Also, RISC-V has a small standard base ISA with multiple standard extensions.
05:46 - However, this gives you potentially huge fragmentation of the silicon: one chip has the base ISA with one extension, another has the base ISA with a different extension, and all of them are still RISC-V.
05:57 - And everybody can easily add their own extension, which is very cool because it’s open source, so nothing stops you from doing that.
06:03 - You just take the ISA, add your own extension and build the silicon, which as a side effect gives you even bigger fragmentation, because it’s not only the RISC-V base with different standard extensions; there could also be custom extensions on top of that.
06:17 - It is also worth mentioning that today there are more than 500 members of the RISC-V foundation who support this initiative, including big players like Nvidia and Google.
06:31 - Most people are familiar with x86, so we prepared a very short and simple table that compares these two architectures.
06:40 - The main difference is the license: x86 charges fees for the ISA and the microarchitecture, while RISC-V is free, so there is no fee for the ISA or for the microarchitecture.
06:50 - RISC-V is an instruction set based on RISC, obviously.
06:54 - x86 was originally CISC, but that is not really true anymore, because since the Pentium Pro, x86 instructions are essentially turned into something called micro-ops, which is kind of like RISC nowadays.
07:06 - And x86 is a very old architecture, so we have various variants of the ISA.
07:11 - There are 16-, 32- and 64-bit variants of the ISA. RISC-V is much more modern, so we have 32, 64, and there is even a 128-bit variant of the ISA, which is not locked down yet, but essentially there is one.
07:24 - RISC-V operates on the model called load-store architecture,
07:28 - while x86 is a register-memory architecture.
07:31 - And RISC-V has 32 general purpose registers.
07:36 - But there is one special register, called the zero register, which always holds zero.
07:40 - And what is interesting from the security perspective, RISC-V natively supports execute-only memory, known as XOM.
07:46 - You can set this up; it is supported in the page table entries.
07:49 - However, x86 normally doesn’t support XOM unless you have the hypervisor extension; then you can define XOM-like attributes in the SLAT table, which is second level address translation.
07:59 - Another big difference is the software ecosystem that supports each architecture.
08:04 - x86 essentially runs everywhere. It’s a very old, very well designed and very well researched architecture.
08:11 - So you have the Linux ecosystem, Windows, Macintosh, and many more.
08:13 - While for RISC-V, from a practical perspective, you essentially have only one ecosystem, which is Linux.
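To make the execute-only memory point concrete, here is a minimal Python sketch. It assumes the standard RISC-V page table entry permission bits (V, R, W, X in bits 0 to 3); `describe_leaf` is a hypothetical helper name used only for illustration, not part of any real API. A page is execute-only when X is set while R and W are clear.

```python
# Low permission bits of a RISC-V page table entry (assumed layout: V, R, W, X).
PTE_V = 1 << 0  # valid
PTE_R = 1 << 1  # readable
PTE_W = 1 << 2  # writable
PTE_X = 1 << 3  # executable


def describe_leaf(pte: int) -> str:
    """Classify a PTE by its V/R/W/X bits (illustrative helper only)."""
    if not pte & PTE_V:
        return "invalid"
    r = bool(pte & PTE_R)
    w = bool(pte & PTE_W)
    x = bool(pte & PTE_X)
    if not (r or w or x):
        # No permissions set: the entry points to the next page table level.
        return "pointer to next level"
    if w and not r:
        # Writable-but-not-readable combinations are reserved.
        return "reserved"
    if x and not r:
        return "execute-only"
    return ("r" if r else "") + ("w" if w else "") + ("x" if x else "")


if __name__ == "__main__":
    print(describe_leaf(PTE_V | PTE_X))          # XOM page
    print(describe_leaf(PTE_V | PTE_R | PTE_X))  # ordinary code page
```

Code on such a page can be fetched and run, but a load from the same address would fault, which is what makes XOM interesting for hiding code from disclosure.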
08:20 - But again, from the security perspective, we would like to focus more on the privilege modes and levels that the architecture carries.
08:27 - This is a very nice picture taken from the blog post linked at the bottom of the slide, which shows the traditional rings that x86 carries.
08:36 - Originally x86 had only four rings: ring zero, one, two and three, while rings one and two were not really used.
08:43 - However, when hardware virtualization of x86 was not fully designed yet, people tried to implement parts of x86 virtualization using ring one. That doesn’t need to be done anymore, because now we have full hardware virtualization on x86.
09:01 - Traditionally, ring three is where the least privileged code runs, like user applications, and in ring zero you have kernel code most of the time.
09:09 - However, over time people demanded more privilege levels.
09:12 - That’s why we have something unofficially called ring minus one, where the hypervisor software works.
09:20 - In ring minus two, there is something we call SMM, which is more privileged than ring minus one, and there is also something not many people know about, the management engine on x86, which we can call a kind of ring minus three, because it’s the most privileged mode.
09:36 - Compare that to RISC-V, where we have only three modes: M, S, and U.
09:42 - And additionally, it’s an open source architecture,
09:46 - so you can also have various combinations of these modes.
09:49 - You can have M mode without any other mode.
09:51 - You can have M and U modes without S mode, or M, S, and U modes, which is essentially all of the modes.
09:57 - So what are these modes? U mode stands for user mode.
10:00 - It’s kind of equivalent to ring three on x86.
10:03 - This is where user applications run; it is the least privileged mode.
10:06 - Supervisor mode, which is S mode, is essentially the equivalent of ring zero.
10:11 - This is where the kernel runs. And M mode is kind of interesting, because M mode is called machine mode.
10:18 - As the ISA defines it, this is where the software closest to the hardware runs, kind of like firmware.
10:25 - However, if you want to compare this mode to x86, it should be something around ring minus two and ring minus three.
10:32 - This is how you should think about M mode: it is essentially the most privileged, most powerful mode.
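The mode combinations just described can be summarised in a short Python sketch. The numeric encodings (U = 0, S = 1, M = 3) follow the RISC-V privileged specification as far as I know; the names `Priv`, `SUPPORTED_COMBINATIONS` and `X86_ANALOGY` are assumptions made for this illustration, and the x86 mapping is only the rough analogy given in the talk.

```python
from enum import IntEnum


class Priv(IntEnum):
    """RISC-V privilege levels; a higher number means more privilege."""
    U = 0  # user mode
    S = 1  # supervisor mode
    M = 3  # machine mode


# Combinations of modes a chip may implement, as listed in the talk.
SUPPORTED_COMBINATIONS = [
    {Priv.M},
    {Priv.M, Priv.U},
    {Priv.M, Priv.S, Priv.U},
]

# Rough x86 analogy from the talk (not an exact equivalence).
X86_ANALOGY = {
    Priv.U: "ring 3",
    Priv.S: "ring 0",
    Priv.M: "ring -2 / ring -3",
}


def is_supported(modes) -> bool:
    """True if the given set of modes is one of the listed combinations."""
    return set(modes) in SUPPORTED_COMBINATIONS


def most_privileged(modes) -> Priv:
    """Return the most privileged mode among those implemented."""
    return max(modes)


if __name__ == "__main__":
    chip = {Priv.M, Priv.U}
    print(is_supported(chip), most_privileged(chip).name, X86_ANALOGY[Priv.M])
```

Whatever combination a chip implements, M mode is always present and always the most privileged, which is why it is the attacker's target.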
10:37 - However, RISC-V is also working on the hypervisor extension, so we have a few extra modes.
10:42 - S mode becomes HS mode; this is where the hypervisor-extended supervisor works.
10:48 - And we have two new modes, VS mode and VU mode, which are virtualized supervisor and virtualized user.
10:53 - This is where the VM will run, but it doesn’t cancel the normal U mode, so you also have all of these modes together.
11:00 - And again, because it’s open source, there is one additional supported combination for RISC-V chips, which is M, VS and VU mode.
11:09 - And again, you don’t need to implement all of the modes.
11:11 - You can choose which modes your hardware supports.
11:14 - And we, as attackers, are very interested in and would like to focus on M mode, which you can kind of call “God” mode, because this is where the most privileged and most powerful code runs.
11:25 - So if you are there, it is the ideal situation.
11:29 - So how do we get there? We now know more or less what RISC-V is.
11:34 - We learned about RISC-V very quickly, but at least we know what we are talking about.
11:38 - And we know that the software targeting this specific hardware, which is essentially a custom RISC-V, was written in SPARK.
11:45 - And then we started to think: what the hell is SPARK? We had never heard of that language.
11:50 - Did anyone even hear of this language before? So what is AdaCore/SPARK?
11:56 - AdaCore/SPARK is essentially a programming language together with a set of analyzing tools.
12:00 - SPARK is in fact a kind of Ada language, but it’s a subset of the Ada language.
12:07 - It’s much more restricted and doesn’t have the full Ada features, because SPARK essentially wants to be a formally verifiable language.
12:16 - The formal verification is carried out by a set of analysis tools, and this is where the strength of the language is:
12:22 - exactly in the analysis tools. These tools include GNATProve, GNATStack, GNATTest and GNATEmulator.
12:30 - Essentially, what these tools give you as attributes of the SPARK language itself is that they can statically prove various things.
12:37 - They can prove that dynamic checks can never fail, for example, or they can guarantee the absence of runtime errors: essentially everything is in a known and correct state, and because of that you don’t have errors, or runtime errors at least.
12:52 - And all of that is formally verified, which is very interesting and cool, because it gives you formally verified proofs that all of these attributes hold.
13:03 - So how you should think about it from the attacker’s perspective: it’s essentially a memory safe language.
13:09 - It’s a memory safe language which is formally proved.
13:12 - So it’s kind of like Rust. Rust is not formally proved, but it’s a memory safe language. SPARK has a much stronger typing system than Rust, and because of that there is also a lack of problems like arithmetic overflows, integer overflows, underflows, et cetera, which is very interesting.
13:31 - And because of that, it’s a very secure language.
13:33 - It has traditionally been used in critical industries like avionics, railways or defense systems.
13:41 - You should think about any language from the developer’s perspective: you would like to model the correct states of the machine, of the software and the hardware, and the language makes that easier or harder. In the C language, there is a lot of undefined behavior, so modeling the machine states is very difficult, and that’s why you have various undefined states.
14:05 - The Ada language is slightly more accurate in that respect and gives you more accurate modeling of the states, which is why there is less ambiguity and there are fewer unknown states.
14:17 - And because SPARK is much more strict than Ada, it’s essentially the closest we can get to the correct state.
14:24 - However, because the strength is in the analysis tools, this is essentially an attribute that the tools give you.
14:33 - However, there are some kinds of flaws in the SPARK language.
14:35 - Imagine you want to call a library implemented in a different language, like C. The SPARK analyzer will not be able to prove that the code written in C, which you call from SPARK, is also correct and in a known state, so that’s where they suffer.
14:52 - And how does it look in practice? If you write a program and make a mistake, the prover will not only tell you where the bug is, in which line and which file, but it also essentially gives you a proof of concept:
15:03 - what are the values necessary to trigger the bug or error. In the example at the top, you can see the prover tells you there is a divide-by-zero problem, which may fail when the variable B has the value 42.
15:16 - So it not only tells you the problem, but also gives you the condition necessary to trigger it, which is very interesting.
15:24 - Another example could be “medium: array index check might fail” when MyIndex has the value 36.
15:31 - So it’s very interesting. And then we started to think about that a bit more.
15:35 - We wanted to analyze which problems the specific tools can or cannot catch. And we found one interesting scenario where we were still able to write a program that the prover says doesn’t have any problems. You can see it at the top of the slide, marked in green: the prover said there are no problems there.
15:55 - However, when we ran it, we were able to trigger a stack (indistinct) vulnerability and generate an exception because of lack of memory.
16:03 - However, as we said before, SPARK is supposed to guarantee the absence of runtime errors.
16:09 - So how is it even possible? Then we started to run the other tools, like GNATStack, and we realized that even though GNATProve didn’t find this problem, GNATStack did. At the bottom, in red, you can see that GNATStack analyzed all of the different phases and reported that there could be a problem with stack (indistinct).
16:28 - So what did we learn from that? Essentially, we learned that you can still compile buggy code, because the problems are detected by the tools, and the developers might not run them at all.
16:40 - It’s a very interesting attribute, because it depends on the development scenario and the software development process in the company.
16:47 - Also, the tools are orthogonal to each other, because they detect completely different classes of problems, and to be fully protected you must run all of the provided tools.
16:56 - And again, it depends on the software development process whether all of these tools are actually run.
17:03 - Another problem is that there is no clear definition of which classes of problems can or cannot be detected.
17:09 - There is very limited public information on what can be detected and what cannot.
17:13 - And again, we didn’t find anything from the security researcher’s perspective.
17:16 - So what did we do? We ended up doing more research to find out, because if nobody did it before us, we had to do it ourselves.
17:25 - So this is what we ended up doing. We started evaluating, which is kind of a strong word, so let’s say analyzing, the AdaCore/SPARK language from the offensive security perspective.
17:34 - We divided all of the software security problems into general buckets, and one of the most popular is general memory corruption bugs, and we compared SPARK with other languages.
17:50 - Essentially SPARK, as we mentioned, is a memory safe language.
17:53 - So none of these kinds of problems exist. There are some caveats which I don’t want to focus on now.
17:58 - However, you can think of it as a memory safe language, so memory corruption doesn’t exist there.
18:05 - Then we moved to general pointer security.
18:08 - And it’s interesting because SPARK doesn’t have pointers at all.
18:11 - So if it doesn’t have pointers, pointer security doesn’t apply.
18:15 - However, there are still corner cases; remember we were able to generate the stack (indistinct).
18:20 - That’s why uncontrolled memory allocation kind of semi-exists, if you don’t run GNATStack.
18:25 - But it can of course be caught by the tool set if you run the tools correctly.
18:28 - And there could also be some kind of double (indistinct).
18:31 - Imagine you DMA twice to the same memory; it’s kind of possible to have something like that.
18:36 - Then we moved to arithmetic security, and as I mentioned, SPARK gives you the ability to have very precise types.
18:45 - Essentially you can define your own types in the language and also define your own boundaries.
18:50 - And then the prover will tell you if anything crosses the boundaries you defined.
18:54 - So essentially because of that, you don’t even need to use the general types like integers.
18:59 - You just define your own type every time, for anything you model in your program or your hardware.
19:04 - That’s essentially why problems like integer overflow, underflow or arithmetic overflow don’t exist in SPARK.
19:12 - Then we also have other types of problems, like a missing default case in a switch statement or assigning instead of comparing, and these problems don’t exist in SPARK because the tools easily catch them.
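The idea of user-defined bounded types can be sketched in Python, with one important difference: in SPARK a declaration such as `type Percent is range 0 .. 100;` is checked statically by the prover, while the sketch below can only enforce the bounds at runtime. `RangedInt`, `Percent` and `add_percent` are hypothetical names invented for this illustration.

```python
class RangedInt:
    """Runtime stand-in for a range-constrained integer type (illustrative)."""

    def __init__(self, name, low, high):
        self.name = name
        self.low = low
        self.high = high

    def check(self, value):
        """Return value unchanged if it lies within [low, high], else raise."""
        if not self.low <= value <= self.high:
            raise ValueError(
                f"{value} is outside {self.name} range {self.low} .. {self.high}"
            )
        return value


# Hypothetical type: a percentage field that must stay between 0 and 100.
Percent = RangedInt("Percent", 0, 100)


def add_percent(a, b):
    """Add two Percent values; the sum must also respect the declared bounds."""
    return Percent.check(Percent.check(a) + Percent.check(b))


if __name__ == "__main__":
    print(add_percent(40, 60))
    try:
        add_percent(60, 60)
    except ValueError as err:
        print("rejected:", err)
```

With a prover, the second call would be flagged before the program ever runs, together with concrete values that violate the bound, in the same way as the divide-by-zero example earlier in the talk.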
19:22 - Then we started to look at something more interesting, like parallel execution.
19:26 - What about problems like race conditions or deadlocks? Essentially, it’s possible to have them, but AdaCore is working on an extension to SPARK, (indistinct), which is supposed to close that gap. From our perspective it was not very interesting, because the boot software was essentially not running in parallel.
19:45 - It was single execution, so it was not very interesting for us, but it’s worth mentioning.
19:49 - And then in the end we moved to logical bugs.
19:52 - With logical bugs, neither the prover nor GNATStack, none of the tools, can catch it if you design the software badly, because you try to model the hardware and write software that transforms the specific states you modeled.
20:11 - And in the end, if you do that badly, of course SPARK won’t be able to catch that the (indistinct) is badly designed or badly modeled, because this is the intention of your software.
20:20 - So this kind of problem still exists if you inaccurately model the hardware, inaccurately handle DMA, or badly design the software.
20:27 - So this is exactly the part where SPARK cannot help you much.
20:31 - So what did we learn from this evaluation? As we mentioned before, you can still compile buggy code, because the problems are detected by the tools and developers just might not run them.
20:42 - And again, the tools are orthogonal, so they detect different classes of problems.
20:46 - So again, to be fully protected, you must run all of the tools, not just one of them.
20:51 - And from the analysis of implementation bugs, we realized that there still might be security issues in design problems, or some kinds of logical errors, but there are no memory safety issues, for example.
21:04 - So it’s not worth focusing on that. And again, there is an additional class of problems that could be introduced by the compiler, because bugs can be introduced by the compiler itself even if they are not in the software, or maybe they’re in the hardware.
21:16 - However, to be able to catch these three types of problems, we need to analyze the binary.
21:21 - And because we need to analyze the binary, we need tools that can target RISC-V, and the problem was that during this research, neither IDA Pro nor Ghidra natively supported RISC-V.
21:32 - There was some progress, but there was essentially no native support for RISC-V.
21:37 - So we decided to focus on Ghidra and add a custom plugin to Ghidra, to be able to analyze the binary that was running there and was generated from the SPARK language.
21:50 - This is what we ended up doing, and how did we do it? Ghidra 9.0, which was the newest version at that time, didn’t natively support RISC-V.
21:57 - Moreover, we were dealing with a custom RISC-V chip, not a standard RISC-V chip.
22:01 - So even if it had it, it would not work for us.
22:05 - And again, RISC-V is huge, and implementing the entire RISC-V base would take us tons of time.
22:12 - And additionally we needed to add custom features there, custom extensions, not just the RISC-V base.
22:19 - So what we ended up doing: we found on GitHub a few RISC-V base plugins for Ghidra, which had different implementations of RISC-V.
22:28 - And we decided to integrate one of the plugins, which we thought was a good one, on the top of tree of Ghidra itself.
22:35 - A few months later, which is interesting, Ghidra 9.2 brought native RISC-V support, and they used exactly the same plugin that we used in our research.
22:44 - Where to start? We successfully integrated the RISC-V plugin first, but of course we needed to modify it because we have custom extensions.
22:53 - Ghidra uses the SLEIGH language to describe the CPU. And what is SLEIGH? It is a processor specification language developed just for Ghidra, and it inherits from SLED.
23:04 - What is not cool is that there was, at that time at least, very little documentation about it.
23:09 - If you want to implement a simple CPU, you can use the CPUs already implemented in the source code as a source of knowledge.
23:17 - Based on that, you can just implement your own one.
23:19 - But if you want to do something more complex, it can be very painful, and it was very painful, at least for me.
23:26 - Additionally we found, in fact, only one interesting source of knowledge.
23:30 - It was a presentation by Guillaume Valadon, which we link here on the slide.
23:36 - So what do you need to do to implement a CPU in Ghidra? Essentially you create a couple of files, which are listed here.
23:42 - Then you also define the module manifest, which says how to tie them together and compile.
23:46 - We already had these files from the RISC-V base plugin, but we needed to modify them, especially the SLASPEC, to be able to add the custom extensions.
23:56 - This is exactly the file where you define the register definitions, the tokens, the aliases, the instructions, et cetera.
24:02 - And Ghidra, which is interesting and worth keeping in mind, allows you to compile a badly modeled SLASPEC as long as the syntax is correct. So you will not even know that you made a mistake: you run Ghidra, you think everything works, and then at runtime you meet the instruction that you implemented badly.
24:22 - And then you see tons of Java exceptions, and maybe the program will crash.
24:26 - So we essentially used “check and try” and “calm down” techniques to achieve what we wanted.
24:31 - However, let’s briefly talk about how to do these kinds of custom extensions or tokens.
24:35 - So, the Ghidra plugin CPU implementation. At first you define the tokens. A token here defines a field of the instruction’s bits: you give the range of bits you are interested in and the name by which you access it.
24:48 - Here at the bottom you have, for example, the name CSR zero, which as a token represents bits 20 to 27.
24:57 - Then you can also define the registers. RISC-V, for example, has status registers, like ustatus, the user mode status, for which we define the register offset, the size and the name, and then you can glue them together.
25:08 - So you can attach the variables to the CSR zero:
25:11 - you attach the names of the registers that you defined before, and this is exactly what you do.
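The token and attach mechanism can be mimicked in a few lines of Python. This is not SLEIGH and not the actual plugin code; it is a sketch that assumes a token is just a named bit range and that attaching means mapping the extracted value to a register name. The token name `csr0` and the bit range 20 to 27 are taken from the talk as spoken, `ustatus` at CSR number 0x000 is the only real register listed, and `field`, `TOKENS` and `ATTACHED_NAMES` are hypothetical names.

```python
def field(word: int, low: int, high: int) -> int:
    """Extract bits low..high (inclusive) from an instruction word."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


# Token definitions: name -> (lowest bit, highest bit), as described in the talk.
TOKENS = {
    "csr0": (20, 27),
}

# "Attach" step: map the token's numeric value to a register name.
ATTACHED_NAMES = {
    "csr0": {0x000: "ustatus"},
}


def decode_token(word: int, token: str) -> str:
    """Return the attached register name, or a hex fallback if none is known."""
    low, high = TOKENS[token]
    value = field(word, low, high)
    return ATTACHED_NAMES[token].get(value, f"csr_{value:#05x}")


if __name__ == "__main__":
    # csrrs a0, ustatus, zero encodes to 0x00002573 (CSR number 0 in the top bits).
    print(decode_token(0x00002573, "csr0"))
```

Adding a custom CSR in this model means adding one more entry to the name table, which mirrors how the custom registers were glued onto the existing plugin.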
25:16 - This is exactly what we did as well. We defined the custom extension, the custom tokens, the custom registers, the custom variables, and then we started to define the custom instructions. So how do you define an instruction?
25:29 - This is an example of the compressed ADD instruction.
25:32 - You essentially define what the operands are; there are two operands, D and CRS-2.
25:38 - Then you define the values of the tokens, which we defined previously as the (indistinct) mapping.
25:44 - And then you define what values they must have for this specific instruction to be decoded.
25:50 - For example the cop-001 token with the value two, and the bits 13 to 15, interpreted as the token cop (indistinct) 15, with (indistinct) value four, et cetera, et cetera, for the rest of the bits, and then Ghidra will know that this is exactly the instruction you’re looking for.
26:06 - And then you define the (indistinct): if it matches, you go to (indistinct) in the compressed ADD instruction, where we essentially add two operands, so that’s what we did.
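The pattern-matching step for the compressed ADD instruction can be sketched as follows. The constraints are the ones mentioned in the talk (bits 0 to 1 equal to two, bits 13 to 15 equal to four) plus, as an assumption based on the standard C extension encoding of `c.add`, bit 12 set and both register fields nonzero. `decode_c_add` and `execute_c_add` are hypothetical helper names, and the 32-bit wraparound assumes RV32.

```python
def bits(word: int, low: int, high: int) -> int:
    """Extract bits low..high (inclusive) from a 16-bit instruction."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def decode_c_add(halfword: int):
    """Return ("c.add", rd, rs2) if the halfword matches, otherwise None."""
    if bits(halfword, 0, 1) != 0b10:     # opcode quadrant 2
        return None
    if bits(halfword, 13, 15) != 0b100:  # funct3 value four
        return None
    if bits(halfword, 12, 12) != 1:      # distinguishes c.add from c.mv
        return None
    rd = bits(halfword, 7, 11)
    rs2 = bits(halfword, 2, 6)
    if rd == 0 or rs2 == 0:              # those encodings mean other instructions
        return None
    return ("c.add", rd, rs2)


def execute_c_add(regs: list, rd: int, rs2: int) -> None:
    """Semantics: rd = rd + rs2, wrapped to 32 bits (RV32 assumed)."""
    regs[rd] = (regs[rd] + regs[rs2]) & 0xFFFFFFFF


if __name__ == "__main__":
    regs = [0] * 32
    regs[10], regs[11] = 5, 7
    decoded = decode_c_add(0x952E)       # c.add a0, a1
    print(decoded)
    execute_c_add(regs, decoded[1], decoded[2])
    print(regs[10])
```

The first function plays the role of the token constraints and the second the role of the semantic section; getting either wrong still "compiles", which matches the warning about badly modeled SLASPEC files above.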
26:15 - There is another example, the compressed branch if equal to zero.
26:20 - It’s exactly the same: you define the tokens, and you define the token values which are necessary to decode the instruction.
26:26 - You define the operands, and then (indistinct) will check if the operand is zero; then you just jump, and that’s all.
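The compressed branch can be sketched the same way. The bit layout below is my reading of the standard C extension encoding of `c.beqz` (quadrant 1, funct3 of six, a three-bit register field that selects x8 to x15, and a scattered 9-bit signed offset); treat it as an assumption to verify against the specification. `decode_c_beqz` and `next_pc` are hypothetical helper names.

```python
def bits(word: int, low: int, high: int) -> int:
    """Extract bits low..high (inclusive) from a 16-bit instruction."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def decode_c_beqz(halfword: int):
    """Return ("c.beqz", rs1, offset) if the halfword matches, otherwise None."""
    if bits(halfword, 0, 1) != 0b01:     # opcode quadrant 1
        return None
    if bits(halfword, 13, 15) != 0b110:  # funct3 for c.beqz
        return None
    rs1 = 8 + bits(halfword, 7, 9)       # compressed register field: x8..x15
    # Reassemble the scattered immediate: offset[8|4:3] and offset[7:6|2:1|5].
    offset = (
        (bits(halfword, 12, 12) << 8)
        | (bits(halfword, 5, 6) << 6)
        | (bits(halfword, 2, 2) << 5)
        | (bits(halfword, 10, 11) << 3)
        | (bits(halfword, 3, 4) << 1)
    )
    if offset & 0x100:                   # sign-extend the 9-bit value
        offset -= 0x200
    return ("c.beqz", rs1, offset)


def next_pc(pc: int, regs: list, rs1: int, offset: int) -> int:
    """Semantics: jump by offset if the register is zero, else fall through."""
    return pc + offset if regs[rs1] == 0 else pc + 2


if __name__ == "__main__":
    regs = [0] * 32
    name, rs1, offset = decode_c_beqz(0xC119)  # c.beqz a0, +6
    print(name, rs1, offset, hex(next_pc(0x100, regs, rs1, offset)))
```

The semantic part is a single comparison with zero followed by a jump, just as described in the talk; most of the work is in reassembling the offset bits.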
26:34 - In the end, when we added the custom extensions on top of the RISC-V base ISA in the plugin we had, we ended up with this, which is very cool.
26:45 - This is a screenshot from Ghidra that shows that it correctly decompiles the program, and you get the decompiler for free, on the right side.
26:57 - On the left side, you have the assembly. On the right side we get the decompiler for free, and as you can see, this is a binary that was generated from SPARK.
27:07 - There are symbols of the language to be found there, like Ada, RV from RISC-V, et cetera.
27:10 - However, the point is that we got for free not only the disassembly but also the analyzer.
27:17 - The custom extensions were also automatically reflected in the decompiler, which of course made our life much easier.
27:27 - So, let’s link everything together. We already know what to look for, because we analyzed SPARK and we know the limits of the language and what we can hunt for from the offensive perspective: obviously not memory safety issues.
27:39 - We also know that we were running on RISC-V, and it was not a normal RISC-V but a custom RISC-V chip.
27:45 - So we focused on the design and on how the hardware is modeled.
27:50 - We focused on the design of the SPARK software, how it is designed and how the custom hardware is implemented and modeled, because we know it is there and it’s not standard.
28:02 - What we saw while analyzing the binary, and that’s why we needed this Ghidra plugin, was that the very, very first instructions of this boot software were configuring the hardware, including the custom hardware.
28:20 - As soon as the first instruction runs, it starts to configure the hardware.
28:24 - And later we see the instructions setting up the MTVEC value.
28:30 - And then we started to ask: what is the MTVEC value? Officially, the RISC-V specification says that MTVEC is a register, either read-only or read/write, which holds the base address of the trap handler.
28:41 - By default RISC-V handles all traps in the most privileged mode, which is M-mode, and it can delegate a trap to the less privileged modes if needed.
28:53 - When a trap happens, RISC-V switches to machine mode and sets the program counter to the value defined in the MTVEC register.
29:04 - MTVEC can be defined in two modes, selected by the two least significant bits. There is a mode which keeps a direct pointer, and the core just jumps there.
29:14 - So bits zero and one: if they are zero, then we have direct mode.
29:17 - You just jump to the pointer. If they are non-zero, then the value is treated as a vectored table, like the IDT on x86, for example.
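The register layout described here can be sketched in a few lines of Python. This is only an illustration based on the RISC-V privileged specification, where the two low bits of MTVEC form the MODE field and the remaining bits form BASE; the names `decode_mtvec` and `trap_target` are made up for this example and are not part of any real tool.

```python
MODE_DIRECT = 0    # every trap jumps to BASE
MODE_VECTORED = 1  # interrupts jump to BASE + 4 * cause, exceptions still jump to BASE


def decode_mtvec(mtvec):
    """Split a raw MTVEC value into its (base, mode) fields."""
    mode = mtvec & 0b11
    base = mtvec & ~0b11
    return base, mode


def trap_target(mtvec, cause, is_interrupt):
    """Return the address the core jumps to when a trap is taken."""
    base, mode = decode_mtvec(mtvec)
    if mode == MODE_DIRECT:
        return base
    if mode == MODE_VECTORED:
        return base + 4 * cause if is_interrupt else base
    # Mode values 2 and 3 are reserved in the specification.
    raise ValueError("reserved MTVEC mode")
```

For example, `trap_target(0x80000101, 7, True)` selects the vectored entry for interrupt cause 7, while the same value with `is_interrupt=False` goes to the base address.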
29:27 - So then we started to think: MTVEC is configured only after we configure the hardware.
29:33 - So what will happen if any interrupt arises before MTVEC is initialized? Because it can happen during initialization of the hardware.
29:41 - And this is exactly the problem we discovered: the RISC-V specification does not define the initial value of the MTVEC register at all, so it’s undefined.
29:53 - However, when we started to analyze a few different implementations of RISC-V, we found that most of the tested implementations set it to zero anyway, but in many implementations zero is not a valid address, or if it’s valid, it’s not mapped.
30:08 - Then dereferencing the MTVEC pointer, which is null, will generate an exception.
30:13 - This is kind of interesting, because if any exception is generated before the MTVEC register is initialized, RISC-V ends up in a very stable infinite exception loop: it dereferences a null pointer, which is not valid and generates another exception, which generates another exception, and so on.
30:30 - What was interesting is that during this experiment RISC-V did not halt; it continued spinning in the infinite exception loop. Such a state is in fact an ideal situation for a fault injection attack, like a glitching attack, because RISC-V is running in the highest privilege mode, which is M-mode, and constantly dereferencing the glitchable register.
30:54 - So this is the first bug we found: the ISA does not define the initial value of the MTVEC register. The second bug is that the ISA allows an infinite exception loop without halting the core.
31:09 - There is no double or triple fault exception, which is kind of interesting.
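The loop described here can be reproduced with a toy model. The sketch below is not a RISC-V simulator; it only assumes that a fetch from an unmapped handler address raises a new trap, which is again dispatched through the same MTVEC value. The function name `run_core` and the memory range used in the example are hypothetical.

```python
def run_core(mtvec, mapped, max_steps=1000):
    """Toy model of trap dispatch after an early exception.

    Returns ("handler", traps) if the handler address is mapped, or
    ("looping", traps) if the step budget runs out while the core keeps
    re-entering the trap path.
    """
    pc = mtvec & ~0b11  # the first trap has been raised: jump to the handler base
    traps = 1
    for _ in range(max_steps):
        if pc in mapped:
            return "handler", traps
        # The fetch at pc faults, which raises another trap through the same MTVEC.
        pc = mtvec & ~0b11
        traps += 1
    return "looping", traps
```

With `mtvec = 0` and a mapped region such as `range(0x80000000, 0x80010000)`, the model never reaches a handler and never halts; it just accumulates traps until the step budget is exhausted.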
31:15 - So how do you exploit the issues we found? This is where I would like to hand over to Alex Matrosov, who physically glitched the RISC-V, and he can take over from there. Thanks.
31:28 - - Thanks Adam. Hello DEF CON, my name is Alex Matrosov, and I will be walking you through our exploitation technique for MTVEC. There are two important points, as Adam mentioned on the previous slide: MTVEC has undefined initial behavior, and the RISC-V core does not halt; it keeps looping after such an exception happens with the undefined MTVEC register.
32:00 - But let’s talk about the attack scenario. There are important prerequisites which you need to know before we get to the actual scenario.
32:08 - First, we need to prefill the D/IMEM of the RISC-V core. We can use something like an external recovery USB boot flow or access (indistinct).
32:20 - Second, we need the ability to generate an early exception during core execution.
32:25 - Basically, physical hardware damage can cause that.
32:30 - Let’s go to the scenario. Let’s say we prefilled IMEM with our shellcode. There are some interesting techniques we used to make the attack more stable, because after a successful MTVEC corruption, when we jump into IMEM, it can be to some random address, right? And we need to make sure our code with the actual payload will be executed.
33:00 - So we pre-fill IMEM with NOPs, and the NOP sled gives us some sort of insurance: we land in a random place, but it still leads to our shellcode execution. After the NOP sled we put the actual shellcode with the payload, and that is basically the realistic attack scenario in our case.
33:26 - So the attacker boots the RISC-V core and then enforces the necessary condition to generate an early exception during software boot, before MTVEC actually gets initialized, right? The RISC-V core jumps to the null page and enters the infinite exception loop, which is a very stable and predictable state, by the way.
33:53 - And it’s actually one of the conditions of our success in this attack scenario.
33:59 - The attacker glitches the MTVEC CSR, and because it is just a binary value, even a one-bit change will change the state of this register, right? Then the looping core will point somewhere into IMEM, and the special payload we discussed before, our shellcode, will be executed.
34:32 - So it’s a very interesting and stable attack scenario.
34:36 - We tested it in different ways, and I will explain how we actually created the fault injection attack on the next slide.
34:45 - But before we move there, one thing you need to remember: because the MTVEC register points to the null page, it’s very likely that changing one bit will generate an address pointing into the middle of the NOPs in IMEM.
35:01 - And that’s exactly why we need the fault injection attack.
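Why a single flipped bit is enough can be checked with simple arithmetic: every one-bit corruption of a zero MTVEC is a power of two, and some of those fall inside a large pre-filled memory window. The sketch below assumes a 32-bit register and a completely hypothetical IMEM window (`IMEM_BASE`, `IMEM_SIZE`) and payload size; none of these numbers come from the talk.

```python
IMEM_BASE = 0x0008_0000  # hypothetical start of the pre-filled IMEM window
IMEM_SIZE = 0x0010_0000  # hypothetical window size (1 MiB)
PAYLOAD_WORDS = 16       # hypothetical payload length in 32-bit words


def single_bit_flips(value, width=32):
    """All values reachable from `value` by flipping exactly one bit."""
    return [value ^ (1 << bit) for bit in range(width)]


def sled_hits(mtvec=0):
    """Glitched handler bases that land on the NOP sled in front of the payload."""
    sled_end = IMEM_BASE + IMEM_SIZE - 4 * PAYLOAD_WORDS
    hits = []
    for glitched in single_bit_flips(mtvec):
        base = glitched & ~0b11  # drop the two mode bits
        if IMEM_BASE <= base < sled_end:
            hits.append(base)
    return hits
```

Under these assumptions, two of the 32 possible single-bit flips redirect the looping core into the sled, and the sled then slides execution into the payload placed after it.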
35:05 - We experimented with different types of fault injection attacks: clock glitching, voltage glitching, and we actually tried the ChipSHOUTER as well. With a NOP sled it’s actually possible, and the most stable results in our case came from our experiments with U4 boards and ChipWhisperer.
35:27 - It is a U4 board with a RISC-V silicon CPU on it.
35:31 - So we were actually able to reach a very stable point with the clock glitching attack.
35:39 - On this oscilloscope diagram you can see exactly where the glitch happens with the clock technique.
35:48 - Also, because we measured these values with clock glitching, even with this small guy, the ChipWhisperer Nano, once we have all the parameters and the timing offsets we can create a successful attack, and actually with much cheaper hardware.
36:11 - So of course, we tried this complex scenario in many different environments.
36:18 - First of all, thanks to the Nvidia hardware team, which helped us a lot to play with a simulation environment for internal hardware. Actually, we first realized this attack was possible when we were playing with some internal tools.
36:38 - On this slide, you can see the simulation environment where we have the full event scenario.
36:46 - In step one, we pull the trigger to corrupt the value of the MTVEC CSR while the core is looping, and on the wave diagram it’s very visible when that happens. Step two is the actual value change, and then the core triggers the exception handler with the corrupted MTVEC register value.
37:13 - And that leads exactly to pointing into IMEM, executing our NOP sled, and then the shellcode execution.
37:28 - But let’s also talk about how we reported this bug, and how it got fixed with an industry effort from the RISC-V Foundation community.
37:41 - Actually, it was a very tough scenario. Think about it: you find a bug not in an actual implementation, not in a single board; you find a bug in the ISA, so most of the boards currently available on the market are affected by this issue. How do you fix it, right? First of all, of course, when we realized this was a big issue, we contacted the RISC-V Foundation. At that time there was no official security response group, so we worked with the RISC-V Foundation through our PSIRT.
38:25 - Now an official security response group exists, which is good, right? I think it can address security bugs and issues with the ISA, and not only with the ISA, much more efficiently, because it is tied to the RISC-V Foundation as an industry body.
38:41 - We also contacted SiFive and worked with them on analyzing this issue, and we were allocated a CVE number, CVE-2021-1104.
38:54 - Nvidia PSIRT and the Nvidia RISC-V hardware team confirmed this issue, fixed it internally, and synced with all involved parties for responsible disclosure.
39:06 - It was kind of tough, and the timeline was also very limited because we wanted to deliver this talk at the DEF CON conference, but all the parties needed to be secure first of all, right.
39:18 - So how can we actually propose a fix for these problems? First of all, what we need to do is initialize MTVEC.
39:31 - All tested chips have a programmable MTVEC and, in the most common mode, are vulnerable to the described problem.
39:40 - We also realized that some of the newer chips released recently, this spring, are also affected by this problem, because of course nobody had thought before about MTVEC and undefined behavior.
39:56 - But of course, without a core looping on the exception handler, this attack would not be exploitable.
40:04 - If the MTVEC value is not defined, the undefined behavior still exists, but if there were, let’s say, some double or triple fault-like exception, or just a halt, this would prevent the code execution which we described on the previous slide.
40:28 - So how can we fix this issue, and what kind of mitigations can there be?
40:34 - First of all, it can be a combination of hardware mitigations and software mitigations.
40:42 - Let’s talk about DCLS and TCLS. These are actually very interesting techniques, where a shadow core, DCLS with a double core or TCLS with a triple core, performs additional execution, following the same instruction flow, right? The shadow core just has the same flow, the same instructions, in sync with the original core, and if the instruction flow is not equal, the core will panic or halt or whatever is defined in the silicon, right? The same thing happens with TCLS.
41:29 - TCLS just introduces two shadow cores, and it will be more difficult to glitch.
41:37 - In the case of two cores, a glitch is still potentially possible, but of course it raises the bar.
41:46 - TCLS makes the attack scenario much harder.
41:53 - I would say, realistically, it first of all needs to involve multiple glitch attacks simultaneously.
42:01 - You need to glitch multiple cores simultaneously, and, let’s say, you need to corrupt a value in each of them; if these values are different, it can cause additional exceptions which you can’t predict, right? So I would say it raises the bar; of course the attack is still possible, but much more complicated.
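The lockstep idea can be illustrated with a small toy model: two cores execute the same step and a comparator rejects any difference. This is only a sketch under simplifying assumptions; a real lockstep design compares signals in hardware, and the names `execute`, `dcls_step`, and `LockstepMismatch` are invented for this example.

```python
class LockstepMismatch(Exception):
    """Raised by the comparator when the two cores disagree."""


def execute(value, glitch_mask=0):
    """One toy 'instruction' (an increment) on one core.

    glitch_mask models the bits flipped by an injected fault on that core.
    """
    return ((value + 1) ^ glitch_mask) & 0xFFFFFFFF


def dcls_step(value, glitch_main=0, glitch_shadow=0):
    """Run the same step on the main and shadow core and compare the results."""
    main = execute(value, glitch_main)
    shadow = execute(value, glitch_shadow)
    if main != shadow:
        raise LockstepMismatch(f"main={main:#x} shadow={shadow:#x}")
    return main
```

A fault on only one core, such as `dcls_step(41, glitch_main=0x10)`, is caught by the comparator. The check passes with a corrupted result only if both cores are hit with exactly the same corruption, which is the simultaneous multi-core glitch the speaker describes as much harder.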
42:23 - But let’s talk about the software mitigations.
42:26 - Jeremy Boone from NCC Group recently wrote a blog series about software mitigations, and thank you for that.
42:34 - It was a great series explaining software mitigations from the industry-standard point of view.
42:42 - But of course, all these mitigations have been known for a while, and they are also broadly used.
42:48 - Software mitigations don’t prevent such attacks, but they raise the bar; in many cases they make the attack scenario much more difficult, or for an inexperienced attacker even impossible.
43:06 - But all of these, Hamming distance, clearing memory, random delays and redundant checks, can actually be introduced at the compiler level and generated automatically. First of all, a developer can forget to insert these mitigations, or copy-paste something incorrectly, or forget to change something when copying this sort of mitigation to different places. Also, some of them can simply be optimized out by the compiler during compilation, because redundant checks, for example, without special definitions or compiler extensions, will be optimized away during the compilation phases, and you can clearly see this in this assembly flow.
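Two of the mitigations named here, Hamming distance and redundant checks, can be illustrated briefly. The sketch below uses Python only to show the logic; the constants `SECURE_TRUE` and `SECURE_FALSE` and the function `is_authorized` are hypothetical, and in compiled firmware the second check would need compiler support or volatile accesses to survive optimization, as the speaker notes.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Hypothetical status constants chosen so that many bits must flip
# to turn one into the other (a plain 0/1 flag is one flip apart).
SECURE_TRUE = 0x3CA5
SECURE_FALSE = 0xC35A


def is_authorized(flag):
    """Check the flag twice, in two different ways."""
    if flag != SECURE_TRUE:
        return False
    # Redundant check: a single skipped or corrupted comparison is not enough.
    if flag ^ SECURE_TRUE != 0:
        return False
    return True
```

With these values, turning `SECURE_FALSE` into `SECURE_TRUE` requires flipping 16 bits, compared with a single bit for a boolean encoded as 0 and 1.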
43:56 - We also proposed some design decisions to address the MTVEC weaknesses.
44:04 - Basically, as we said before, at CPU start, when the reset signal arrives, pre-initialize MTVEC to point to a halt instruction.
44:17 - Basically, if something happens, it will initiate the halt. A change to the ISA is also needed, because we need to warn about the potential undefined behavior if MTVEC is not initialized, right? We need to basically introduce this kind of comment into the ISA documentation.
44:39 - Also, introducing a double or triple fault-like exception would be complementary to halting the core, and instead of an infinite loop, it would make this attack vector unexploitable.
44:55 - It would cause a denial of service, but not the code execution we described previously.
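The effect of the proposed double-fault behavior can be compared with the current behavior in a toy model. The sketch assumes, purely for illustration, that a fault raised while fetching the trap handler counts as a double fault; `take_trap` and its parameters are hypothetical names and do not describe any real core.

```python
def take_trap(mtvec, mapped, halt_on_double_fault, limit=100):
    """Toy trap dispatch with an optional double-fault halt.

    Returns ("handler", n), ("halted", n) or ("looping", n), where n is the
    number of trap entries attempted.
    """
    base = mtvec & ~0b11
    traps = 0
    while traps < limit:
        traps += 1
        if base in mapped:
            return "handler", traps
        if halt_on_double_fault:
            # Fetching the handler itself faulted: stop instead of retrying.
            return "halted", traps
    return "looping", traps
```

With the halt enabled, an uninitialized `mtvec = 0` ends in a halted core after the first failed handler fetch, so there is no long-lived loop left to glitch; the outcome is a denial of service rather than code execution.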
45:02 - Of course, much more can be said in terms of RISC-V hardening, and I think the mitigations against software attacks can be explained in different ways, but there is also brilliant research being done on introducing pointer masking.
45:24 - And Adam will take over from here.
45:30 - - Thanks Alex. Let’s briefly talk about the pointer masking extension for RISC-V, which can significantly improve the security posture of the entire RISC-V ecosystem, especially in software, but also in hardware.
45:43 - It can also reduce the impact, or at least raise the bar of exploitability, of this MTVEC issue which we’re speaking about here today.
45:52 - This extension is driven as a collaboration between Nvidia, Google, the RISC-V TEE Task Group and the J Extension Task Group.
46:00 - So this is a huge collaboration. From the security perspective it allows you to implement various technologies, including Hardware ASAN, Pointer Authentication Code, known as PAC, and Hardware Memory Sandboxing, and it can also serve as a foundation for other extensions, including Hardware Memory Tagging, which cannot exist in RISC-V without pointer masking.
46:21 - It can also significantly improve the security of other extensions.
46:29 - It can protect the Control Flow Integrity attributes and the Shadow Stack, which is an extension driven by the TEE Task Group, and that work is in progress.
46:39 - These technologies are mostly well known, except for one, which is Hardware Memory Sandboxing.
46:45 - This is kind of innovative, and we haven’t seen it in other architectures.
46:49 - So what is Hardware Memory Sandboxing? Essentially it allows you to lock down a specific execution context to a sub-region of the memory.
46:58 - So you can even make an extra boundary between threads, even if they run in the same process context.
47:05 - The executing thread might not even be physically able to see or reference any memory outside of the pre-configured region.
47:14 - And this is enforced at the hardware level.
47:18 - But it is more than that, because you can lock down which specific ranges of memory are visible to each specific execution context.
47:30 - So even if you have a vulnerability in software which runs in a specific execution context, that vulnerability will not allow you to jump outside the predefined memory region.
47:42 - Say some kind of secret is living in other pages, which this specific software normally doesn’t need access to. A vulnerability, in the classic case, would allow you to steal these secrets or overwrite them.
47:57 - But because pointer masking can lock down what you physically see, which physical pages (indistinct) pages, you can ensure that such a vulnerability will not allow an attacker using the current execution context to reference or corrupt any secrets outside of the predefined memory regions.
48:18 - This is exactly what Hardware Memory Sandboxing is doing, and it’s provided by the pointer masking isolation feature.
48:26 - It’s slightly more than that, because you can have multiple different execution contexts which are locked down, and they will never be able to reference memory which does not belong to them or is not configured to be visible to them.
48:42 - This also allows you to implement not only sandboxing, but also features for low-cost devices which do not need to do full context switching, because the execution contexts cannot corrupt (indistinct) data between themselves, since they are locked down.
49:02 - You might not need to do a full context switch, which can save you performance.
49:06 - So this is essentially what it does. From the MTVEC perspective, here is how pointer masking can help: you can predefine that the specific boot software is only able to reference a very small, specific sub-region of the memory.
49:22 - Then if you want to glitch the MTVEC register, corruption of a random bit will most likely not allow you to execute your custom code or shellcode, because you will not even see it.
49:34 - The specific execution context will be locked down, and it would be very hard to predict precisely which bit you need to glitch to end up in the predefined memory sandbox.
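The argument can be made concrete with a toy count of how many single-bit glitches still reach attacker-controlled memory once the boot context is confined. The sketch does not model the pointer masking extension or its register encoding; it only assumes that any handler address outside a configured window is unreachable. The regions `PAYLOAD` and `SANDBOX` and the function `usable_targets` are hypothetical.

```python
PAYLOAD = range(0x0008_0000, 0x0018_0000)  # hypothetical attacker-filled IMEM window
SANDBOX = range(0x8000_0000, 0x8000_1000)  # hypothetical 4 KiB window for boot software


def usable_targets(mtvec, payload_region, sandbox=None):
    """Single-bit glitches of mtvec that land on the payload and stay reachable."""
    targets = []
    for bit in range(32):
        base = (mtvec ^ (1 << bit)) & ~0b11
        if base not in payload_region:
            continue
        if sandbox is not None and base not in sandbox:
            continue  # outside the context's window: not reachable in this model
        targets.append(base)
    return targets
```

Without the sandbox, two single-bit flips of a zero MTVEC reach the payload in this model; with the sandbox in place, none do, because the payload region lies outside the window the boot context can see.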
49:45 - So that’s what it is, and I would like to hand over to Alex.
49:49 - - Thanks, Adam, and I will finish our presentation.
49:53 - First of all, I want to make some acknowledgements to the Nvidia team.
50:00 - There were a lot of people involved from GPU system software and our RISC-V hardware team.
50:07 - I want to thank you all. Also product security, and especially our PSIRT team, who worked hard on this issue, because in the beginning it was not obvious where this issue needed to be reported.
50:21 - And with which parties we needed to discuss this issue.
50:25 - We also need to thank SiFive and the RISC-V Foundation, because the RISC-V Foundation took our concerns very seriously and addressed this issue to the whole ecosystem.
50:38 - And a security team also got created. SiFive was also very present during the disclosure process and helped us understand this issue and its side effects more deeply.
50:54 - Now I want to summarize our research.
50:58 - In the first part, we covered type-safe languages and formal verification with AdaCore and SPARK.
51:07 - Of course they minimize some attack surfaces, such as memory corruption issues, but they are not a silver bullet; some issues can still exist, and it’s especially hard to mitigate undefined behavior or race condition attacks.
51:23 - And of course, in the end, the code is compiled to native machine code by the compiler, and formal verification happens at an early stage, before the transition to the intermediate representation happens and all the compiler optimizations are applied.
51:42 - There is still some room for undefined behavior, and a lot of other interesting things can happen.
51:50 - CPU ISA bugs exist, right? We found two interesting side effects in the RISC-V ISA, and in real-world attack scenarios these can be combined with physical attacks such as fault injection; in combination with software exploitation techniques, this can lead to very realistic and impactful attack scenarios.
52:18 - And of course disclosing ISA bugs is tough.
52:21 - We need to pay more attention to open ISAs and to how these bugs can be addressed.
52:31 - And to what kind of reporting path exists for the researcher, right? Thank you very much for your attention, and we’ll be happy to answer your questions.