DEF CON 29 - Tom Van Goethem, Mathy Vanhoef - Timeless Timing Attacks
Aug 5, 2021 17:35 · 5422 words · 26 minute read
- Hello. I am Mathy Vanhoef and together with Tom Van Goethem, we will be presenting Timeless Timing Attacks.
00:09 - Tom Van Goethem is a researcher at the DistriNet group in KU Leuven in Belgium.
00:14 - He is a fanatic web and network security enthusiast, and he likes to exploit side channel leaks in browsers, and more general in Web platforms, as well.
00:25 - I am Mathy Vanhoef. I’m a postdoc at NYU, Abu Dhabi, and I’ll, later this year, be starting as a professor at KU Leuven in Belgium.
00:34 - I’m interested in wireless on network security and software security, and also a bit in applied crypto.
00:41 - Previously, I discovered a KRACK attack against WPA2, and the RC4 NOMORE attack.
00:49 - And in this presentation we’ll be talking about timing attacks and timing leaks.
00:54 - Let me start with some examples of timing leaks.
00:59 - In the top left corner we can see a straightforward timing leak where depending on some secret condition, extra computation is performed.
01:10 - And here, the execution time, so also the response time of, for instance, the web application will leak whether the condition was true or false.
01:20 - The top right, we have a for-loop that iterates through all the elements in an array and the for-loop is terminated once an element is found with a certain secret condition.
01:34 - In other words, the number of iterations that are being executed, depend on some kind of secret information, and this, again, also means that the execution time of this for-loop leaks sensitive information to an adversary.
01:52 - Finally, at the bottom, we have an if test that checks whether an array is empty or not and if it is not empty, extra computation is performed.
02:03 - And here, you can imagine a web app where you can search through secret documents using a key word And then this code would leak whether this keyword is present in a secret document or not.
02:19 - How are these timing leaks exploited in practice? Well, the attacker would first need to connect to the target server.
02:28 - The attacker then sends a possibly large amount of requests to the server.
02:33 - And for each request, the server will measure how long it takes to receive a response.
02:41 - And then the attacker will compare two sets of timing measurements, namely the baseline on the target.
02:50 - And what do I mean with baseline here? Well, with baseline, I, for instance, mean the first example here, where then the secret condition evaluates to false, and the adversary knows that the secret condition evaluates to false.
03:07 - And the adversary will use that as a baseline and the target request is then a request where the adversary doesn’t know what the secret condition evaluates to, but based on the response time, the adversary can then derive whether the secret condition and the target request is also false, or whether it is in fact true.
03:33 - And to check whether there is a difference between the baseline of the target request we will be using statistical tests to determine whether there really is a significant difference or not.
03:49 - There are several factors that influence the success rate of a timing attack.
03:54 - The first one is the network connection between the attacker and the victim server.
04:00 - And in particular, the higher the jitter of this connection, the worse the performance of the timing attack.
04:08 - Now, the attacker can mitigate this to some extent by, for instance, moving closer to the target.
04:14 - So in practice that could be renting a server in the same cloud provider where the victim is also located.
04:25 - And I also want to remark here that the jitter is present both on the upstream path, so in the requests, and in the downstream path, so in the responses.
04:36 - Another important factor is the size of the timing leak.
04:40 - So it’s much easier to exploit a large timing leak, say, of 50 milliseconds than it is to exploit a small timing leak of, say, five microseconds.
04:52 - And finally, the last important factor is the number of measurements that on adversary can make.
04:58 - In particular, the more timing measurements that an adversary can make, the better the performance of the timing attack will be.
05:06 - And Tom will now show a graphical illustration of a timing attack.
05:11 - - Thank you for that introduction, Mathy. So now let’s see how such a remote timing attack works with a visual representation.
05:21 - So we have our attacker on the left side, sending the request to the server.
05:27 - And this request goes over the Internet, then is processed by the server, a responses is generated, and sent back to the attacker and the attacker will then measure how long it took between sending the request and receiving the response.
05:43 - And in this case, this was a bit over three seconds.
05:48 - Then the attacker stores this timing value, and then he will probably need to make multiple requests.
05:57 - So he starts sending the next request. But here on the third hop, there was a small delay, also causing a delay in the arrival of the response.
06:10 - So this means that because of this jitter the attacker will need to make multiple measurements and then apply some statistical analysis in order to reveal the true processing time of the server.
06:31 - We did some measurements to see how many requests the attacker would actually require.
06:38 - We varied the number, or the difference in timing of the processing time between five microseconds and 50 microseconds.
06:51 - And then we determined how many requests an attacker would need to achieve a 95% accuracy.
06:58 - Our attacker was located in our university network located in Belgium, and then we set up servers across the world.
07:10 - And then we performed the attack to see how many requests were required.
07:17 - To detect a timing difference of 50 microseconds with a server in the EU, with the attacker also located in the EU, a total of 333 requests were required.
07:32 - Now when the server was further away, so for instance, when it was located in the U. S.
07:39 - more requests were required. In this case, for 50 microseconds, well over 4,000 requests were required to detect this difference.
07:54 - You can also see that as the timing difference became smaller.
07:59 - also the attacker required more measurements.
08:05 - For 20 microseconds, it was close to 3,000 requests that where needed for the server in EU, growing to 23,000 for 10 microseconds.
08:18 - And for a timing difference of five microseconds, we found that it was not possible to do this with the maximum set to 100,000 requests.
08:35 - We were thinking how can we improve these remote timing attacks? And that’s how we came to timeless timing attacks.
08:46 - We know that if we rely on absolute response timing, this will not be very reliable because there will be always this jitter that’s included in every request because the request needs to traverse over the Internet, and so does the response.
09:11 - Then we thought, okay, then maybe let’s get rid of the notion of time.
09:16 - That’s why we call these attack timeless. So instead of relying on the sequential timing measurements, we can try to explore to concurrency and only consider the order of the responses.
09:32 - We don’t use any absolute timing measurements anymore in these timing and measurements.
09:38 - We just look at the response order. And this means, as a very important side effect is that these timeless timing attacks are completely unaffected by network jitter.
09:52 - So let’s see how this works in practice. Again, we have our attacker on the server.
09:59 - But now the attacker is not just sending one request, but he will be sending two requests that are contained in the same packet.
10:08 - These two requests arrive simultaneously at the server and are processed concurrently, and because the light pink requests took less time to process the light blue response will be generated first and sent first to the attacker, allowing the attacker to detect that these requests required less processing time.
10:39 - If in this case, we have some jitter on the way to the server it will not affect anything because both requests will still arrive exactly at the same time at the server, because they’re contained in a single packet.
11:00 - And then they can be processed simultaneously at the server site.
11:07 - And again, we see the same result. - Thank you, Tom.
11:12 - So what are the requirements of a timeless timing attack? Well, first of all, the requests need to arrive at the same time at the server.
11:22 - The server needs to process these requests in parallel, so, concurrently.
11:27 - And the order of the responses needs to reflect the difference in the execution time on the server.
11:35 - Now let’s explore these three requirements in more detail.
11:41 - The first requirement that both requests arrive simultaneously at the server can be fulfilled in two manners.
11:50 - Namely, we can either rely on multiplexing or on encapsulation.
11:57 - And with multiplexing, our example is the HTTP/2 protocol, and this is because with the HTTP/2 protocol, two requests can be sent simultaneously over the same connection.
12:12 - In fact, a single TCP packet can carry multiple HTTP/2 requests that will be processed concurrently once they arrive at the server.
12:24 - For encapsulation, there our example is the HTTP/1 protocol because the HTTP/1 protocol on its own does not allow you to send two requests in parallel.
12:36 - However, when we exploit HTTP/1 over Tor, or over VPN, then we can send two HTTP requests at the same time over Tor or over the VPN network.
12:50 - Let me illustrate these two cases graphically.
12:53 - For multiplexing, we have the HTTP/2 example where, here on top, you can see the two HTTP/2 requests in gray, and the adversary will assure that these two HTTP/2 requests are sent in a single TCP packet.
13:11 - This assures that these two requests will arrive simultaneously at the server.
13:18 - For the encapsulation case, we have the HTTP/1 example where, here again at the bottom, we have two HTTP/1 requests shown in gray.
13:28 - These two HTTP/1 requests are sent in different TCP connections, meaning they’re also encapsulated in two different TCP packets shown in red here.
13:39 - And each TCP packet is then encapsulated in a different Tor cell.
13:45 - Now what the adversary now does is the adversary assures that these two Tor cells will be aggregated into a single TCP packet and once this TCP packet arrives at the Tor union servers that we are targeting, then the union servers will process these two requests effectively, in parallel.
14:11 - For the second requirement, the second requirement is that these two requests then have to be handled in parallel by the server and whether this is the case is application dependent.
14:24 - We have found that in most cases, applications can handle requests in parallel.
14:30 - One possible exception is when you target cryptographic operations, because those often have to be handled in a certain sequential order.
14:41 - The third requirement then, is that the order of the responses will reflect which requests finished processing first.
14:52 - And, in practice, that means that the server must immediately send a response after it finished processing a request.
15:03 - Additionally, the adversary must be able to reliably recover the response order.
15:10 - In other words, when the responses are sent back from the victim server, to the adversary, the network should not be reordering these responses.
15:20 - Now, in our experience, in our experiments, we noticed that these two responses generally follow the same network paths, meaning the order is maintained.
15:31 - But even if for some reason, the order of the responses is switched along the network path, then the adversary can still rely on TCP shields, such as the TCP sequence numbers, or the TCP timestamps to recover the order in which the server sent both responses.
15:53 - What is now the performance of a timeless timing attack? Well, let’s first, and for all, recall the performance of a traditional timing attack, where Tom explained the first part of this table here.
16:06 - On this slide we also added the performance of a traditional timing attack over the LAN and the local host.
16:13 - For instance, we can see that if we want to exploit a timing leak of five microseconds over the local host, this would require a bit more than 40 requests in a traditional timing attack.
16:28 - In contrast, if you look at our new timeless timing attacks, we can, for example, exploit a timing leak of five microseconds using roughly 50 requests pairs.
16:44 - And important here, is that the timing leak can then be exploited no matter where the adversary is located on the internet.
16:54 - And this is in contrast to traditional timing attacks because with traditional timing attacks, we were not able to exploit a timing leak of five microseconds over the internet, while for our timeless timing attacks we can.
17:09 - On top of that, I want to highlight that with the timeless timing attacks, we were able to exploit a timing leak of just 100 nanoseconds while with the traditional timing attacks, the best we could exploit was a timing leak of 150 nanoseconds when the adversary was in the same LAN or local host.
17:31 - And, in fact, over the internet a traditional timing attack, at best could exploit a timing leak of 50 microseconds.
17:40 - And this really shows that timeless timing attacks are an order of magnitude more powerful than traditional timing attacks.
17:50 - That covers how direct timing attacks work.
17:54 - It’s also possible to perform timeless timing attacks in a cross-site setting and to perform timeless timing attacks against Wi-Fi authentication.
18:04 - And Tom will now elaborate on the cross-site timing attack scenario.
18:10 - - So these cross-site timing attacks, they’re a bit different from the direct timing attacks that we were talking about before, because in this case it will not be the attacker who’s directly connecting to the targeted server, but it will be the victim sending requests, but still these requests are triggered by the attacker by launching some malicious JavaScript code.
18:36 - This could happen when the victim lands on a malicious website, for instance, by clicking on a link, or if there’s a malicious advertisement being shown, or if this victim has a really urgent need to look at some cute animal videos, and then ends up on the website of the attacker.
18:59 - So these requests that are triggered by the attacker with the JavaScripts, they will include the cookies.
19:07 - This means that the server will process the requests using the victim’s authentication.
19:15 - Once that request has been processed, the attacker can still observe the order of the response.
19:23 - For instance, when he’s using the Fetch API, he can look at which promised results first, and this will leak, or this might leak sensitive information that the victim has shared with the website.
19:41 - We actually applied this timing attack, this cross-site timing attack to the search functionality in HackerOne, where we were able to leak information about private and disclosed reports.
20:00 - One of the key differences is that with direct timing attacks, is that here, the attacker does not have direct control over the network.
20:11 - Just because the browser can choose how to send requests to the kernel and the kernel will then send it to the server.
20:20 - So we will need another technique to make sure that the two requests are joined into a single packet and in order to do so, we can leverage TCP congestion control.
20:34 - This mechanism will prevent the client from sending very large number of packets at once and it sets that attacker won’t have, sorry, the client will need to wait for an acknowledgement from the server before more packets can be sent.
20:58 - As soon as the client is waiting for acknowledgement from the server, any other requests that are made, all their data will be stored in the queue and they will be merged together into a single packet.
21:21 - So the attack looks fairly simple. First we make post requests with a sufficiently long body, and then immediately after we make the two target requests.
21:37 - So the one will be for the baseline and then the other one is the target URL for which we want to see if it matches the same timing as the baseline or whether it takes longer or shorter.
21:57 - Okay. Let’s see how this works in practice then.
22:01 - Here we show the victim’s packet queue. The TCP packets sent that are being queued to be sent to the server.
22:13 - First the attacker makes this post request which already sends off a number of TCP packets, but there will be also some packets that cannot be sent yet, because we need to wait for the acknowledgement from the server and they’ll be added to the queue.
22:33 - Here every square represents a single TCP packet.
22:42 - Then the second line of codes is where the attacker will make a request for the baseline URL.
22:51 - And as you can see here, it’s represented with this pink, light pink.
22:59 - And then, second, the third line of code will be the attacker sending the request for the target URL, represented as the darker pink.
23:12 - As you can see, both requests will be added to the same TCP packet.
23:20 - And then later on, when the client has received enough, or sufficient acknowledgements from the server, it’s possible to send all the remaining packets that are in the queue.
23:36 - And as you can see, the light and dark pink requests will be both in a single TCP packet, which is exactly what we need to perform our timeless timing attacks.
23:54 - In this presentation, we already discussed the direct timing attacks, where the attacker connects directly to the server.
24:02 - Then what I was just talking about are the cross-site timing attacks.
24:06 - And then now, Mathy will be talking about a third attack scenario where we use a timeless timing attacks to break a Wi-Fi authentication.
24:20 - - So it is also possible to perform timeless timing attacks against Wi-Fi, and in particular, we’re going to exploit the EAP-pwd protocol that can be used in WPA2 or even WPA3 enterprise networks.
24:36 - Most enterprise networks, they use certificates to authenticate the server and sometimes also to authenticate the user.
24:44 - However, this can be quite annoying to configure.
24:48 - So in practice, a small but significant amount of networks rely on the EAP-pwd protocol and stats to authenticate a user using a username and password.
25:00 - And when using these enterprise authentication protocols, the authentication happens between the clients and the authentication server.
25:07 - There the authentication server is commonly a free radius server, and the access point will simply forward messages between the client and the authentication server.
25:19 - And because of the authentication server can be located anywhere on the internet, the communication between the access point and the authentication server is typically protected using a TLS tunnel.
25:32 - on the TLS tunnel is called a RadSec connection.
25:36 - Now the EAP-pwd protocol that we will be targeting uses the so-called hash-to-curve algorithm to verify a password.
25:46 - Unfortunately, a timing leak was discovered in this algorithm.
25:51 - This was called a Dragonblood attack. However, against EAP-pwd this timing leak was fairly small.
26:01 - And it did not appear possible to combine multiple timing measurements in a more powerful attack.
26:09 - So in other words, it seemed almost impossible to exploit this timing leak at least against EAP-pwd.
26:19 - However, by using the timeless timing attack, we are able to attack the EAP-pwd protocol.
26:26 - And how does this work? Well, the adversary will spoof three clients shown here on the right.
26:34 - And first, the first two clients will associate to the access point.
26:39 - The access point will request the identity of the client.
26:43 - The client will reply with their identity and send the identity through the FreeRADIUS server, and then the FreeRADIUS server will tell these clients, okay, I recognize you, and both of you now have to perform the EAP-pwd authentication algorithm.
27:03 - Now up to this point, nothing special happened.
27:08 - Instead, the attack only starts now.
27:11 - And in the first part of the attack, the third client will send a special authentication frame to the access point.
27:21 - And the access point will forward this authentication frame over the TLS tunnel.
27:26 - So over the RadSec connection to the FreeRADIUS server and this will cause the buffer of the access point to slowly fill up with RadSec frames, and as a result, when Client 1 and 2 now sends their EAP-pwd authentication response, then because the access point already has some frames in its buffer, these two frames will be combined into one TLS record.
27:59 - Before I explain that in more detail, I want to highlight that the server here will send both these EAP-pwd authentication responses in a single physical Wi-Fi frame, and this single physical Wi-Fi frame is called an A-MPDU frame.
28:19 - This means that these two authentication responses will arrive at exactly the same time at the access point, and because the buffer at the access point is slowly starting to get full, the access point will combine both authentication requests in a single TLS record that is sent to the FreeRADIUS server.
28:42 - The FreeRADIUS server will then process both authentication requests simultaneously and parallel, and it will reply using whichever requests finished first, and then it will reply using the other request that finished processing second.
29:02 - So here, the request of the first client finished first at the server.
29:06 - So the server also sends it’s reply first, and then the reply of the second client is sent.
29:14 - And we can see that this order is also reflected at the adversary site.
29:20 - Namely, the adversary will first receive the reply of the Client 1, and then it was received the reply for Client 2.
29:31 - And the order of these replies allows an adversary to bruteforce the password of the victim.
29:42 - And in this case, the probability of them successfully bruteforcing the password of the client is higher than 99%, or to put it differently, the response order correctly reflects the execution time at the FreeRADIUS servers in 99% of the time.
30:08 - So only in less than 1% of the cases does the response order incorrectly reflect the execution time at the server.
30:19 - And we can use this information to then perform a off-line dictionary attack against the password.
30:26 - So for instances, if we take the RockYou password dump, which contains about 140 million passwords, then we need to perform 40 measurements, and with those 40 measurements we have an 86% success probability of correctly recovering the password.
30:45 - At least if the password is inside this password dump.
30:50 - And this bruteforce search costs about $1 on, for instance, an Amazon cloud server.
31:00 - So as a quick recap, we now explained direct timing attacks.
31:05 - We explained cross-site timing attacks, and we explained timeless timing attacks over Wi-Fi authentication.
31:13 - And with that, I hand the word back over to Tom.
31:17 - - All right. And that brings us to the demonstration.
31:22 - So for this demonstration, I created an application called The Vault, which is basically a place where people can securely share their text documents with military grade protection.
31:37 - So users can enter the title of the document and then select the required security level of the other users.
31:47 - It’s required to access the document. So it ranges from one, which is for public documents, up until five, which is for really top secret documents then the user can enter their content and finally post the document.
32:07 - Another functionality provided by The Vault is the search functionality where we can search for strings like “hi”, and then get a bunch of results.
32:22 - We can also look for other strings, like for instance, “Vegas”.
32:26 - And then we can see that there’s this one single result, or we can try some more secretive things like the string “DEFCON”, but unfortunately, we don’t find any documents containing that string.
32:44 - Well, I did create this application.
32:49 - So I know that there’s at least one document with the string “DEFCON”, but the reason why we’re not shown this document is because we don’t have the right security level.
33:05 - In this demonstration, what we will try to do is leak the password, that is in the same document as the one that contains the string “DEFCON”.
33:18 - And in order to do so, we will be using these timeless timing attacks.
33:26 - But before I go into the details there of how the attack works, I’ll first explain how the search functionality works.
33:37 - Basically, what we do is we do a textual search on the contents of all the documents using the query provided by the user.
33:47 - And then if there’s at least one document, we need to be able to tell whether the current user, their security level is high enough to view one of the documents.
34:04 - So getting the security level is done by launching a simple SQL query on the database that’s located on the same server as the web server.
34:17 - So that the timing or the latency there is very minimal, but still, this is where the timing leak is because if there are no documents, which means that there’s no match for the text, we don’t have to launch this SQL query.
34:39 - So this means that if there’s at least one document, the timing will take slightly a few microseconds longer.
34:54 - This is what we leverage in our attack. So we use PYTHON here, and we are using the H2 H2 time library, which is the one that we also use in our research.
35:12 - It’s made public and I’ll be sharing the link later on in the presentation.
35:19 - So what we do is we define the URL prefix, so we know that the actual password is prefixed by DEFCON_PASSWORD=.
35:33 - And then we guess the current character one by one.
35:40 - If the guess is incorrect, we don’t get shown any, there will be no documents matching the text search.
35:49 - So this SQL query will not be executed. And then we compare this to the baseline request where we use a character that’s not part of the character set.
36:03 - So we know that it will never have any matching documents.
36:08 - And then we run this with the H2 time library and we get the response order for 15 request pairs.
36:21 - Basically, we determine how many negative results there are.
36:27 - Negative means that the order in which the response arrived was reversed.
36:36 - So if we get for the response of r2 before the response of r1, the order is reversed, because r1 is sent before r2 typically, or the request is sent before, but then the response is received in a different order.
37:01 - And then we determine the percentage of how many responses were received in reverse order.
37:13 - And then see if this is above or below a certain threshold.
37:19 - And in this case, the threshold was set to 80%.
37:27 - Okay? So I think now we are ready to actually run the attack, which is done using just PYTHON.
37:38 - (suspenseful techno music) And we will be guessing the passwords character by character.
37:46 - So on the top most line, we can see all the characters that have been found so far.
37:55 - So we are at T and 1. Now we’re also guessing, or correctly guessed the letter m.
38:08 - Everything here is being shown in real time, of course, to make the demo a bit shorter, not the entire character set is used, but we just use some random guesses, but as you can see, so far, it’s going pretty well.
38:33 - On the bottom line we show for each character that’s being guessed, the percentage of the responses that arrived in the reverse order and for the characters that were guessed incorrectly the percentage is somewhat close to 50% because there’s a 50⁄50% chance for every request that’s being sent that it will arrive before or after (techno music) the other request.
39:15 - So whether the response order is the same or different, the chance is around 50%.
39:22 - Of course, there’s still a bit of randomness.
39:24 - So that’s why it’s not always 50, but it’s sometimes 40 sometimes 60.
39:33 - And I believe that we’re already getting quite close to extracting the passwords.
39:42 - And yeah, there it is. So we’ve managed to in just under two minutes, find that the password is TimeLessTiming.
39:58 - Of course, in a real world scenario, it would take a bit longer because we would have to guess all the characters one by one, but still, I think it shows that the tech works very, very well.
40:18 - All right. And that brings us to the conclusion.
40:22 - So we find that these timeless timing attacks are not affected by network jitter at all, neither on the upstream, nor the downstream.
40:36 - This allows the attacker to perform these remote timing attacks with an accuracy that is similar to as if the attack was launched against a local system.
40:48 - And in order to do so, we need to ensure that both requests are, they arrive at the same time at the server, in order do so, the attacker can leverage multiplexing, or if that’s not available, it might still be available to leverage a transport protocol that enables encapsulation of the requests.
41:22 - We also find that all the protocols that meet these criteria may be susceptible to timeless timing attacks.
41:31 - And we already, in our research we created practical attacks against HTTP/2 and EAP-pwd for Wi-Fi authentication.
41:44 - Okay. So with that, I would like to thank you for listening to the presentation.
41:51 - If you have any additional questions, or remarks, or ideas, don’t hesitate to reach out to us on Twitter.
41:59 - And then finally, you can find this, H2 time library that we used in our research, but also in the demo, in the following link.
42:12 - And then on the right-hand side, you can find the sources, or the source code of the demonstration, so you can play around with it as well.
42:24 - Okay. Thank you. .