!!Con West 2020 - isis agora lovecruft: big number small computer
Mar 20, 2020 18:52 · 3449 words · 17 minute read
Hi, I’m isis agora lovecruft. My pronouns are they/them. Once upon a time, I was a theoretical physicist, specializing in quantum and string cosmologies. But one day, a fairy princess came and kissed me on the nose and the curse was lifted. (laughter) Or I was doubly cursed, because now I’m a cryptographer. There’s this thing that happens sometimes; when I tell this to cute people at parties, they get really excited. Their eyes open wide, and they open their phones to show me their insta profiles.
00:55 - I’m usually confused at this point, and they ask if I want to do a shoot with them. This is when I realize that they probably misheard me, and they probably think that I said that I was a photographer. At this point, my strategy is usually to enunciate very clearly and tell them, I’m not a photographer, I’m a cryptographer. I bend maths to my will to keep secrets. In all cases but one, however, this led said cute person to dismay, shuffling off, usually mumbling about tech workers. I mean, fair. My new strategy is to talk about my photography, which albeit clearly second rate, doesn’t horrify them as much as my math interests.
01:43 - But today I’m going to talk about the truly horrifying. Not only am I going to talk about cryptography, I’m going to talk about post-quantum cryptography – that is, cryptography that can be executed on a classical computer, but that remains secure against hypothetical adversaries in the hypothetical future where we have hypothetical ideal quantum computers suitable for quantum computation. Not only am I talking about post-quantum cryptography, but I will also be talking about 8-bit computers. How many of you had one of these machines as a kid? I didn’t! By the time I realized I desperately wanted a Commodore 64, I was in kindergarten and the year was 1995. I begged my parents for one. My parents were both hackers, for background.
02:35 - Around that time, I had only written a few shell scripts on an ancient DOS machine they had let me use. My dad had to pull down an old VCR and hook it up to the DOS machine to store my code on VHS magnetic tape. He said it would last longer and store more. I had my own modem, which, being a little kid and too young to realize I could desolder the modem’s speaker, I used under a pillow late at night to sign into IRC. My parents argued over which “real” language I should learn first. My dad argued for assembly.
03:12 - Get to know the machine, the lowest bare metal workings of the hardware, before working up to higher level stuff. My mom argued for Perl. I don’t know why. I looked up a bunch of samples of code in various languages. I had nothing else to go on. And of course, I chose LISP. My parents asked me about it and I told them I picked it because it was cute, because the trailing parentheses were curvy like girls. They were mortified. It was that year that I first asked for a Commodore 64. They said you don’t want one of those. We’ll get you a Pentium Pro. Way better. I said, ok sure, but I still want a Commodore 64! Since then, I’ve always asked for one. Every birthday and holiday.
04:05 - They kept saying: You don’t want that! It’s old. It’s junk. This continued in my 20s, in which I moved to the EU and stopped asking for anything because of the daunting prospect of having to carry more luggage across an ocean again, but a couple of years ago, I moved back home to San Francisco. I mentioned to a friend how badly I wanted a Commodore 64, all the art and chiptunes I wanted to make with it. Maybe even a game. My friend responded… isis, you can just get one. You’re an adult now. This was a revelation. After thinking it through, I decided my friend might be right. Surely I was an adult. So I did what adults do on a random Tuesday for no special reason at all.
04:55 - I opened up eBay and I bought a Commodore 64. I texted my mom on Signal as soon as I got it to tell her the good news. She responded in all caps WHAT?! This was very confusing to me. So I said… You know, I’ve always wanted one. I’ve been asking for one since I was five. And still in all caps, she responded… Sweetie, we thought you were joking! (laughter) I can’t even remember what the code I wrote last week does without documenting it extremely well.
05:43 - And my parents think that I can keep a joke running for 22 years! So the MOS 6502 chipset, which is the chipset in the Commodore 64, was introduced in 1975 and sold for 1⁄6 of the price of the competing chips from Motorola and Intel, leading it to be credited, along with the Zilog Z80, with sparking the home computing revolution of the early ‘80s. These chips were used in a large number of devices in the ‘80s: the Atari 2600, the Atari 8-bit family, the Apple II, the Nintendo Entertainment System (including the Famicom, the Family Computer – I had no idea these existed), the Commodore 64, of course, the Commodore PET, the Atari Lynx, the Ohio Scientific Challenger 4P, the BBC Micro, and even Tamagotchis! All using the 6502 or variations of the basic design. The Commodore 64 claims to be one of the most popular home computers of all time, selling over 19 million units worldwide. I’m not sure if that still remains the case, but we’ll let them have it. For the 6502 architecture, there are 151 official 8-bit op codes.
07:02 - An op code is simply a number in machine language that a processor recognizes as a request to perform a specific operation. These 151 op codes are organized into 56 instructions, some with multiple addressing modes, which I’ll get to in a second. Instructions are then given human-memorable names called mnemonics, which is what we use to write assembly language. The first so-called page of memory – not actually a page in the modern sense, but rather a contiguous chunk of 256 bytes, which is easier for the processor to address on these chips – is referred to as the zero page. The chip which was actually used in the Commodore 64 is a 6510.
07:43 - The only difference is that the first two bytes in memory on the zero page of the chip allow what is called bank switching, where you may memory-map or unmap hard-coded ROMs, or allow those addresses to pass through into RAM. The ROMs on the Commodore 64 include the kernel (which Commodore spelled KERNAL), some I/O functionality, a thing called a character ROM that stores characters and glyphs, and a BASIC interpreter, which is a text adventure that people seemed to enjoy in the ‘80s. One important thing to note: since it’s an 8-bit processor, the registers are 8 bits. And you only get three of them: A, X, and Y. A is the accumulator. So when you tell it to do something like ADC #5, which stands for “add with carry the literal number 5”, it adds 5 to the contents of the accumulator.
08:38 - The other registers are referred to as index registers. They’re generally used for storing indexes, and can only do incredibly basic tasks like being incremented, decremented, or stored in memory. Now onto addressing modes. This is immediate addressing. The first instruction you can see is LDA, which means “load A”, or load into the accumulator. We load the literal (that’s what the # stands for) hex (that’s what the $ stands for) value C0, which is the number 192. Then we clear the carry flag, which lets us know if there’s been an overflow in the accumulator, and we add with carry the literal number 5.
09:26 - Now the accumulator contains the number 197. That’s a pretty straightforward mode of addressing. Next we see the same instructions again, but used with absolute addressing, where we use absolute addresses to point to a certain region in memory. We load 192 into the accumulator, store it at the address CFFF, load the literal 5 into the accumulator, clear the carry flag, and then add the contents of the address CFFF into A. Now A again contains 197. The next mode is a little bit more complicated.
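The two sequences described above can be modeled with a minimal Python sketch (this is not real 6502 code or an emulator – just a toy accumulator and memory to make the arithmetic concrete):

```python
# Toy model of a 6502-style accumulator and memory.
mem = {}

# Immediate addressing: LDA #$C0, CLC, ADC #$05
a = 0xC0              # LDA #$C0 -- load the literal 192
carry = 0             # CLC      -- clear the carry flag
a = a + 0x05 + carry  # ADC #$05 -- add the literal 5 (plus carry)
assert a == 197

# Absolute addressing: LDA #$C0, STA $CFFF, LDA #$05, CLC, ADC $CFFF
a = 0xC0                     # LDA #$C0
mem[0xCFFF] = a              # STA $CFFF -- store 192 at address $CFFF
a = 0x05                     # LDA #$05
carry = 0                    # CLC
a = a + mem[0xCFFF] + carry  # ADC $CFFF -- add the contents of $CFFF
print(a)  # 197
```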
10:11 - And this is where things start to cause problems. You can quickly write code that does surprising things. We load 0xFA into the accumulator, we store that at the location in memory CAAA, and then we load 0xCF into the accumulator. What we’re doing here by loading these literals 0xFA and 0xCF is putting an address at another address. But because the accumulator is 8 bits, it can only store 1 byte at a time, and since the 6502 is little-endian, we have to store the bottommost bits first and the topmost bits second.
10:48 - So we store the topmost bits, CF, at CAAB, the next byte in memory, one up from CAAA. Then we load the number 5 into the accumulator, store it at CFFA, load 192 into A, store it at CFFF, and clear the carry flag. Then we do the add indirectly through CAAA, which gets dereferenced to CFFA; we take the contents of CFFA, which is 5, and add it to the accumulator. A now again contains 197. And I promise this is the last one. But this is where things start to get really, really weird, and you can cause bugs really quickly, which is amazing. So again, we do the whole setup of loading the address 0xCFFA into the spot in memory at CAAA. We load 5 into the accumulator and store it at CFFA. We load 192 into the accumulator, same as before, and store it at CFFF. That’s the same. Clear the carry flag.
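The indirect sequence can likewise be sketched in Python (a toy model, not an emulator – and real 6502 indirect modes work through the zero page, which this sketch glosses over):

```python
# Toy model of the indirect-addressing sequence described above.
mem = {}

# Build the pointer $CFFA at $CAAA, low byte first (the 6502 is little-endian).
a = 0xFA; mem[0xCAAA] = a   # LDA #$FA, STA $CAAA -- bottommost bits
a = 0xCF; mem[0xCAAB] = a   # LDA #$CF, STA $CAAB -- topmost bits

a = 0x05; mem[0xCFFA] = a   # LDA #$05, STA $CFFA
a = 0xC0; mem[0xCFFF] = a   # LDA #$C0, STA $CFFF
carry = 0                   # CLC

# Dereference the pointer stored at $CAAA to get $CFFA, then add its contents.
target = mem[0xCAAA] | (mem[0xCAAB] << 8)
assert target == 0xCFFA
a = a + mem[target] + carry
print(a)  # 197
```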
11:54 - But now we do indirect indexed addressing. We give it an address; it adds an index to it first, then looks at that location in memory to find another location in memory, then takes the contents of that memory and adds it to the accumulator. A now contains 197. As you can guess, this is a really quick way to cause bugs. One of the ways indirect mode is sort of fucky is that there’s no carry in the address fetch for the indirect jump instruction.
12:32 - So it’s really easy to transfer program execution to the wrong place in memory by doing an indirect jump through a pointer that sits on a page boundary. Say your pointer is at 30FF: the fetch of the high byte doesn’t actually carry over to grab the next byte at 3100; it wraps back around to 3000. So suddenly you’ve transferred program execution to another place in memory which you didn’t intend, which is really hard to debug, and I will show an example of that. We load 0x40 into A and store it at 3000, as I mentioned. Load 0x80 into A, store it at 30FF, load 0x50 into A, and store it at the location in memory 3100. So you would guess that if you jump indirectly through 30FF, it would grab the next byte at 3100 and you would transfer program execution to the address 5080. That’s not the case, because it wraps back around.
13:42 - You jump to 4080 and now you have a horrible bug. Now that we understand a little bit about the assembly, let’s talk a little bit more about the op codes. Because the op codes are 8 bits, there are obviously 256 available op codes. There are 105 which exist but are unofficial, often called the illegal op codes. Many of them curiously combine other arithmetic operations, such as OR’ing the value of a register while shifting right or left by the value stored in another register.
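The page-boundary jump bug described above can be simulated in a few lines of Python (a sketch of the address fetch only, under the memory contents from the example):

```python
# Memory as set up in the example: 0x40 at $3000, 0x80 at $30FF, 0x50 at $3100.
mem = {0x3000: 0x40, 0x30FF: 0x80, 0x3100: 0x50}

def jmp_indirect(ptr, mem):
    """Fetch a 16-bit jump target, reproducing the 6502's page-wrap bug."""
    lo = mem[ptr]
    # The high byte is fetched from within the SAME page: $30FF + 1 wraps
    # around to $3000, not forward to $3100 as one would expect.
    hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    hi = mem[hi_addr]
    return (hi << 8) | lo

print(hex(jmp_indirect(0x30FF, mem)))  # 0x4080, not the intended 0x5080
```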
14:22 - Obviously it’s important to use as many of the illegal op codes as possible. (laughter) It may even be illegal not to use as many of the illegal op codes as possible. Sadly, though, I have yet to find a place in my current code to use any of them, so I’m leaving this as an exercise to the reader. Patches are extremely welcome and will be rewarded with a lapel badge which reads “be gay, do crime codes”. Alright. So now that we’ve covered the 6502, or 6510 instruction set a little bit, I’m gonna talk a bit about the cryptography.
14:59 - So hypothetically, in the hypothetical future, we will hypothetically have hypothetical ideal quantum computers, which are hypothetically suitable for generalized computation and hypothetically have suitably efficient hypothetical quantum error correction codes, hypothetically. If this were to hypothetically happen, it would ruin most of our current discrete-log-based public key cryptography. Some, like myself, believe that efficient quantum error correction is likely impossible, and that even if it were possible, we’re more likely to see non-generalized quantum computers designed to break only certain classical cryptographic algorithms, or even certain key pairs for a classical cryptographic algorithm. But even if the worst-case scenario is a remote chance – even if it’s 0.001% likely – it’s nonetheless important to prepare for, as any enciphered communications recorded today could be deciphered later by a quantum-capable adversary.
16:03 - As mentioned before, the subfield of cryptography which deals with algorithms which can be run on a classical computer but withstand attacks from a quantum computer is referred to as post-quantum cryptography. The reason why quantum computers will be better than classical computers at certain classes of problems relates to the complexity of a function called the Fourier transform. Basically, it takes a set of combined waveforms and breaks it down into individual components. A real-world example would be taking a recording of a group of people talking at a party and breaking it down into recordings of the distinct individuals. Over the exponentially large state spaces that show up in these problems, Fourier transforms are phenomenally expensive on classical computers: the cost grows exponentially in the number of input bits.
16:51 - But the quantum Fourier transform takes only polynomially many quantum gates. This allows us to compute certain classes of problems faster on a quantum computer than on a classical one, such as computing discrete logarithms and factoring composites of large primes – the mathematical problems which much of our public key cryptography relies upon today. So before delving into the post-quantum variant, let’s review classical elliptic curve key exchanges. We set a publicly known point, G, a generator of a group of elements – that is, points on the elliptic curve. This group has order P, which just means that there are P elements in the group.
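To make the "breaking a combined waveform into components" idea concrete, here is a naive discrete Fourier transform in Python. This is a sketch for illustration only: the signal, frequency bins, and sizes are made up, and a real FFT would be far more efficient than this quadratic loop.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: O(n^2), for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

# A signal mixing two pure tones, at frequency bins 2 and 5.
n = 64
signal = [math.sin(2 * math.pi * 2 * t / n) + math.sin(2 * math.pi * 5 * t / n)
          for t in range(n)]

# The transform separates the mixture back into its component frequencies.
spectrum = [abs(c) for c in dft(signal)]
peaks = sorted(range(n // 2), key=lambda k: spectrum[k], reverse=True)[:2]
print(sorted(peaks))  # [2, 5] -- the two component tones recovered
```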
17:31 - Alice and Bob each choose a secret scalar – scalars are just integers modulo some large prime number – call them little a and little b. Alice computes her public key as big A equals little a times G and sends this to Bob. Similarly, Bob computes his public key as big B equals little b times G and sends it to Alice. Alice computes their shared secret as S equals little a times big B, which is equal to little a times little b times G. Bob computes his as little b times big A, which is equal to little b times little a times G, and thus they arrive at the same shared value. Only an adversary which can compute discrete logarithms in real time, such as a hypothetical generalized quantum computer, is able to recover the secret scalars from the public keys and derive the shared value.
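The exchange above can be sketched in a few lines of Python. Note the hedge: this toy uses modular exponentiation over a tiny made-up prime instead of elliptic-curve point multiplication, and the secrets are illustrative constants – real deployments use curve groups and properly generated random scalars.

```python
# Toy Diffie-Hellman exchange over integers mod p (NOT an elliptic curve,
# and far too small to be secure -- purely to show the shared-secret math).
p = 0xFFFFFFFB  # a small public prime (2^32 - 5), illustrative only
g = 5           # a public generator, standing in for the base point G

a = 123456789   # Alice's secret scalar (illustrative, not random)
b = 987654321   # Bob's secret scalar (illustrative, not random)

A = pow(g, a, p)  # Alice's public key, sent to Bob
B = pow(g, b, p)  # Bob's public key, sent to Alice

s_alice = pow(B, a, p)  # Alice computes (g^b)^a mod p
s_bob   = pow(A, b, p)  # Bob computes (g^a)^b mod p

# Both arrive at g^(ab) mod p without ever transmitting a or b.
assert s_alice == s_bob
```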
18:33 - So, without going into much detail on the mathematics – but please feel free to talk to me later if you’d like to know more – to make this kind of exchange secure against adversaries with quantum computers, we need to rely on different mathematical problems. In the case of supersingular isogeny key exchanges – that’s a lot of big words, and I’m going to kind of gloss over them! – rather than using points on elliptic curves as public keys, we use the elliptic curves themselves as public keys, by computing what are known as isogenies: structure-preserving maps from one curve to another. So back to 8-bit. The catch phrase for the Commodore 64 was “I adore my 64”. People genuinely love these machines. I do too. If we’re gonna continue using them, it’s vital that we secure them against attacks from quantum computers today! (laughter) So one of the best-known sets of parameters for implementing supersingular isogeny key exchanges requires implementing a specific field – a set of numbers that have addition (and hence subtraction) and multiplication (and hence division) defined. In this case, the field is the integers modulo a 434-bit prime. Commodore 64s have 8-bit registers. Modular arithmetic. 434-bit prime.
20:09 - Typically, in cryptography, we assume things about the chips we’re working on: not only that they have a multiplication instruction, but one that operates in constant time. The term constant time is slightly misleading. We don’t mean “always runs in the same amount of time”; rather we mean: doesn’t change the amount of runtime with respect to secret data.
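The distinction can be sketched with a byte-string comparison (a classic example; in real Python you would reach for `hmac.compare_digest` rather than rolling your own):

```python
def variable_time_eq(x, y):
    """Leaky compare: returns early at the first mismatch, so its runtime
    depends on how many leading bytes of the guess are correct."""
    if len(x) != len(y):
        return False
    for xb, yb in zip(x, y):
        if xb != yb:
            return False  # early exit -- runtime depends on secret data
    return True

def constant_time_eq(x, y):
    """Constant-time-style compare: touches every byte regardless of where
    (or whether) a mismatch occurs."""
    if len(x) != len(y):
        return False
    acc = 0
    for xb, yb in zip(x, y):
        acc |= xb ^ yb  # accumulate differences; no early exit
    return acc == 0

secret = b"hunter2!"
assert constant_time_eq(secret, b"hunter2!")
assert not constant_time_eq(secret, b"hunter2?")
```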
20:34 - Algorithms which are not constant time, also known as variable-time algorithms, have numerous known attacks for recovering secret values while they run, such as listening to the frequency of the whine that the CPU makes, watching latency over a network or from a co-resident process, or seeing whether some junk data has been evicted from a CPU cache. Not only do we assume we have a multiply instruction, but we generally assume very specific things about the way its underlying algorithm works. This is where I should point out that the 6502, or 6510, architecture in Commodore 64s not only has no constant-time instruction for multiplication, it has no multiply instruction at all. None. So if you want to multiply numbers, you have to implement it using bitwise shifts and addition, and manually handle the carry flag. To my knowledge, no one has ever attempted constant-time cryptography on such an architecture.
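A shift-and-add multiply of the kind you would hand-write on such a chip looks roughly like this (a Python sketch, not the talk’s actual 6502 routine; it uses a bitmask instead of a branch, in the spirit of constant-time code):

```python
def mul8(x, y):
    """8-bit x 8-bit -> 16-bit multiply via shifts and adds, with no
    multiply instruction and no data-dependent branching."""
    product = 0
    for bit in range(8):
        # All-ones mask if this bit of y is set, all-zeros otherwise,
        # so every iteration does identical work regardless of y's bits.
        mask = 0xFFFF * ((y >> bit) & 1)
        product = (product + ((x << bit) & mask)) & 0xFFFF
    return product

assert mul8(12, 13) == 156
assert mul8(255, 255) == 65025  # largest 8-bit product fits in 16 bits
```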
21:34 - So I sat down to a blank Emacs buffer, staring for about 20 minutes, and like every normal, completely rational developer does, I decided to see if anyone else had ever tried solving my problem before. I did this knowing full well that I was doing something that explicitly had never been done before. I opened up a new tab, went to Google, and typed in the query: big number, small computer. This went about as well as you can expect. So I sat down again and began to implement, from scratch, 434-bit integers of 56 8-bit words each, modulo a 434-bit prime, on an 8-bit system. Like a completely normal, rational developer does. So… at this point, I have bad news. I brought my Commodore 64 today. I intended to do a demo of the code. But at this point, the code is not quite finished. I’ve been working on it for two or three weeks now. The code is on GitHub. I do have the finite field arithmetic done, and the Montgomery arithmetic largely done. What’s left is to implement the 2-isogenies and 3-isogenies, which are comparatively high-level and sound like big words, but are actually easier than implementing the field, which involves implementing multiplication, which is not fun.
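The basic shape of that multi-precision arithmetic – big integers represented as arrays of 8-bit limbs – can be sketched in Python (limb count and test values here are illustrative, not taken from the actual implementation):

```python
def add_limbs(x, y):
    """Add two equal-length little-endian arrays of 8-bit limbs,
    propagating the carry by hand, as an 8-bit CPU must."""
    assert len(x) == len(y)
    out, carry = [], 0
    for xi, yi in zip(x, y):
        total = xi + yi + carry
        out.append(total & 0xFF)  # low 8 bits stay in this limb
        carry = total >> 8        # overflow propagates to the next limb
    return out, carry

def to_limbs(n, count):
    """Split an integer into `count` little-endian 8-bit limbs."""
    return [(n >> (8 * i)) & 0xFF for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from little-endian 8-bit limbs."""
    return sum(b << (8 * i) for i, b in enumerate(limbs))

# Two ~434-bit numbers fit comfortably in 56 limbs (448 bits).
a, b = 2**433 + 12345, 2**432 + 67890
limbs, carry = add_limbs(to_limbs(a, 56), to_limbs(b, 56))
assert carry == 0 and from_limbs(limbs) == a + b
```

A full field implementation then builds subtraction, multiplication, and modular reduction out of the same limb-by-limb carry handling.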
23:11 - Honestly, I was hoping to demo it live, as I mentioned. Because by my estimations of the cycle count from what I have done so far, I believe Alice’s key exchange should take a brief 40 minutes. And Bob’s side of the key exchange should take roughly an hour. Just in time for the end of this talk. That’s all. I would like to thank !!Con for inviting me. And thanks to the organizers for their hard work.
23:43 - And thank you to all the other wonderful speakers whose talks I am looking forward to today. And solidarity with the graduate students on strike. (applause)