Lecture 7: Debugging and Profiling (2020)

Feb 2, 2020 03:41 · 8248 words · 39 minute read vim back static analyzer two

So welcome back. Today we are gonna cover debugging and profiling. Before I get into it we’re gonna make another reminder to fill in the survey. Just one of the main things we want to get from you is questions, because the last day is gonna be questions from you guys: about things that we haven’t covered, or like you want us to kind of talk more in depth. The more questions we get, the more interesting we can make that section, so please go on and fill in the survey. So today’s lecture is gonna be a lot of topics. All the topics revolve around the concept of what do you do when you have a program that has some bugs.

00:39 - Which is most of the time, like when you are programming, you’re kind of thinking about how you implement something and there’s like a half life of fixing all the issues that that program has. And even if your program behaves like you want, it might be that it’s really slow, or it’s taking a lot of resources in the process. So today we’re gonna see a lot of different approaches of dealing with these problems. So first, the first section is on debugging. Debugging can be done in many different ways, there are all kinds of…

01:10 - The most simple approach that, pretty much, all CS students will go through, will be just: you have some code, and it’s not behaving like you want, so you probe the code by adding print statements. This is called “printf debugging” and it works pretty well. Like, I have to be honest, I use it a lot of the time because of how simple to set up and how quick the feedback can be. One of the issues with printf debugging is that you can get a lot of output and maybe you don’t want to get as much output as you’re getting. There has… people have thought of slightly more complex ways of doing printf debugging and one of these ways is what is usually referred to as “logging”.

01:58 - So the advantage of doing logging versus doing printf debugging is that, when you’re creating logs, you’re not necessarily creating the logs because there’s a specific issue you want to fix; it’s mostly because you have built a more complex software system and you want to log when some events happen. One of the core advantages of using a logging library is that you can can define severity levels, and you can filter based on those. Let’s see an example of how we can do something like that. Yeah, everything fits here. This is a really silly example: We’re just gonna sample random numbers and, depending on the value of the number, that we can interpret as a kind of “how wrong things are going”. We’re going to log the value of the number and then we can see what is going on. I need to disable these formatters…

02:59 - And if we were just to execute the code as it is, we just get the output and we just keep getting more and more output. But you have to kind of stare at it and make sense of what is going on, and we don’t know what is the relative timing between printfs, we don’t really know whether this is just an information message or a message of whether something went wrong. If we just go in, and undo, not that one… That one, we can set that formatter. Now the output looks something more like this So for example, if you have several different modules that you are programming with, you can identify them with like different levels. Here, we have, we have debug levels, we have critical info, different levels.

And it might be handy because here we might only care about the error messages. Like those are like, the… We have been working on our code, so far so good, and suddenly we get some error. We can log that to identify where it’s happening. But maybe there’s a lot of information messages, but we can deal with that by just changing the level to error level. And now if we were to run this again, we are only going to get those errors in the output, and we can just look through those to make sense of what is going on.

04:28 - Another really useful tool when you’re dealing with logs is As you kind of look at this, it has become easier because now we have this critical and error levels that we can quickly identify. But since humans are fairly visual creatures, one thing that you can do is use colors from your terminal to identify these things. So now, changing the formatter, what I’ve done is slightly change how the output is formatted. When I do that, now whenever I get a warning message, it’s color coded by yellow; whenever I get like an error, faded red; and when it’s critical, I have a bold red indicating something went wrong. And here it’s a really short output, but when you start having thousands and thousands of lines of log, which is not unrealistic and happens every single day in a lot of apps, quickly browsing through them and identifying where the error or the red patches are can be really useful.

05:35 - A quick aside is, you might be curious about how the terminal is displaying these colors. At the end of the day, the terminal is only outputting characters. Like, how is this program or how are other programs, like LS, that has all these fancy colors. How are they telling the terminal that it should use these different colors? This is nothing extremely fancy, what these tools are doing, is something along these lines. Here we have… I can clear the rest of the output, so we can focus on this.

06:08 - There’s some special characters, some escape characters here, then we have some text and then we have some other special characters. And if we execute this line we get a red “This is red”. And you might have picked up on the fact that we have a “255;0;0” here, this is just telling the RGB values of the color we want in the terminal. And you pretty much can do this in any piece of code that you have, and like that you can color code the output. Your terminal is fairly fancy and supports a lot of different colors in the output.

06:42 - This is not even all of them, this is like a sixteenth of them. I think it can be fairly useful to know about that. Another thing is maybe you don’t enjoy or you don’t think logs are really fit for you. The thing is a lot of other systems that you might start using will use logs. As you start building larger and larger systems, you might rely on other dependencies. Common dependencies might be web servers or databases, it’s a really common one. And those will be logging their errors or exceptions in their own logs. Of course, you will get some client-side error, but those sometimes are not informative enough for you to figure out what is going on. In most UNIX systems, the logs are usually placed under a folder called “/var/log” and if we list it, we can see there’s a bunch of logs in here. So we have like the shutdown monitor log, or some weekly logs. Things related to the Wi-Fi, for example.

And if we output the 07:57 - System log, which contains a lot of information about the system, we can get information about what’s going on. Similarly, there are tools that will let you more sanely go through this output. But here, looking at the system log, I can look at this, and say: oh there’s some service that is exiting with some abnormal code and based on that information, I can go and try to figure out what’s going on, like what’s going wrong. One thing to know when you’re working with logs is that more traditionally, every software had their own log, but it has been increasingly more popular to have a unified system log where everything is placed. Pretty much any application can log into the system log, but instead of being in a plain text format, it will be compressed in some special format.

08:52 - An example of this, it was what we covered in the data wrangling lecture. In the data wrangling lecture we were using the “journalctl”, which is accessing the log and outputting all that output. Here in Mac, now the command is “log show”, which will display a lot of information. I’m gonna just display the last ten seconds, because logs are really, really verbose and just displaying the last 10 seconds is still gonna output a fairly large amount of lines. So if we go back through what’s going on, we here see that a lot of Apple things are going on, since this is a macbook.

09:33 - Maybe we could find errors about like some system issue here. Again they’re fairly verbose, so you might want to practice your data wrangling techniques here, like 10 seconds equal to like 500 lines of logs, so you can kind of get an idea of how many lines per second you’re getting. They’re not only useful for figuring out some other programs’ output, they’re also useful for you, if you want to log there instead of into your own file. So using the “logger” command, in both linux and mac, You can say okay I’m gonna log this “Hello Logs” into this system log. We execute the command and then we can check by going through the last minute of logs, since it’s gonna be fairly recent, and grepping for that “Hello” we find our entry.

Fairly recent entry, that we just created that said “Hello Logs”. As you become more and more familiar with these tools, you will find yourself using the logs more and more often, since even if you have some bug that you haven’t detected, and the program has been running for a while, maybe the information is already in the log and can tell you enough to figure out what is going on. However, printf debugging is not everything. So now I’m going to be covering debuggers. But first any questions on logs so far? So what kind of things can you figure out from the logs? like this Hello Logs says that you did something with Hello at that time? Yeah, like say, for example, I can write a bash script that detects… Well, that checks every time what Wi-Fi network I’m connected to.

11:29 - And every time it detects that it has changed, it makes an entry in the logs and says Oh now it looks like we have changed Wi-Fi networks. and then you might go back and parse through the logs and take like, okay When did my computer change from one Wi-Fi network to another. And this is just kind of a simple example But there are many, many ways, many types of information that you could be logging here. More commonly, you will probably want to check if your computer, for example, is entering sleep, for example, for some unknown reason. Like it’s on hibernation mode. There’s probably some information in the logs about who asked that to happen, or why it’s that happening. Any other questions? Okay.

12:14 - So when printf debugging is not enough, the best alternative after that is using… [Exit that] So, it’s using a debugger. So a debugger is a tool that will wrap around your code and will let you run your code, but it will kind of keep control over it. So it will let you step through the code and execute it and set breakpoints. You probably have seen debuggers in some way, if you have ever used something like an IDE, because IDEs have this kind of fancy: set a breakpoint here, execute,…

12:56 - But at the end of the day what these tools are using is just these command line debuggers and they’re just presenting them in a really fancy format. Here we have a completely broken bubble sort, a simple sorting algorithm. Don’t worry about the details, but we just want to sort this array that we have here. We can try doing that by just doing Python bubble.py And when we do that… Oh there’s some index error, list index out of range. We could start adding prints but if have a really long string, we can get a lot of information. So how about we go up to the moment that we crashed? We can go to that moment and examine what the current state of the program was. So for doing that I’m gonna run the program using the Python debugger. Here I’m using technically the ipython debugger, just because it has nice coloring syntax so it’s probably easier for both of us to understand what’s going on in the output. But they’re pretty much identical anyway. So we execute this, and now we are given a prompt where we’re being told that we are here, at the very first line of our program. And we can…

14:15 - “L” stands for “List”, so as with many of these tools there’s kind of like a language of operations that you can do, and they are often mnemonic, as it was the case with VIM or TMUX. So here, “L” is for “Listing” the code, and we can see the entire code. “S” is for “Step” and will let us kind of one line at a time, go through the execution. The thing is we’re only triggering the error some time later. So we can restart the program and instead of trying to step until we get to the issue, we can just ask for the program to continue which is the “C” command and hey, we reached the issue.

15:04 - We got to this line where everything crashed, we’re getting this list index out of range. And now that we are here we can say, huh? Okay, first, let’s print the value of the array. This is the value of the current array So we have six items. Okay. What is the value of “J” here? So we look at the value of “J”. “J” is 5 here, which will be the last element, but “J” plus 1 is going to be 6, so that’s triggering the out of bounds error. So what we have to do is this “N”, instead of “N” has to be “N minus one”. We have identified that the error lies there. So we can quit, which is “Q”. Again, because it’s a post-mortem debugger. We go back to the code and say okay, we need to append this “N minus one”. That will prevent the list index out of range and if we run this again without the debugger, okay, no errors now. But this is not our sorted list. This is sorted, but it’s not our list. We are missing entries from our list, so there is some behavioral issue that we’re reaching here.

16:27 - Again, we could start using printf debugging but kind of a hunch now is that probably the way we’re swapping entries in the bubble sort program is wrong. We can use the debugger for this. We can go through them to the moment we’re doing a swap and check how the swap is being performed. So a quick overview, we have two for loops and in the most nested loop, we are checking if the array is larger than the other array. The thing is if we just try to execute until this line, it’s only going to trigger whenever we make a swap. So what we can do is we can set a breakpoint in the sixth line.

17:11 - We can create a breakpoint in this line and then the program will execute and the moment we try to swap variables is when the program is going to stop. So we create a breakpoint there and then we continue the execution of the program. The program halts and says hey, I have executed and I have reached this line. Now I can use “locals()”, which is a Python function that returns a dictionary with all the values to quickly see the entire context. The string, the array is fine and is six, again, just the beginning and I step, go to the next line.

17:51 - Oh, and I identify the issue: I’m swapping one item at a time, instead of simultaneously, so that’s what’s triggering the fact that we’re losing variables as we go through. That’s kind of a very simple example, but debuggers are really powerful. Most programming languages will give you some sort of debugger, and when you go to more low level debugging you might run into tools like… You might want to use something like GDB. And GDB has one nice property: GDB works really well with C/C++ and all these C-like languages.

18:37 - But GDB actually lets you work with pretty much any binary that you can execute. So for example here we have sleep, which is just a program that’s going to sleep for 20 seconds. It’s loaded and then we can do run, and then we can interrupt this sending an interrupt signal. And GDB is displaying for us, here, very low-level information about what’s going on in the program. So we’re getting the stack trace, we’re seeing we are in this nanosleep function, we can see the values of all the hardware registers in your machine.

So 19:12 - you can get a lot of low-level detail using these tools. I think that’s all I want to cover for debuggers. Any questions related to that? Another interesting tool when you’re trying to debug is that sometimes you want to debug as if your program is a black box. So you, maybe, know what the internals of the program but at the same time your computer knows whenever your program is trying to do some operations. So this is in UNIX systems, there’s this notion of like user level code and kernel level code.

19:58 - And when you try to do some operations like reading a file or like reading the network connection you will have to do something called system calls. You can get a program and go through those operations and ask what operations did this software do? So for example, if you have like a Python function that is only supposed to do a mathematical operation and you run it through this program, and it’s actually reading files, Why is it reading files? It shouldn’t be reading files. So, let’s see. This is “strace”. So for example, we can do it something like this. So here we’re gonna run the “LS - L” And then we’re ignoring the output of LS, but we are not ignoring the output of STRACE. So if we execute that… We’re gonna get a lot of output. This is all the different system calls That this LS has executed. You will see a bunch of OPEN, you will see FSTAT.

21:08 - And for example, since it has to list all the properties of the files that are in this folder, we can check for the LSTAT call. So the LSTAT call will check for the properties of the files and we can see that, effectively, all the files and folders that are in this directory have been accessed through a system call, through LS. Interestingly, sometimes you actually don’t need to run your code to figure out that there is something wrong with your code. So far we have seen enough ways of identifying issues by running the code, but what if you… you can look at a piece of code like this, like the one I have shown right now in this screen, and identify an issue.

22:00 - So for example here, we have some really silly piece of code. It defines a function, prints a few variables, multiplies some variables, it sleeps for a while and then we try to print BAZ. And you could try to look at this and say, hey, BAZ has never been defined anywhere. This is a new variable. You probably meant to say BAR but you just mistyped it. Thing is, if we try to run this program, it’s gonna take 60 seconds, because like we have to wait until this time.sleep function finishes.

Here, sleep is just for 22:37 - motivating the example but in general you may be loading a data set that takes really long because you have to copy everything into memory. And the thing is, there are programs that will take source code as input, will process it and will say, oh probably this is wrong about this piece of code. So in Python, or in general, these are called static analysis tools. In Python we have for example pyflakes. If we get this piece of code and run it through pyflakes, pyflakes is gonna give us a couple of issues. First one is the one…. The second one is the one we identified: here’s an undefined name called BAZ.

23:15 - You probably should be doing something about that. And the other one is like oh, you’re redefining the the FOO variable name in that line. So here we have a FOO function and then we are kind of shadowing that function by using a loop variable here. So now that FOO function that we defined is not accessible anymore and then if we try to call it afterwards, we will get into errors. There are other types of Static Analysis tools. MYPY is a different one.

MYPY is gonna report the same two errors, but it’s also 23:53 - going to complain about type checking. So it’s gonna say, oh here you’re multiplying an int by a float and if you care about the type checking of your code, you should not be mixing those up. it can be kind of inconvenient, having to run this, look at the line, going back to your VIM or like your editor, and figuring out what the error matches to. There are already solutions for that. One way is that you can integrate most editors with these tools and here.. You can see there is like some red highlighting on the bash, and it will read the last line here. So, undefined named ‘baz’.

24:36 - So as I’m editing this piece of Python code, my editor is gonna give me feedback about what’s going wrong with this. Or like here have another one saying the redefinition of unused foo. And even, there are some stylistic complaints. So, oh, I will expect two empty lines. So like in Python, you should be having two empty lines between a function definition. There are… there is a resource on the lecture notes about pretty much static analyzers for a lot of different programming languages. There are even static analyzers for English. So I have my notes for the class here, and if I run it through this static analyzer for English, that is “writegood”. It’s going to complain about some stylistic properties. So like, oh, I’m using “very”, which is a weasel word and I shouldn’t be using it. Or “quickly” can weaken meaning, and you can have this for spell checking, or for a lot of different types of stylistic analysis.

25:48 - Any questions so far? Oh, I forgot to mention… Depending on the task that you’re performing, there will be different types of debuggers. For example, if you’re doing web development, both Firefox and Chrome have a really really good set of tools for doing debugging for websites. So here we go and say inspect element, we can get the… do you know? how to make this larger… We’re getting the entire source code for the web page for the class. Oh, yeah, here we go.

26:38 - Is that better? And we can actually go and change properties about the course. So we can say… we can edit the title. Say, this is not a class on debugging and profiling. And now the code for the website has changed. This is one of the reasons why you should never trust any screenshots of websites, because they can be completely modified. And you can also modify this style. Like, here I have things using the the dark mode preference, but we can alter that.

27:11 - Because at the end of the day, the browser is rendering this for us. We can check the cookies, but there’s like a lot of different operations. There’s also a built-in debugger for JavaScript, so you can step through JavaScript code. So kind of the takeaway is, depending on what you are doing, you will probably want to search for what tools programmers have built for them. Now I’m gonna switch gears and stop talking about debugging, which is kind of finding issues with the code, right? kind of more about the behavior, and then start talking about like how you can use profiling.

27:56 - And profiling is how to optimize the code. It might be because you want to optimize the CPU, the memory, the network,… There are many different reasons that you want to be optimizing it. As it was the case with debugging, the kind of first-order approach that a lot of people have experience with already is oh, let’s use just printf profiling, so to say, like we can just take… Let me make this larger. We can take the current time here, then we can check, we can do some execution and then we can take the time again and subtract it from the original time.

28:37 - And by doing this you can kind of narrow down and fence some different parts of your code and try to figure out what is the time taken between those two parts. And that’s good. But sometimes it can be interesting, the results. So here, we’re sleeping for 0.5 seconds and the output is saying, oh it’s 0.5 plus some extra time, which is kind of interesting. And if we keep running it, we see there’s like some small error and the thing is here, what we’re actually measuring is what is usually referred to as the “real time”. Real time is as if you get like a clock, and you start it when your program starts, and you stop it when your program ends. But the thing is, in your computer it is not only your program that is running.

29:23 - There are many other programs running at the same time and those might be the ones that are taking the CPU. So, to try to make sense of that, A lot of… you’ll see a lot of programs using the terminology that is real time, user time and system time. Real time is what I explained, which is kind of the entire length of time from start to finish. Then there is the user time, which is the amount of time your program spent on the CPU doing user level cycles. So as I was mentioning, in UNIX, you can be running user level code or kernel level code. System is kind of the opposite, it’s the amount of CPU, like the amount of time that your program spent on the CPU executing kernel mode instructions. So let’s show this with an example. Here I’m going to “time”, which is a command, a shell command that’s gonna get these three metrics for the following command, and then I’m just grabbing a URL from a website that is hosted in Spain. So that’s gonna take some extra time to go over there and then go back. If we see, here, if we were to just… We have two prints, between the beginning and the end of the program.

30:43 - We could think that this program is taking like 600 milliseconds to execute, but actually most of that time was spent just waiting for the response on the other side of the network and we actually only spent 16 milliseconds at the user level and like 9 seconds, in total 25 milliseconds, actually executing CURL code. Everything else was just waiting. Any questions related to timing? Ok, so timing can be can become tricky, it’s also kind of a black box solution. Or if you start adding print statements, it’s kind of hard to add print statements, with time everywhere. So programmers have figured out better tools. These are usually referred to as “profilers”. One quick note that I’m gonna make, is that profilers, like usually when people refer to profilers they usually talk about CPU profilers because they are the most common, at identifying where like time is being spent on the CPU.

31:56 - Profilers usually come in kind of two flavors: there’s tracing profilers and sampling profilers. and it’s kind of good to know the difference because the output might be different. Tracing profilers kind of instrument your code. So they kind of execute with your code and every time your code enters a function call, they kind of take a note of it. It’s like, oh we’re entering this function call at this moment in time and they keep going and, once they finish, they can report oh, you spent this much time executing in this function and this much time in this other function.

So on, so forth, which is the example that we’re gonna see now. Another type of tools are tracing, sorry, sampling profilers. The issue with tracing profilers is they add a lot of overhead. Like you might be running your code and having these kind of profiling next to you making all these counts, will hinder the performance of your program, so you might get counts that are slightly off. A sampling profiler, what it’s gonna do is gonna execute your program and every 100 milliseconds, 10 milliseconds, like some defined period, it’s gonna stop your program.

It’s gonna halt it, 33:05 - it’s gonna look at the stack trace and say, oh, you’re right now in this point in the hierarchy, and identify which function is gonna be executing at that point. The idea is that as long as you execute this for long enough, you’re gonna get enough statistics to know where most of the time is being spent. So, let’s see an example of a tracing profiling. So here we have a piece of code that is just like a really simple re-implementation of grep done in Python. What we want to check is what is the bottleneck of this program? Like we’re just opening a bunch of files, trying to match this pattern, and then printing whenever we find a match.

33:49 - And maybe it’s the regex, maybe it’s the print… We don’t really know. So to do this in Python, we have the “cProfile”. And here I’m just calling this module and saying I want to sort this by the total amount of time, that we’re gonna see briefly. I’m calling the program we just saw in the editor. I’m gonna execute this a thousand times and then I want to match (the grep Arguments here) is I want to match these regex to all the Python files in here. And this is gonna output some… This is gonna produce some output, then we’re gonna look at it.

First, 34:34 - is all the output from the greps, but at the very end, we’re getting output from the profiler itself. If we go up we can see that, hey, by sorting we can see that the total number of calls. So we did 8000 calls, because we executed this 1000 times and this is the total amount of time we spent in this function (cumulative time). And here we can start to identify where the bottleneck is. So here, this built-in method IO open, is saying that we’re spending a lot of the time just waiting for reading from the disk or…

35:14 - There, we can check, hey, a lot of time is also being spent trying to match the regex. Which is something that you will expect. One of the caveats of using this tracing profiler is that, as you can see, here we’re seeing our function but we’re also seeing a lot of functions that correspond to built-ins. So like, functions that are third party functions from the libraries. And as you start building more and more complex code, This is gonna be much harder. So here is another piece of Python code that, don’t read through it, what it’s doing is just grabbing the course website and then it’s printing all the…

35:58 - It’s parsing it, and then it’s printing all the hyperlinks that it has found. So there are like these two operations: going there, grabbing a website, and then parsing it, printing the links. And we might want to get a sense of how those two operations compare to each other. If we just try to execute the cProfiler here and we’re gonna do the same, this is not gonna print anything. I’m using a tool we haven’t seen so far, but I think it’s pretty nice.

36:25 - It’s “TAC”, which is the opposite of “CAT”, and it is going to reverse the output so I don’t have to go up and look. So we do this and… Hey, we get some interesting output. we’re spending a bunch of time in this built-in method socket_getaddr_info and like in _imp_create_dynamic and method_connect and posix_stat… nothing in my code is directly calling these functions so I don’t really know what is the split between the operation of making a web request and parsing the output of that web request. So, for that, we can use a different type of profiler which is a line profiler. And the line profiler is just going to present the same results but in a more human-readable way, which is just, for this line of code, this is the amount of time things took.

37:24 - So it knows it has to do that, we have to add a decorator to the Python function, we do that. And as we do that, we now get slightly cropped output, but the main idea, we can look at the percentage of time and we can see that making this request, get operation, took 88% of the time, whereas parsing the response took only 10.9% of the time. This can be really informative and a lot of different programming languages will support this type of a line profiling. Sometimes, you might not care about CPU. Maybe you care about the memory or like some other resource. Similarly, there are memory profilers: in Python there is “memory_profiler”, for C you will have “Valgrind”.

So here is a fairly simple example, 38:21 - we just create this list with a million elements. That’s going to consume like megabytes of space and we do the same, creating another one with 20 million elements. To check, what was the memory allocation? How it’s gonna happen, what’s the consumption? We can go through one memory profiler and we execute it, and it’s telling us the total memory usage and the increments. And we can see that we have some overhead, because this is an interpreted language and when we create this million, this list with a million entries, we’re gonna need this many megabytes of information. Then we were getting another 150 megabytes.

Then, we’re freeing this entry and that’s decreasing the total amount. We are not getting a negative increment because of a bug, probably in the profiler. But if you know that your program is taking a huge amount of memory and you don’t know why, maybe because you’re copying objects where you should be doing things in place, then using a memory profiler can be really useful. And in fact there’s an exercise that will kind of work you through that, comparing an in-place version of quicksort with like a non-inplace, that keeps making new and new copies. And if you using the memory profiler you can get a really good comparison between the two of them Any questions so far, with profiling? Is the memory profiler running the program in order to get that? Yeah…

you might be able to figure out like just looking at the code. But as you get more and more complex (for this code at least) But you get more and more complex programs what this is doing is running through the program and for every line, at the very beginning, it’s looking at the heap and saying “What are the objects that I have allocated now?” “I have seven megabytes of objects”, and then goes to the next line, looks again, “Oh now I have 50, so I have now added 43 there”. Again, you could do this yourself by asking for those operations in your code, every single line. But that’s not how you should be doing things since people have already written these tools for you to use. As it was the case with… So as in the case with strace, you can do something similar in profiling.

40:58 - You might not care about the specific lines of code that you have, but maybe you want to check for outside events. Like, you maybe want to check how many CPU cycles your computer program is using, or how many page faults it’s creating. Maybe you have like bad cache locality and that’s being manifested somehow. So for that, there is the “perf” command. The perf command is gonna do this, where it is gonna run your program and it’s gonna keep track of all these statistics and report them back to you. And this can be really helpful if you are working at a lower level.

So 41:37 - we execute this command, I’m gonna explain briefly what it’s doing. And this stress program is just running in the CPU, and it’s just a program to just hog one CPU and like test that you can hog the CPU. And now if we Ctrl-C, we can go back and we get some information about the number of page faults that we have or the number of CPU cycles that we utilize, and other useful metrics from our code. For some programs you can look at what the functions that were being used were. So we can record what this program is doing, which we don’t know about because it’s a program someone else has written.

42:35 - And we can report what it was doing by looking at the stack trace and we can say oh, It’s spending a bunch of time in this __random_r standard library function. And it’s mainly because the way of hogging a CPU is by just creating more and more pseudo-random numbers. There are some other functions that have not been mapped, because they belong to the program, but if you know about your program you can display this information using more flags, about perf. There are really good tutorials online about how to use this tool. Oh One one more thing regarding profilers is, so far, we have seen that these profilers are really good at aggregating all this information and giving you a lot of these numbers so you can optimize your code or you can reason about what is happening, but the thing is humans are not really good at making sense of lots of numbers and since humans are more visual creatures, it’s much easier to kind of have some sort of visualization.

43:42 - Again, programmers have already thought about this and have come up with solutions. A couple of popular ones, is a FlameGraph. A FlameGraph is a sampling profiler. So this is just running your code and taking samples And then on the y-axis here we have the depth of the stack so we know that the bash function called this other function, and this called this other function, so on, so forth. And on the x-axis it’s not time, it’s not the timestamps. Like it’s not this function run before, but it’s just time taken.

Because, again, this is a sampling profiler: 44:23 - we’re just getting small glimpses of what was it going on in the program. But we know that, for example, this main program took the most time because the x-axis is proportional to that. They are interactive and they can be really useful to identify the hot spots in your program. Another way of displaying information, and there is also an exercise on how to do this, is using a call graph. So a call graph is going to be displaying information, and it’s gonna create a graph of which function called which other function.

44:58 - And then you get information about, like, oh, we know that “main” called this “Person” function ten times and it took this much time. And as you have larger and larger programs, looking at one of these call graphs can be useful to identify what piece of your code is calling this really expensive IO operation, for example. With that I’m gonna cover the last part of the lecture, which is that sometimes, you might not even know what exact resource is constrained in your program. Like how do I know how much CPU my program is using, and I can quickly look in there, or how much memory. So there are a bunch of really nifty tools for doing that one of them is HTOP.

so HTOP is an 45:54 - interactive command-line tool and here it’s displaying all the CPUs this machine has, which is 12. It’s displaying the amount of memory, it says I’m consuming almost a gigabyte of the 32 gigabytes my machine has. And then I’m getting all the different processes. So for example we have zsh, mysql and other processes that are running in this machine, and I can sort through the amount of CPU they’re consuming or through the priority they’re running at. We can check this, for example. Here we have the stress command again and we’re going to run it to take over four CPUs and check that we can see that in HTOP.

46:37 - So we did spot those four CPU jobs, and now I have seen that besides the ones we had before, now I have this… Like this “stress -c” command running and taking a bunch of our CPU. Even though you could use a profiler to get similar information to this, the way HTOP displays this kind of in a live interactive fashion can be much quicker and much easier to parse. In the notes, there’s a really long list of different tools for evaluating different parts of your system. So that might be tools for analyzing the network performance, about looking the number of IO operations, so you know whether you’re saturating the the reads from your disks, you can also look at what is the space usage. Which, I think, here… So NCDU…

There’s a tool called “du” which stands for “disk usage” and 47:45 - we have the “-h” flag for “human readable output”. We can do videos and we can get output about the size of all the files in this folder. Yeah, there we go. There are also interactive versions, like HTOP was an interactive version. So NCDU is an interactive version that will let me navigate through the folders and I can see quickly that oh, we have… This is one of the folders for the video lectures, and we can see there are these four files that have like almost 9 GB each and I could quickly delete them through this interface.

48:37 - Another neat tool is “LSOF” which stands for “LIST OF OPEN FILES”. Another pattern that you may encounter is you know some process is using a file, but you don’t know exactly which process is using that file. Or, similarly, some process is listening in a port, but again, how do you find out which one it is? So to set an example. We just run a Python HTTP server on port 444 Running there. Maybe we don’t know that that’s running, but then we can use… we can use LSOF.

49:22 - Yeah, we can use LSOF, and the thing is LSOF is gonna print a lot of information. You need SUDO permissions because this is gonna ask for who has all these items. Since we only care about the one who is listening in this 444 port we can ask grep for that. And we can see, oh, there’s like this Python process, with this identifier, that is using the port and then we can kill it, and that terminates that process. Again, there’s a lot of different tools. There’s even tools for doing what is called benchmarking.

50:11 - So in the shell tools and scripting lecture, I said like for some tasks “fd” is much faster than “find” But like how will you check that? I can test that with “hyperfine” and I have here two commands: one with “fd” that is just searching for JPEG files and the same one with “find”. If I execute them, it’s gonna benchmark these scripts and give me some output about how much faster “fd” is compared to “find”. So I think that kind of concludes… yeah, like 23 times for this task. So that kind of concludes the whole overview. I know that there’s like a lot of different topics and there’s like a lot of perspectives on doing these things, but again I want to reinforce the idea that you don’t need to be a master of all these topics but more… To be aware that all these things exist. So if you run into these issues you don’t reinvent the wheel, and you reuse all that other programmers have done.

51:18 - Given that, I’m happy to take any questions related to this last section or anything in the lecture. Is there any way to sort of think about how long a program should take? You know, if it’s taking a while to run you know, should you be worried? Or depending on your process, let me wait another ten minutes before I start looking at why it’s taking so long. Okay, so the… The task of knowing how long a program should run is pretty infeasible to figure out. It will depend on the type of program. It depends on whether you’re making HTTP requests or you’re reading data… one thing that you can do is if you have for example, if you know you have to read two gigabytes from memory, like from disk, and load that into memory, you can make back-of-the-envelope calculation.

So like that shouldn’t take longer than like X seconds because this is 52:16 - how things are set up. Or if you are reading some files from the network and you know kind of what the network link is and they are taking say five times longer than what you would expect then you could try to do that. Otherwise, if you don’t really know. Say you’re trying to do some mathematical operation in your code and you’re not really sure about how long that will take you can use something like logging and try to kind of print intermediate stages to get a sense of like, oh I need to do a thousand operations of this and three iterations took ten seconds. Then this is gonna take much longer than I can handle in my case. So I think there are there are ways, it will again like depend on the task, but definitely, given all the tools we’ve seen really, we probably have like a couple of really good ways to start tackling that.

53:14 - Any other questions? You can also do things like run HTOP and see if anything is running. Like if your CPU is at 0%, something is probably wrong. Okay. There’s a lot of exercises for all the topics that we have covered in today’s class, so feel free to do the ones that are more interesting. We’re gonna be holding office hours again today. Just a reminder, office hours. You can come and ask questions about any lecture. Like we’re not gonna expect you to kind of do the exercises in a couple of minutes.

53:53 - They take a really long while to get through them, but we’re gonna be there to answer any questions from previous classes, or even not related to exercises. Like if you want to know more about how you would use TMUX in a way to kind of quickly switch between panes, anything that comes to your mind. .