Transient Error Handling with Polly Part 1

Dec 3, 2020 19:16 · 2239 words · 11 minute read

On today’s Visual Studio Toolbox, Carl is going to show us a tool called Polly, which gives us the ability to handle transient errors caused by service outages. >> Yeah. [MUSIC] >> Hi, welcome to Visual Studio Toolbox. I’m your host, Robert Green. On today’s show, we’re going to address the problem of services in the Cloud that can go down, and what can you do to make your applications more resilient. To do that, I asked my good friend, Mr. Carl Franklin, to join us to talk about a tool called Polly.

00:35 - Hey Carl, how are you? >> Hey, how’re you doing? Welcome to the Caves of Altamira. This is my wine cellar. >> Welcome to Visual Studio Toolbox. Carl, of course, is the co-host of the iconic and long-running .NET Rocks podcast. >> You were a guest in the 40 or something, right? >> When I was in my 40s? No, wasn’t that long ago. >> No, show number 40 something. >> Episode number 43, if I recall correctly, December 15th, 2003, when I was the Product Manager for Visual Studio Tools for Office. >> Yeah. >> Now, 17 years later, I’m returning the favor and having you on Toolbox. >> I appreciate it. Thanks.

01:16 - >> I wanted to wait until you are good at this [inaudible] thing. >> Oh, God. [inaudible] , that it is. >> Polly gives us resiliency, transient error handling. We’re going to do this as two parts. We’re going to do a shorter introduction. Then in the second part, we’ll dive into more detail. >> Right. Polly is an interesting project that I noticed in the early 2000s or the mid-2000s, I suppose. What happened to my thing? What happened to my thing, Robert? The whole idea with this, is that you have services that are in the Cloud that are talking to each other.

02:07 - It’s a microservices world, we’re just living in it. The problem is when services go down that you need to talk to and you’re counting on them being there, what do you do about that? I mean, it’s one thing if you’ve got an application that’s in the browser and your Wi-Fi goes down, you get a 500 and people, they expect that, they’re like, “Oh, well, my Internet’s down. I think I’ll figure out why that happened and I’ll come back.” >> Try again later. >> Try again and everything’s fine. But that doesn’t happen in the Cloud. In the Cloud, you’ve got these issues where one microservice is talking to another one downstream, and nothing goes through. You make an HTTP request and nothing happens, and that could be for a number of reasons.

03:03 - One reason is that there’s an Internet outage in the Cloud, that usually doesn’t happen. But another reason is that the service is struggling. The service may be under a heavy load. The person who configured that service may not have given it enough resources in order to handle the network load. What happens is that if you’re just in a steady retry loop, you’re just retrying, retrying, retrying, you’re putting even more load on that struggling service. It’s almost like a denial of service attack that you didn’t expect to get. >> You’re not making the users happy.

03:49 - You’re not giving them any useful information. They just think your app is terrible. >> Exactly what happens. There are different scenarios. I think Netflix did a really good job of managing this with a circuit breaker pattern. A circuit breaker is we’re going to do a few retries and maybe we’ll spread them out and we’ll get more time in-between each retry. But after a minute or two, we’re just going to stop. >> Right. >> We’re going to stop sending requests. It reminds me of that old I Love Lucy episode, where Lucy and Ethel were taking strawberries off a conveyor belt or something and they just kept coming too fast and they’re throwing stuff [inaudible]. >> This happens occasionally.

04:38 - It’ll try, it’ll spin, spin, spin, and then it says, we’re having trouble playing this video. >> Right. >> Right? >> Yeah. >> Yes, it’s annoying, but at least it gives me the sense that they’re going to stop trying. So I can either watch something else or I can try again. It’s better than just leaving me hanging. >> Yeah, exactly. Gmail is a really good example of this. If your Wi-Fi goes down while using Gmail, at the top of it, it says, “Oops, something happened. We’re going to try again in 30 seconds” and it counts down. But if you want to retry now, you can. You can push that button.
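
The "stop sending requests" behavior described above could be sketched with Polly's circuit breaker roughly as follows. This is not code from the episode: it assumes the Polly 7.x NuGet package and C# 9 top-level statements, and `CallStrugglingServiceAsync`, the threshold of two failures, and the 30-second break are made up for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Assumption: open the circuit after 2 consecutive failures, keep it open for 30 seconds.
var breaker = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(exceptionsAllowedBeforeBreaking: 2,
                         durationOfBreak: TimeSpan.FromSeconds(30));

var invocations = 0;

// Hypothetical stand-in for a downstream service that is currently failing.
async Task<string> CallStrugglingServiceAsync()
{
    invocations++;
    await Task.Delay(10);
    throw new HttpRequestException("simulated 500 from the downstream service");
}

for (var i = 1; i <= 4; i++)
{
    try
    {
        await breaker.ExecuteAsync(() => CallStrugglingServiceAsync());
    }
    catch (BrokenCircuitException)
    {
        // The circuit is open: Polly rejects the call without touching the service.
        Console.WriteLine($"call {i}: circuit open, request not sent");
    }
    catch (HttpRequestException ex)
    {
        Console.WriteLine($"call {i}: failed ({ex.Message})");
    }
}

Console.WriteLine($"service was actually invoked {invocations} times");
```

In this sketch the first two calls reach the failing service, and the remaining calls are rejected immediately, which is what takes the load off a struggling dependency.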

05:17 - >> So Polly gives us the ability to do that in our .NET apps? >> Yeah. Polly gives you the ability to create policies, and that’s what Polly is really short for, policy. To create policies that say, we’re going to retry three times and we’re going to wait first 200 milliseconds and then times 10, times 10, whatever. Maybe we’ll do an exponential backoff. Then maybe we’ll stop. We’ll do a circuit breaker. We’ll just stop all requests from coming in. But this all happens at a policy level. You have an HTTP client, and actually the Polly mechanisms are now built into HttpClientFactory in .NET Core.
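
As a rough illustration of a policy configured through HttpClientFactory, here is a minimal sketch. It assumes the Microsoft.Extensions.Http.Polly and Microsoft.Extensions.DependencyInjection NuGet packages and C# 9 top-level statements; the client name `values` and the `BackoffDelay` helper are hypothetical names chosen for this example, and the delays follow the "200 milliseconds, then times 10, times 10" idea mentioned above.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;

// Hypothetical helper: 200 ms for the first retry, then 10x longer for each further retry.
TimeSpan BackoffDelay(int retryAttempt) =>
    TimeSpan.FromMilliseconds(200 * Math.Pow(10, retryAttempt - 1));

var services = new ServiceCollection();

// "values" is an arbitrary client name chosen for this sketch.
services.AddHttpClient("values")
    // Retries on HttpRequestException, HTTP 5xx, and HTTP 408 responses.
    .AddTransientHttpErrorPolicy(builder =>
        builder.WaitAndRetryAsync(3, attempt => BackoffDelay(attempt)));

using var provider = services.BuildServiceProvider();

// Requests sent through this client pass through the retry policy under the hood.
var client = provider.GetRequiredService<IHttpClientFactory>().CreateClient("values");

Console.WriteLine($"retry delays: {BackoffDelay(1)}, {BackoffDelay(2)}, {BackoffDelay(3)}");
```

The calling code only sees the final result or the final failure; the waiting and retrying stay inside the configured client.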

06:03 - Essentially, you can configure it all with a policy, and then you just make a call, and all the retries and stuff happen under the hood. You don’t even worry about it. You just wait for the result to come back. >> That sounds cool. How’s it work? >> Yeah, it’s great. What I’d like to tell people, the flip answer is, Polly is a giant wrapper around try-catch. It’s good to think of it like that, because we use try-catch all the time, but the problem is, okay, you’ve caught an error, now, what do you do? >> Right. >> The classic problem is we shift that off to the user or log with the real exception details, and then we tell the user, “A problem happened, sorry. I don’t know what to do.

06:54 - We’ll just, I don’t know, press this button.” If you show my screen, I will show you the Polly repository on GitHub. This is a standard repository. Look at how long this page is. I mean, the documentation is really good. It shows you all the different kinds of policies and samples and stuff, but there’s also a separate samples repo. >> Cool. >> That’s what I’m going to show you here. There’s been a lot of work on Polly over the years.

07:27 - Can I just mention some stats real quick? >> Sure. >> We’re getting almost 150,000 downloads of Polly every day. >> Wow. >> It’s definitely in the top 20 of open source projects to be downloaded. But if you take out duplicates like multiple xUnit packages and Newtonsoft stuff, Polly is in the top 10. It’s a very, very popular framework. This is the wiki right here. The wiki is great if you want to keep up with the roadmap and where things are going and suggest stuff.

08:14 - But I think we should just jump over to a scenario. >> Yeah. >> Right? >> Yes. >> To see how it works. Yeah. There’s two parts to the project right here. There’s a client application, which is just the Console App. We actually have this Polly test app out on Azure if you want to use that, but you can also use a local API. The API project is actually running. Now, this API project has a values controller right here. It’s very simple.

08:51 - You pass an ID and it says response from server to request ID, so it’s basically easy for you to use as a measuring device. But if you look in appsettings.json, there are these rules where we have this throttling engine that throttles based on a period of time and a number of requests. This essentially says that the first three requests within any five second interval will come through just fine. On the fourth request within five seconds, we’re going to throw a 500 error. >> Okay. >> So that’s just how we can test. Now, in the sample application, and this is in Polly samples, we’ve got a bunch of asynchronous demos. I’m going to use the async ones. The first demo is no policy.
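
To make the throttling rule concrete, here is a small sketch of a window-based limiter. It is not the sample API's actual throttling engine, whose implementation is not shown in the episode; the names `StatusFor`, `limit`, and `window` are hypothetical, and the sketch assumes that only accepted requests count toward the limit.

```csharp
using System;
using System.Collections.Generic;

// Assumed rule from the description: at most 3 requests in any 5-second interval.
var limit = 3;
var window = TimeSpan.FromSeconds(5);
var accepted = new Queue<DateTime>();

// Returns the HTTP status code a throttled endpoint would answer with at time `now`.
int StatusFor(DateTime now)
{
    // Forget accepted requests that have already left the window.
    while (accepted.Count > 0 && now - accepted.Peek() >= window)
        accepted.Dequeue();

    if (accepted.Count >= limit)
        return 500; // over the limit: simulate a server error

    accepted.Enqueue(now);
    return 200;
}

var start = new DateTime(2020, 12, 3, 19, 16, 0, DateTimeKind.Utc);
for (var i = 0; i < 5; i++)
{
    // One simulated request every half second.
    Console.WriteLine($"request {i + 1}: {StatusFor(start.AddSeconds(i * 0.5))}");
}
```

Passing the timestamp in as a parameter keeps the rule easy to exercise without waiting for real time to elapse.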

09:43 - In other words, all of these demos follow the same pattern. You’re going to see the same code pretty much except for the policy and maybe a little bit of stuff. >> Okay. >> Let’s go back to the demo here and you’ll see what we’ve got here. We’ve got an ExecuteAsync with a cancellation token. We’re reporting some progress. We’re creating a new HttpClient. There’s no policy here. There’s no fallbacks. This is like a typical kind of application.

10:16 - You have a try-catch, and in the try, we call this Web API route, api/values, and the total requests, which is the number. We’ll display that, and if we get an exception, we’ll report that. Then we’re going to wait half a second and we’ll go back and continue doing this demo. Then it shows some statistics. This is the pattern of these demos. Let me show you what happens with no error handling, and you can see the errors are in red, the good requests are in green. One, two, three within five seconds happen, and then these other ones through 10, like this is what you’d expect, right? >> Right. >> Okay.
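
The no-policy pattern described here can be approximated as below. This is a sketch rather than the demo's source: `RunWithoutPolicyAsync` is a hypothetical helper, the base address `http://localhost:5000/` and the `api/values/{id}` route are assumptions about the local sample API, and C# 9 top-level statements are assumed.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Calls the API `totalRequests` times with a plain try-catch and no policy.
async Task<(int Succeeded, int Failed)> RunWithoutPolicyAsync(
    Func<int, Task<string>> callApi, int totalRequests, TimeSpan pause)
{
    var succeeded = 0;
    var failed = 0;
    for (var request = 1; request <= totalRequests; request++)
    {
        try
        {
            var body = await callApi(request);
            succeeded++;
            Console.WriteLine($"Response: {body}");
        }
        catch (Exception ex)
        {
            // Nothing to fall back on: report the error and move on.
            failed++;
            Console.WriteLine($"Request {request} failed: {ex.Message}");
        }
        await Task.Delay(pause);
    }
    return (succeeded, failed);
}

// Assumption: the sample API listens on this local address; adjust as needed.
using var client = new HttpClient { BaseAddress = new Uri("http://localhost:5000/") };
var stats = await RunWithoutPolicyAsync(
    id => client.GetStringAsync($"api/values/{id}"), 10, TimeSpan.FromMilliseconds(500));
Console.WriteLine($"{stats.Succeeded} succeeded, {stats.Failed} failed");
```

Taking the call as a delegate is a choice made for this sketch so the loop can be tried against a fake call as well as a real `HttpClient`.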

11:06 - When I press “Control C”, we can close that. Now, let’s go to the next demo, which is a retry for N number of times. This is the way that you create a policy in Polly. Here’s the policy. We’re handling regular exception and you can put in whatever exceptions you want, even list them out. The policy name is RetryAsync. We’re going to retry three times. >> Okay. >> Okay. There’s no delay in between each retry. It’s just going to retry three times. By the way, these demos aren’t, how shall I say, prescriptive, like these aren’t hey, do it this way.

11:55 - We’re showing you these demos to exercise the different policies so you can understand what they do. >> Right. >> Then we’re reporting the progress. Now, we have our HttpClient and within the try, we do an await policy.ExecuteAsync with a lambda, and this is the code that we’re going to execute within the context of the policy. This is essentially what happens. It’s the same deal except that we’re executing it within the context of the policy. There you go. Let’s try this. We’re going to retry three times. What happens is, boom, boom, too many requests, then we call again, we get an error, boom, boom, those three in yellow are our retries. >> Okay. >> The first three go through. We do three retries right in a row. So that’s what happens when you just do a retry.
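
A retry policy along the lines just described might look like this. It is a hedged sketch, not the sample's exact code: it assumes the Polly 7.x NuGet package and C# 9 top-level statements, and `FlakyCallAsync` is a hypothetical stand-in for the HTTP call so the example runs without the API.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

var retries = 0;

// Handle any exception and retry up to three times, with no delay in between.
var policy = Policy
    .Handle<Exception>()
    .RetryAsync(3, (exception, attempt) =>
    {
        // Runs before each retry; it receives the exception and the attempt number.
        retries++;
        Console.WriteLine($"Retry {attempt} due to: {exception.Message}");
    });

var calls = 0;

// Hypothetical stand-in for the HTTP call: fails twice, then succeeds.
async Task<string> FlakyCallAsync()
{
    calls++;
    await Task.Delay(10);
    if (calls <= 2)
        throw new HttpRequestException("simulated 500 response");
    return $"response from call {calls}";
}

try
{
    // The lambda is the code that runs within the context of the policy.
    var result = await policy.ExecuteAsync(() => FlakyCallAsync());
    Console.WriteLine($"{result} (after {retries} retries)");
}
catch (Exception ex)
{
    // Reached only if the call still fails after all three retries.
    Console.WriteLine($"Giving up: {ex.Message}");
}
```

The surrounding try-catch is still there, but it only sees failures that survive the policy.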

13:01 - What you really need to do is wait in between each retry. By the way, this code right here, this is essentially the exception handler. This is the code that executes when a retry happens. You get the exception and the attempt, and this is what is shown in yellow right here. Let’s go to the next one. You can tell me when we need to stop and save the rest for the next show. >> Okay. >> I’ll leave that up to you.

13:37 - This is wait and retry a number of times, but the wait is the key here. We’re waiting in between each retry, and you can see those yellow guys are taking a little bit longer to execute. That’s because, let me pull up this code right here, what we’re doing is we’re saying our policy has a wait and retry three times, but we’re waiting 200 milliseconds between each try. Otherwise this code is exactly the same. >> So we’re getting the same failures where we just now have the ability to detect and do something about it. Is that correct? >> Yeah. These different types of policies give you the ultimate flexibility, and I’ll show you how you can stack them and use them together. That’s where it really gets fun.
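
The wait-and-retry variant could be sketched as follows, again assuming the Polly 7.x NuGet package and C# 9 top-level statements. `AlwaysFailingCallAsync` is a hypothetical stand-in that never succeeds, used here to show what happens once the retries run out.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

var policy = Policy
    .Handle<Exception>()
    .WaitAndRetryAsync(
        3,                                          // retry three times
        attempt => TimeSpan.FromMilliseconds(200),  // wait 200 ms before each retry
        (exception, delay) =>
            Console.WriteLine($"Waiting {delay.TotalMilliseconds} ms after: {exception.Message}"));

var calls = 0;

// Hypothetical stand-in for a call that keeps failing.
async Task<string> AlwaysFailingCallAsync()
{
    calls++;
    await Task.Delay(10);
    throw new HttpRequestException("simulated 500 response");
}

var stopwatch = Stopwatch.StartNew();
try
{
    await policy.ExecuteAsync(() => AlwaysFailingCallAsync());
}
catch (HttpRequestException)
{
    // After the last retry the original exception surfaces to the caller.
    Console.WriteLine($"Gave up after {calls} calls in {stopwatch.ElapsedMilliseconds} ms");
}
```

Compared with a plain retry, the only change is the delay function passed as the second argument; returning a value that grows with `attempt` would turn this into a backoff.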

14:31 - >> Why don’t we stop here? >> Okay. >> We’d keep these nice and short, and in the Part 2 of this and the second episode, we’ll keep looking at the samples and get a little more in depth. >> Sounds good. >> But before we go, show me how I hook this all up. What do I need to add to my project is [inaudible]. >> Library. Yes, it’s so easy. You just install it via NuGet, Install-Package Polly. >> Okay. >> Polly.

15:09 - If you can remember Polly, you can Install-Package Polly. That’s it. >> All right. Excellent. This would be a good place to stop, and in the next episode, we’ll keep looking at this and see some additional scenarios. [MUSIC]