Optimize for interactivity using Web Vitals (FID/TBT)

Dec 16, 2020 17:00 · 3982 words · 19 minute read

(upbeat music) - Hi, I'm Cheney, and I'm here today to talk about optimizing your website for Core Web Vitals. Later in this talk, my colleague Damian will hop on and dive into a real-world case study. Today, we're going to spend time talking about the interactivity pillar, why it matters, what metric to focus on, and how to improve it. As a refresher, the other two pillars of Core Web Vitals are loading and visual stability, and you can learn more about those in Addy Osmani's talk from web.dev Live, linked down below. First, let's talk about interactivity. What are we actually trying to improve and optimize for the user? Web pages are more dynamic and more touch-driven than ever before.

00:47 - There's an ongoing dialogue between the user and the page, with multiple taps, swipes, and scrolls all within one navigation. When we touch or drag or swipe something, as humans, we've been trained by the world all around us to expect an instant response to that input. Yet the UIs we touch digitally don't always seem to match those expectations. It's frustrating to encounter experiences that seem sluggish or outright unresponsive. Of course, certain actions, like tapping a search button or a filter button, may involve non-trivial work.

01:27 - Maybe I'm just tabbing from one navigation tab to another. What this pillar looks at isn't necessarily returning results immediately, but rather measuring whether the page was able to react to that input and give feedback instantly when necessary. So we ultimately want to improve something we call input latency, and that breaks down into three parts. The first part is the delay. That's the time between the user interaction and when the browser actually starts processing event handlers in response to it. Contrary to popular belief, when you touch something, the browser doesn't respond instantly; it's often still finishing up whatever it was doing beforehand. The second part is the processing time.

02:07 - That's the time it actually takes to execute the event handlers tied to that user input. Notice how, in part two here, the user interface hasn't actually updated. There's no feedback yet given to the user that their input was registered and that it actually did something. So the third part is rendering, and that's the time it takes to render the next frame, once the browser knows what kind of UI update to put onto the screen. And this is where you can see that the UI shifted from "fly" to "sleep". So the three parts here summarize what we call input latency.

02:43 - Now, the metric input delay refers to the very first part. We measure an input delay for every discrete action a user takes, such as a tap, a click, or a key press. Scrolling the page is not counted, since it can usually still happen even while the main thread is busy. So looking at this example page load, the first input delay is the very first interaction after a user navigates to your page. Notice how there are many input delays, but we want to concentrate on the first one because that's often when the browser is the busiest, parsing and executing the various large JavaScript files that you might be loading, and it's also a great time for you to make a good first impression as a web developer.

03:28 - While we want to make sure that every user input on the page has minimal delay, in the current edition of Core Web Vitals, we recommend optimizing first input delay, or FID for short. In this page load, when the user taps at the very beginning of the page, the main thread was in the middle of a JavaScript task. And in order to start responding to that input, the browser needs to wait until the task finishes. And so the time between the point of the user input and the browser actually finishing that yellow task you see here, that time is the first input delay. And in order to have a strong likelihood that you can respond to the user in a fast, expected way, we recommend having no more than a 100 millisecond delay at the 75th percentile or higher.
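To make that measurement concrete, here's a minimal sketch (not from the talk) of observing the first input natively via the Event Timing API in Chromium browsers. FID is the gap between when the input happened and when its handlers could start running:

```js
// A minimal sketch of measuring FID by hand with the Event
// Timing API (Chromium browsers). The web-vitals library,
// mentioned later in the talk, wraps this same mechanism.
new PerformanceObserver((list) => {
  const [entry] = list.getEntries();
  // processingStart - startTime is the input delay: the wait
  // before the browser could begin running event handlers.
  console.log('FID:', entry.processingStart - entry.startTime, 'ms');
}).observe({type: 'first-input', buffered: true});
```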

04:13 - It's important to remember that first input delay is a field metric, and it requires a real user. And it's not easy to guess what a user might do. Some users are interested in X. Others might browse differently and scroll first before tapping on something else. Others, like myself, are a bit more impatient and tap things immediately. And this is all impacted by what you show the user, like a splash screen or a loading carousel.

04:38 - When you show that, it shapes that particular user's intent, alongside whatever other work you might be doing underneath the UI, hidden from the user. So the variation in input delays shows how important it is to collect and analyze FID data from your users in the wild, and to concentrate on high percentiles like the 75th. You can collect this data using the web-vitals JavaScript library linked here, or check with your RUM analytics provider for any out-of-the-box support they might provide. With the data collected, you can look at every type of input, from the instant ones that don't hit a blocking task to the ones where there is a delay, and you want to answer an important question: when the user did experience a delay in response to their input, how bad was that delay?
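As a rough illustration (not shown in the talk), collecting FID with the web-vitals library might look like this, using the v1-era getFID API; the '/analytics' endpoint is a placeholder:

```js
// A minimal sketch of field FID collection with the web-vitals
// library (getFID is the v1/v2-era API; newer releases renamed it).
import {getFID} from 'web-vitals';

getFID((metric) => {
  // metric.value is the first input delay in milliseconds.
  // '/analytics' is a placeholder for your own RUM endpoint.
  navigator.sendBeacon('/analytics', JSON.stringify({
    name: metric.name,   // 'FID'
    value: metric.value, // the delay, in ms
    id: metric.id,       // unique ID for this page load
  }));
});
```

Oh, look, Damian's joining the call.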

05:28 - - Yeah, I've been trying to join for the last five minutes, but I wasn't getting a response. - Sorry, I was finishing up this long task of an introduction. - Maybe next time you can break it up into smaller chunks so you can respond faster. - That's a great idea, but even better, maybe there was stuff I could have cut entirely. Anyways, I know you've been talking to real developers out in the field.

05:51 - And what are some common questions about first input delay that you've been hearing? - Yeah, that's true. So FID is affected by many factors in the field. Which metric can I use when I'm developing and testing in the lab? - That's a great question. The problem, as we've said, is that every single user is different. You can't generalize a test case in the lab that really represents the field. Every user makes their own choices on the page, and those can be very different. Taking an average or a median thus wouldn't make sense, since values could be zero because a user didn't hit a long task at all, or values could be pretty high because they touched the page right in the middle of a long task, as you see here on the left. So first input delay requires a user to have interacted with the page, and we know that could be in the middle of some main thread work. In the ideal scenario, we go to our lab tool and we ask, well, what is my typical input delay? Now, we just explained that it's not very easy for a lab tool to answer that.

06:55 - So instead, in Lighthouse and DevTools, we surface a companion metric for the lab called Total Blocking Time. Total Blocking Time describes the root cause of a slow first input delay, which is long blocking tasks. We set a budget of 50 milliseconds for each task, and if a task goes beyond that amount, every millisecond after it is considered potential blocking time. You do get a free 50 milliseconds because we think that gives the browser and the main thread enough time to do some work and still reliably react visually to user input in that timeframe. Now, that user input could happen at any time.

07:35 - It could hit the very first task on the page or the 50th task on the page, so it doesn't make sense to measure blocking time for just one task. Instead, we look at all the different tasks during the timeline of the page load and sum together all the different blocking times, and thus it's called Total Blocking Time. We'll give you an example here. On the slide, you see a main thread with multiple tasks happening, some long and some short. The sum of all these different blocking regions, denoted in red, is what we call the Total Blocking Time. Now a developer comes along and wants to improve this, and one way they can is by optimizing the hydration step of the app.

08:17 - That might knock 100 milliseconds off of that one task, and it'll also knock 100 milliseconds off of your Total Blocking Time. So Total Blocking Time doesn't measure FID in this case, but it does correlate with FID: if you compare the two timelines, this main thread now looks a bit more open, so the probability that a tap landing somewhere on it gets a good first input delay is higher, because the thread is more free.
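To make the arithmetic concrete, here's a rough sketch (not from the talk) of approximating blocking time in the field with the Long Tasks API; Lighthouse computes TBT over a specific window of the page load, so treat this as illustrative only:

```js
// A rough sketch of the TBT arithmetic described above: for every
// task longer than 50 ms, the excess over the budget counts as
// blocking time, and the metric is the sum of those excesses.
let totalBlockingTime = 0;

new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // entry.duration is the task length in milliseconds;
    // everything beyond the 50 ms budget is blocking time.
    const blockingTime = entry.duration - 50;
    if (blockingTime > 0) totalBlockingTime += blockingTime;
  }
}).observe({type: 'longtask', buffered: true});
```

You can find Total Blocking Time surfaced inside Lighthouse as one of the top metrics you'll see there, but sometimes you might want to dig deeper.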

08:55 - It might not just be your first-party code that's causing a slow Total Blocking Time; it could also be third parties. So Lighthouse is set up to help you optimize for this with a Lighthouse diagnostic just for third parties. In this audit, you can see all the different third parties you've loaded, listed by domain, the size of the network transfer, and the Total Blocking Time contribution on the right-hand side. And you'll find that sometimes even small scripts that are relatively fast to transfer over the network can have a really large impact on your blocking time due to the work they execute on your main thread. - Well, that's good. So you just mentioned that these metrics correlate.

09:39 - But I see sites with a good FID in the field, but a poor TBT when assessed by lab tools. What would be the reason for that? - That's a great question and a very common one. I suspect many developers have seen reports kind of like this, where the top pulls data from the Chrome User Experience Report from the field. And this shows a green number, meaning you have a relatively good first input delay. And on the bottom, you open up Lighthouse.

10:06 - And we just said that Total Blocking Time is a great tool to assess FID, yet here our Total Blocking Time is marked red. How does that make any sense? So let's dive a little deeper into field data. Field data is a reflection of your actual users. When we assess Core Web Vitals at the 75th percentile, we're checking whether at least 75% of your actual user inputs fall into the fast bucket. The characteristics that make up this population can be very different from site to site. Lighthouse is a very general tool.

10:41 - It doesn't have access to your user base and its particular biases, and it might be emulating a target user that matches closer to a higher percentile, depending on what kind of site you've built. We also know that FID data has a very wide range. Some inputs could be as low as zero because a user just taps when the main thread is free, and sometimes an input could have a very high FID. So if you plot a curve across all your different user inputs, it looks somewhat like this shape. And what this means is that when you assess it at a lower percentile, it could diverge very far from something you might measure at the 95th or 99th percentile.

11:22 - When you move to the 75th percentile, it starts to predict with higher accuracy what that might be. But your tools might actually be assessing at a higher percentile, because they're modeling a different subset of your users. So we know from experiments that Total Blocking Time and first input delay are correlated, but they might impact the curve in different ways. Nevertheless, what that correlation means is that an improvement in Total Blocking Time will likely lead to an improvement in first input delay across the curve, as in this example here. So the key word here is probability. Ultimately, first input delay is reliant on many different field factors.

12:03 - Consider an edge case where a page has a single call to action, and it shows up very early in the page load, right when the main thread happens to be busiest. Most users will end up tapping at that moment, hitting that long main thread task, and they don't wait for the page to fully settle. Now, a developer could come along and improve Total Blocking Time in this scenario by cleaning up long tasks later in the page load. But in this case, the user was still tapping at the very beginning, hitting that same long task and getting the same delay.

12:36 - So this is one example where Total Blocking Time did improve, but the likely place a user taps the page happens to still hit a long task, thus leading to a bad first input delay. So what you need to remember is that the key here is probability. Again, when you improve Total Blocking Time, there's a better chance that you improve your first input delay, but know that there are edge cases out in the field. Speaking of the field, next, Damian will present a real-world case of how developers actually improved this metric. - Thanks, Cheney. In this part of the talk, we'll review a real-world case of interactivity optimization.

13:18 - The case comes from Mercado Libre, the largest e-commerce and payments ecosystem in Latin America. Mercado Libre is a complex website developed by distributed teams with a mix of technologies. For that reason, implementing a performance strategy across the company can be a challenge. Despite this, Mercado Libre's front-end architecture team took on the job of monitoring speed throughout the site and applying performance optimizations when necessary. In this section, we'll focus on a particular optimization for one of the Core Web Vitals, first input delay.

13:54 - To start this journey, let’s show you how to monitor performance. Speed tools are divided into two major groups. Laboratory testing tools are run in a testing environment and are critical during development. Mercado Libre used Chrome DevTools, Lighthouse and WebPageTest while working in the lab. Real user monitoring tools collect data from the field, letting you understand how real users are experiencing your site.

14:23 - The Mercado Libre team combined the Chrome UX Report with other RUM tools to measure performance in the real world. When working on performance, the first step is to have a plan that allows you to identify issues, iterate on them, and analyze the results. As Cheney mentioned earlier, unlike the other Core Web Vitals, first input delay is a field-only metric. When working in the lab, you can use Total Blocking Time as a proxy metric for first input delay. A tool that can be of great help when optimizing TBT locally is Chrome DevTools.

15:01 - The Performance tab lets you easily visualize long tasks, which are those that take more than 50 milliseconds and are flagged with a red triangle at the top. In the lower left corner, the tool reports the total blocking time for that trace. After making changes locally and deploying new versions of the site, you can use tools like Lighthouse and WebPageTest to simulate how the site would load under certain conditions. Finally, you can measure the impact in the real world by querying the Chrome UX Report. There are different ways of obtaining and visualizing this data.

15:38 - The CrUX dashboard, for example, lets you see how the different Core Web Vitals have evolved over time. The CrUX API is another way to dig into the data and integrate it with your own tools and solutions (a query sketch follows below). For example, Mercado Libre used the CrUX API to build a tool that let them easily compare their URLs against competing sites and create a ranking based on that information.
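As an illustration (not shown in the talk), a CrUX API query for a URL's FID distribution might look like this; YOUR_API_KEY and the queried URL are placeholders:

```js
// A hedged sketch of querying the CrUX API for field FID data.
// YOUR_API_KEY and the URL below are placeholders.
const resp = await fetch(
  'https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=YOUR_API_KEY',
  {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({
      url: 'https://example.com/some-product-page',
      metrics: ['first_input_delay'],
    }),
  },
);
const {record} = await resp.json();
// The 75th-percentile FID, in milliseconds, for this URL.
console.log(record.metrics.first_input_delay.percentiles.p75);
```

But one of the CrUX integrations that the Mercado Libre team found most useful was the Search Console Core Web Vitals report. The teams started to receive Search Console warnings alerting them that product detail pages had a poor first input delay.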

16:21 - This helped the team understand which part of the site they should focus their optimization efforts on. After receiving this information, the next step was measuring long tasks in Mercado Libre product detail pages. The team started by running Lighthouse on a sample of product detail pages, and they found that the only metric in red was Max Potential First Input Delay, with a value of 1.7 seconds. This metric represents the duration of the page's longest task.

16:54 - Take into account that, at the moment when Mercado Libre applied these optimizations, they were using Lighthouse 5.2. In the latest version of Lighthouse, version 6, the metric to use in cases like this is the one that Cheney covered at the beginning of this talk, Total Blocking Time. To dig deeper into these metrics, the next step was running simulations on real devices and connection types. Mercado Libre is present in 18 countries; its main markets are Mexico, Brazil, and Argentina. With all these options, they needed to decide which country to work on first.

17:35 - The team picked Mexico as the country to iterate on their solution, as WebPageTest offers a wide variety of devices to test from nearby locations. So to simulate the experience of users in Mexico, they decided to use the following profile. For the location, they picked Dulles, Virginia, for being a relatively close city with a wide variety of real devices in WebPageTest. For the connection type, they picked 4G, and for the device, they chose a Moto G4, a relatively low-end phone, which can easily reproduce performance bottlenecks around interactivity. This is how the main thread looked for product detail pages.

18:19 - As can be seen, there was a long-running task occupying the main thread for two consecutive seconds. This explains the long values for Max Potential First Input Delay. Analyzing the corresponding waterfall, they found that a considerable part of those two seconds came from two files: their tracking module, which is used not only on product detail pages but throughout the whole website, and the main bundle JS file, which was 950 kilobytes and took a long time to parse, compile, and execute. Based on the information obtained, Mercado Libre set the goal of optimizing the two modules that were running expensive code.

19:01 - Product detail pages allow users to perform complex interactions, so the challenge was optimizing these files without interfering with valuable functionality. They started by optimizing the performance of the internal tracking module. The module contained a CPU-heavy task that wasn't critical for it to work and therefore could be safely removed. This led to a 2% reduction in JavaScript for the whole website. After that, they started to work on the main bundle. Mercado Libre used Webpack Bundle Analyzer to detect opportunities for optimization. For example, initially they were requiring the full Lodash module. This was replaced with per-method requires to load only a subset of Lodash instead of the whole library, as sketched below. They also used the lodash-webpack-plugin to shrink the library even further.
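The before-and-after of a per-method require looks roughly like this (the debounce example is illustrative, not taken from Mercado Libre's code):

```js
// Before: pulls the entire Lodash library into the bundle.
// const _ = require('lodash');
// const onResize = _.debounce(updateLayout, 200);

// After: a per-method require brings in only the code for
// debounce (plus its internal dependencies).
const debounce = require('lodash/debounce');

function updateLayout() { /* ... */ } // placeholder handler
const onResize = debounce(updateLayout, 200);
window.addEventListener('resize', onResize);
```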

20:03 - After that, they applied the following Babel optimizations. They used the transform-runtime plugin to reuse Babel's helpers throughout the code and reduce the size of the bundle considerably. Then they applied the search-and-replace plugin to replace tokens at build time, in order to remove a large configuration file inside the main bundle. Finally, they used an additional plugin to save some extra bytes by removing the propTypes (a configuration sketch follows below). As a result of all these optimizations, the bundle size was reduced by approximately 16%. The changes lowered Mercado Libre's consecutive long task from two seconds to one second.
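A hypothetical Babel configuration along these lines might look as follows; the plugin names are the common community packages, not necessarily the exact ones Mercado Libre used, and the token-replacement step is omitted:

```js
// babel.config.js - a sketch, not Mercado Libre's actual config.
module.exports = {
  plugins: [
    // Reuse Babel's injected helpers from a shared runtime
    // instead of duplicating them in every transpiled module.
    '@babel/plugin-transform-runtime',
    // Strip React propTypes definitions from production builds
    // to save some extra bytes.
    'babel-plugin-transform-react-remove-prop-types',
  ],
};
```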

20:44 - Running Lighthouse again showed a 57% reduction in Max Potential First Input Delay. But one second of consecutive JavaScript was still too long, so the team set the goal of optimizing this metric even more. Digging into the main thread, they identified which libraries were producing long tasks. Since product detail pages were built with React, they found inspiration in the guides and codelabs at web.dev/react. Here are some of the optimizations they made.

21:17 - First, continue reducing the main bundle size to optimize compile and parse time, for example by removing duplicate dependencies across the different modules. Second, apply code splitting at the component level to divide the JavaScript into smaller chunks and allow for smarter loading of the different components (see the sketch below). Finally, defer component hydration to allow for smarter use of the main thread, a technique commonly referred to as partial hydration.
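Component-level code splitting in React might look roughly like this (the Reviews component is a hypothetical example, not from Mercado Libre's codebase):

```jsx
// A minimal sketch of component-level code splitting with
// React.lazy: the Reviews component is compiled into its own
// chunk and only downloaded when it's about to render.
import React, {lazy, Suspense} from 'react';

const Reviews = lazy(() => import('./Reviews')); // hypothetical module

function ProductDetailPage() {
  return (
    <Suspense fallback={<div>Loading reviews…</div>}>
      <Reviews />
    </Suspense>
  );
}

export default ProductDetailPage;
```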

21:55 - The new trace showed even smaller chunks of JavaScript execution. This gave the browser more time to process user inputs, leading to a more responsive user interface. Running Lighthouse once again, they found that Max Potential FID was reduced by an additional 60%. But the true goal of these optimizations was to improve the experience for real users. Mercado Libre collected their own real user data to measure Core Web Vitals. This is a report obtained from New Relic showing how FID improved on product detail pages.

22:28 - The control group, in yellow, shows first input delay without any optimizations. The experiment group, in purple, shows a much lower first input delay after the changes were made. Every 28 days, the CrUX report presents new data from real users. Here we can see Mercado Libre's first input delay progress between January and April 2020. Before the optimization project, 82% of users were perceiving FID as fast. At the end of the journey, this number went up to more than 91%, which means 9% more users perceiving this metric as fast. At the beginning of this section, we said that Mercado Libre was receiving warnings from Search Console about the performance of product detail pages. After fixing these issues, they stopped receiving those warnings. Let's do a quick recap of Mercado Libre's case.

23:29 - At the beginning of the year, the team set the goal of optimizing interactivity on product detail pages. They combined laboratory and real-user monitoring tools and used an incremental approach to apply optimizations. As a result, they achieved a 90% reduction in Max Potential First Input Delay in Lighthouse and a 9% increase in users perceiving first input delay as fast in CrUX. But performance work is never finished. Mercado Libre believes that the speed of their site is a crucial aspect of the user experience, so they are constantly monitoring and applying optimizations across all the Core Web Vitals. I hope you have enjoyed this talk. Thanks for watching. (upbeat music)