Introducing the Privacy Budget

Dec 15, 2020 20:00 · 1798 words · 9 minute read

(upbeat music) - Hello, and welcome to this session. My name is Maud, and together we'll take a look at the privacy budget. First, take a moment and think about what you see when you browse the web. You see that each tab has its own isolated world, but as a developer, you know that things are much more complicated than that. An open environment like the web comes with risks, and browsers do a lot to create, or help create, security boundaries.

00:29 - For example, with the cross-origin isolation features that Camille Lamy demoed in her talk; you can check the video link. This is a win for your users' security. Creating boundaries on the web also protects users' privacy, because one problem today is that people's browsing activity can be tracked and linked across the web, sometimes in ways that users can't easily see or control. In other words, covertly. Users typically don't know about covert tracking because it's hard to see it happening. And even if they did, there would be no way to stop it, unlike third-party cookies, which you can see and block.

01:04 - So things need to change. How? Well, to perform web-wide covert tracking, there are a few mechanisms that can be used, or rather abused. One of them is IP addresses. There's a proposal to mitigate this problem called Willful IP Blindness, but it's not enough, because another mechanism that can be used is browser fingerprinting. We'll look at how it works in a bit, but first, let me tell you about one proposal to prevent it: the privacy budget. Both IP blindness and the privacy budget are part of the Privacy Sandbox, a set of proposals to move towards a web that's private by default. You can check the list on chromium.org,

01:43 - and all of the Privacy Sandbox proposals are discussed in the open on GitHub. Now, we believe that the privacy budget is how we can prevent browser fingerprinting while keeping the web powerful, but it's early days. We're in the research phase, and in this talk I'll be sharing how we're trying to answer some hard questions about how the privacy budget could work. By the time you watch this talk, we'll be analyzing results, or even have first insights to share. But it's too early for you to take specific actions on your site to prepare for the privacy budget, because we don't know yet exactly how it will work.

02:19 - All of this will come later, and gradually, but it's not too early to share your thoughts with us if you'd like to. We want this to be a conversation, and we're open to your feedback; I'll tell you how later in this talk. Now let's move on and take a look at how browser fingerprinting works. Imagine you're trying to find a friend's friend you've never met before. You're told they're wearing a red t-shirt, but maybe 10 people in the crowd are wearing a red t-shirt. If you also know that your friend's friend is wearing sunglasses and a blue cap, then you can identify them. Now, imagine you want to recognize someone anytime, anywhere: the description of their clothes isn't helping anymore.

03:01 - How about their handwriting, the way they draw, and the main languages they speak? These will stay the same for a while and should be unique enough when combined. Browser fingerprinting works in the same way. The fonts you've installed locally, the way your browser renders Canvas elements, your browser's user agent string, and more are bits of information that remain fairly stable over time for one user but vary a lot across different users, and they're easy for sites to access. You can actually quantify how much identifying information a piece of information exposes, in bits, with a measure called entropy. If an API is high entropy, meaning highly identifying, it can be used for browser fingerprinting, so it's called a fingerprinting surface.
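To make "bits of entropy" concrete, here is a minimal Python sketch, assuming we already have a sample of the values one surface returns across users. The helper names (`entropy_bits`, `surprisal_bits`) and the sample canvas hashes are hypothetical, chosen only for illustration. Entropy, computed with the Shannon formula H = -Σ p·log2(p), measures how identifying a surface is on average; the surprisal of one specific value measures how much seeing that value narrows down one particular user.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def surprisal_bits(value, values):
    """Bits of information revealed by observing one specific `value`."""
    p = values.count(value) / len(values)
    return -math.log2(p)


# Hypothetical canvas-rendering hashes observed for 8 users.
canvas_hashes = ["a1", "a1", "a1", "a1", "b2", "b2", "c3", "d4"]

print(entropy_bits(canvas_hashes))          # 1.75 bits on average
print(surprisal_bits("c3", canvas_hashes))  # 3.0 bits: "c3" singles out 1 of 8 users
```

A surface where every user returns the same value scores 0 bits; one where all 8 sampled users differ scores log2(8) = 3 bits, the most this sample can show.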

03:43 - When you combine several high-entropy surfaces, they may uniquely identify a user. A few interesting facts about entropy: you can calculate it with a formula that's based on probabilities, and about 32 bits of entropy are needed to uniquely identify a single web user. But, and this is the tricky part, you generally can't just sum the entropy of different pieces of information from APIs to understand whether a set of APIs would be identifying, for example, whether it would expose over 32 bits of entropy. That's because entropy is about probability, and APIs can correlate. For example, if a user speaks Greek, the probability that they have a specific font installed is much higher, so if you already know that they speak Greek, seeing that font in the local font list doesn't give you much more information, or entropy. Browser fingerprinting isn't new. There are even libraries out there; you can check one out in this demo. Fingerprinting can be used for legitimate purposes, like fraud detection, but also for user tracking.
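The sketch below illustrates both facts with a toy sample. It assumes roughly 4 billion web users (an assumption, not a figure from the talk) to show where a number like 32 bits comes from, and uses 8 hypothetical users to show why entropies of correlated surfaces can't simply be added: the joint entropy of two surfaces is at most the sum of their individual entropies, and the gap is their mutual information.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy H = -sum(p * log2(p)), in bits, of the observed values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumption: roughly 4 billion web users. Giving each one a distinct value
# takes ceil(log2(N)) bits, which is where "about 32 bits" comes from.
print(math.ceil(math.log2(4e9)))  # 32

# Hypothetical sample: 8 users' language, and whether a Greek font is installed.
languages = ["el", "el", "el", "el", "en", "en", "en", "en"]
greek_font = [True, True, True, False, False, False, False, True]

naive_sum = entropy_bits(languages) + entropy_bits(greek_font)
joint = entropy_bits(list(zip(languages, greek_font)))  # entropy of the pair

print(naive_sum)                    # 2.0: overstates what the two reveal together
print(round(joint, 2))              # 1.81: what they actually reveal together
print(round(naive_sum - joint, 2))  # 0.19: mutual information, double-counted by summing
```

Knowing a user speaks Greek makes the Greek font likely, so the font adds less than its standalone 1 bit; with perfectly correlated surfaces the second one would add nothing at all.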

04:44 - And not only is fingerprinting-based tracking covert and easy, but its usage may increase, because it's an alternative to third-party cookies, which are being restricted in Chrome and other browsers. So what do we do about this? Well, web APIs like Canvas, local fonts, and others are great capabilities, but they can hurt user privacy. So keeping the web as it is isn't an option. Now, we could remove support for highly identifying APIs, or not implement support for new APIs, or we could add noise to all API outputs, but this risks hurting the ability to build amazing web experiences, including for sites that have no intent of identifying users, or sites that are only using one or two APIs. What if there was a middle ground? A way to get both capabilities and privacy.

05:32 - What if sites could continue using powerful APIs normally, but if a site used too many highly identifying APIs, the browser could impose limitations to prevent the site from moving beyond the red line, namely, to prevent the site from entering the red zone where it could uniquely identify users? Well, that's the idea of the privacy budget. As a developer, you would decide how to spend your site's budget, a bit like performance budgeting, but the browser would define the upper limit and enforce it to protect user privacy. In parallel with the privacy budget, Chrome is working on other measures to help move sites further away from the identifiability line: by reducing entropy where possible, like for some sensor APIs; by refactoring existing APIs to make them more focused, more purpose-built, and less identifying, like User-Agent Client Hints instead of the User-Agent string; and by transforming passive fingerprinting surfaces (information all sites can access without running any client-side code, like HTTP headers) into active surfaces

06:37 - (information sites can access only by requesting it or running code on the client side, like Canvas). This makes it easier for the browser to measure and control the budget. Back to the privacy budget: where are we, and what's the status? Well, before the privacy budget can be enforced, some key questions need to be answered. Question one: where is the line? Very likely quite high initially, so we can monitor impact and limit breakage, and then it will gradually move down towards 32 bits, the entropy needed to uniquely identify a web user.
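The talk doesn't say how enforcement would work; that is still being researched. Purely as a hypothetical sketch, a per-site budget could charge a site for the joint entropy of all the surfaces it has read so far, so correlated surfaces aren't double-counted, with a limit (the line) that starts high and could later be lowered. The `PrivacyBudget` class, its `request` method, and refusing a surface once the limit would be crossed are all invented for illustration, and the toy population stands in for the real-world statistics the study is gathering.

```python
import math
from collections import Counter


def joint_entropy_bits(population, surfaces):
    """Entropy, in bits, of the combined values of `surfaces` across a sample."""
    keys = sorted(surfaces)
    combos = Counter(tuple(user[k] for k in keys) for user in population)
    total = len(population)
    return -sum((c / total) * math.log2(c / total) for c in combos.values())


class PrivacyBudget:
    """Hypothetical per-site budget that charges the JOINT entropy of surfaces read."""

    def __init__(self, population, limit_bits):
        self.population = population  # stand-in for population-wide statistics
        self.limit_bits = limit_bits  # "the line"; could start high and move down
        self.accessed = set()

    def spent_bits(self):
        return joint_entropy_bits(self.population, self.accessed)

    def request(self, surface):
        """Allow `surface` only if the site stays at or below the limit."""
        candidate = self.accessed | {surface}
        if joint_entropy_bits(self.population, candidate) > self.limit_bits:
            return False  # here the browser would impose some limitation
        self.accessed = candidate
        return True


# Toy sample of 8 users; "canvas" is unique per user (3 bits on its own).
languages = ["el", "el", "el", "el", "en", "en", "en", "en"]
greek_font = [True, True, True, False, False, False, False, True]
population = [
    {"language": lang, "greek_font": font, "canvas": f"c{i}"}
    for i, (lang, font) in enumerate(zip(languages, greek_font))
]

budget = PrivacyBudget(population, limit_bits=1.9)
print(budget.request("language"))     # True: 1.0 bit
print(budget.request("greek_font"))   # True: joint 1.81 bits, although 1.0 + 1.0 > 1.9
print(budget.request("canvas"))       # False: would reach 3.0 bits
print(round(budget.spent_bits(), 2))  # 1.81
```

Naively summing per-surface entropies would have refused `greek_font` (1.0 + 1.0 = 2.0 > 1.9); charging joint entropy lets the correlated surface through. Whether a real design would refuse, block, or degrade an API past the line is not settled in the talk.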

07:11 - Question two: which sets of APIs move your site closer to the line, and by how much? And question three: today, how many sites are above or below the line? We are hoping that most sites are already below, so that privacy budget enforcement only affects a small number of sites. But to answer these questions, we need data, a lot of data, from the web, which is why Chrome is running a large-scale identifiability study. What's really exciting is that this study is being run in real-life conditions. Right now, there are some great sites that calculate how unique your browser is, but only compared to other visitors of that site, and the privacy budget needs to work for any user and any site at scale. So the Chrome study is run for real Chrome users visiting real sites, in a privacy-preserving way, while looking at a small subset of surfaces that are randomly selected.

08:03 - We're excluding highly identifying surfaces, and the data will be deleted after a short period of time. Across these users, the team is looking at how much identity every single site is accessing, across all 300-plus identifying APIs. So how is the team doing this? Well, first, they measure how much identity each API exposes; for example, they look at how locally installed font files differ across users. Second, they measure how APIs correlate. Remember how we said entropy can't simply be summed? Well, this is another special thing the study does.

08:37 - The team is looking at how APIs actually influence each other in practice. From here, they derive how much identity subsets of APIs expose, and in parallel, they're measuring which subsets of APIs sites are using. Finally, they combine these insights to find out how much identity is linked to each site. So what's next, and what does this change for you? Well, nothing yet. We're in the exploration phase, and Chrome's goal is to find a path towards a web that's private by default.
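The study steps described above can be sketched on a toy dataset: per-surface entropy, pairwise correlation measured as mutual information, and the joint entropy of the subset of surfaces each site uses, compared against a line. This is only the arithmetic, under assumptions: the real study's privacy-preserving collection, random surface sampling, and estimators aren't described in the talk and aren't modeled here. The `study` function, the site names, the surfaces, and the 1.9-bit line are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def joint_entropy_bits(population, surfaces):
    """Entropy, in bits, of the combined values of `surfaces` across a sample."""
    keys = sorted(surfaces)
    combos = Counter(tuple(user[k] for k in keys) for user in population)
    total = len(population)
    return -sum((c / total) * math.log2(c / total) for c in combos.values())


def study(population, site_surfaces, line_bits):
    surfaces = sorted(population[0])
    # Step 1: identity each surface exposes on its own.
    per_surface = {s: joint_entropy_bits(population, [s]) for s in surfaces}
    # Step 2: correlation as mutual information I(A;B) = H(A) + H(B) - H(A,B).
    correlation = {
        (a, b): per_surface[a] + per_surface[b] - joint_entropy_bits(population, [a, b])
        for a, b in combinations(surfaces, 2)
    }
    # Remaining steps: identity exposed by the subset each site uses, vs. the line.
    per_site = {}
    for site, used in site_surfaces.items():
        bits = joint_entropy_bits(population, used)
        per_site[site] = (bits, bits > line_bits)
    return per_surface, correlation, per_site


# Toy sample of 8 users, as in the earlier sketches.
languages = ["el", "el", "el", "el", "en", "en", "en", "en"]
greek_font = [True, True, True, False, False, False, False, True]
population = [
    {"language": lang, "greek_font": font, "canvas": f"c{i}"}
    for i, (lang, font) in enumerate(zip(languages, greek_font))
]
site_surfaces = {
    "news.example": ["language"],
    "shop.example": ["language", "greek_font"],
    "tracker.example": ["language", "canvas"],
}

per_surface, correlation, per_site = study(population, site_surfaces, line_bits=1.9)
for site, (bits, above) in per_site.items():
    print(f"{site}: {bits:.2f} bits, {'above' if above else 'below'} the line")
```

On this toy data, news.example (1.00 bits) and shop.example (1.81 bits) stay below the line and tracker.example (3.00 bits) is above it, which is the kind of per-site answer question three asks for.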

09:06 - Some sites will need to change, but we know we can't just impose limitations on the APIs you're using overnight, because we also want to keep the web powerful. That's why the team is running this in-depth, large-scale study: to strike the right balance between usefulness and privacy, and find the line. We're pretty excited about what this will tell us. Privacy Sandbox changes are rolling out gradually, and developer tooling will be made available for the privacy budget to help you out. We know you'll have lots of questions about this, so if you're interested in hearing more about the results of the study, stay tuned by subscribing to blink-dev or following ChromiumDev on Twitter, and use these channels to share your feedback. Also, take a look at some of the new, less identifying APIs that are already available, for example, User-Agent Client Hints.

09:53 - And, by the way, we have other new APIs that support third-party use cases without cross-site tracking, including an API to measure ad conversions. On this, check out Charlie Harrison's video about the Conversion Measurement API. And that's it. Thanks for watching, and I'll see you around. (upbeat music)