Superpowers for next gen web apps: Machine learning

Dec 15, 2020 17:00 · 6010 words · 29 minute read

(bright music) - Hello everyone. I’m Jason Mayes, Developer Advocate for TensorFlow.js here at Google. And today, we’re going to be talking about how you can give your future web applications super powers by using Machine Learning powered by TensorFlow.js. But you might be wondering, as a web developer, why should I care about Machine Learning? And first up, I just want to start by saying that Machine Learning could influence every industry out there. As web devs, we’re in a unique position where the apps that we create could be used for any one of those industries. And so there’s a good chance in the not-too-distant future clients will ask us to use Machine Learning too, as there are some unique benefits that we can get by doing this client side in the browser, as we’ll see later.

00:44 - Now, as a web engineer myself, it was not until a few years ago that I started using Machine Learning models in my web prototypes. So I’d like to share with you today just a few examples of how we managed to level up our apps to do some pretty amazing things in JavaScript. But wait, what is all this about exactly? Let’s back up just a little bit and address one of the elephants in the room. First off, what exactly are all these buzzwords anyhow? What’s the difference between Artificial Intelligence, Machine Learning, or Deep Learning? Let’s take a quick 101 on what’s going on here behind the scenes to help demystify these concepts. So first up Artificial Intelligence, or AI for short, is essentially defined as human intelligence exhibited by machines. But this is a very broad term.

01:31 - In fact, we’re actually at a point in time where we typically work with Narrow AI. Now, all that means is that these systems can do one or a couple of things as well as or better than a human expert in that area, like recognizing objects. A great example of that is in the medical industry, where doctors use AI systems to help them identify issues in the grainy images that come back from scans of a human body. Now this allows them to spot things that they may have otherwise missed, leading to increased accuracy and reduced processing time, which of course is a great result for doctors and patients alike. Now there are a lot of systems right now that work hand-in-hand with their human counterparts to create a workflow that’s more efficient than ever before. So next up we have Machine Learning, or ML for short. This is at the implementation level.

02:19 - It’s the actual program we can create that can learn from the training data presented to it, to find patterns in that data. It can then use this knowledge to classify previously unseen examples of the same class in the future. Now Machine Learning is an approach to achieving the AI we just spoke about on the previous slides. And the key thing here is that these systems, once programmed, can be reused. So if I create an ML system that recognizes cats, I can use the same code without modification to then recognize dogs, just by feeding it different training images for it to learn from, and this is very powerful and a big difference from how we used to program in the past. Take spam email as an example.

02:58 - With traditional programming, we may have had a bunch of conditionals or look-ups to check if a word was associated with spam. If it was, we’d block the email. However, the spammer can get savvy to this, modify the word just slightly, and then our system is broken, and a war of keeping up to date between programmer and spammer starts to develop, which is not a good use of our time. Now, fast forward to today, and we can use Machine Learning to solve this problem. Instead, thousands of users mark emails as spam, and the Machine Learning will automatically figure out which words and features are most likely to have contributed. We can retrain the model every day with fresh content, and now no human needs to be involved, freeing up time to do other things.

03:39 - And here are just a few more common use cases for Machine Learning. Things like Object Detection, recognizing an object from an image, or what about regression, which basically means predicting a numerical value from some input value? For example, what’s the price of a house whose square footage is 1,000 square feet? With enough data, you can predict this with Machine Learning. Or how about natural language processing to understand human language? With this, we can mark if a sentence in a blog post comment is toxic, or if it’s a positive or negative statement. One could use this to assist in blocking trolls on a website before a comment even gets posted. Or how about audio for speech recognition? I’m sure many of you have smartphones or have tried the Web Speech API, and this is all powered by Machine Learning too. And then finally we have generative or creative examples, one of which we can see on this slide right now, which is created by NVIDIA’s recent research.

04:32 - None of the faces in this animation are real. They’ve all been dreamt up by the Machine Learning model, just like if I asked you to imagine a purple cat, you could probably do so even though you’ve never seen one before. Now, the Machine Learning here has learnt the essence of what a human face is composed of, and it has been asked to generate new ones. Now, what about Deep Learning? Deep Learning is essentially one technique you can use to implement the Machine Learning program we just spoke about on the previous slides. You can think of this as one of the many possible algorithms you can pick from to make the program learn from the data that you present to it.

05:05 - There are, of course, many other techniques too, but essentially Deep Learning is where the code structures are arranged in many layers which loosely mimic how we believe the human brain to work, learning patterns of patterns the further down the layers you go. And what do I mean by that? Well, imagine at the early stages you can recognize something simple like lines. You go one level deeper, and those lines might combine to allow you to recognize shapes, and one level deeper still, and those shapes might combine to allow you to recognize objects. For example, a face might be represented by several shape features in certain positions relative to each other. And generally, the deeper the network, the more advanced the patterns we can recognize, but this comes at the cost of processing power.
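To make that a little more concrete, here is a rough, illustrative sketch only (it is not from the talk, and the layer sizes are arbitrary) of what such a stack of layers looks like when written with the TensorFlow.js Layers API we will meet shortly:

```js
// A minimal sketch of a "deep" model: layers stacked so each one learns
// patterns in the output of the layer before it.
import * as tf from '@tensorflow/tfjs';

const model = tf.sequential();
// Early layers tend to pick up simple patterns such as edges and lines.
model.add(tf.layers.conv2d({
  inputShape: [64, 64, 3], filters: 16, kernelSize: 3, activation: 'relu'
}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
// Deeper layers combine those into more complex patterns such as shapes.
model.add(tf.layers.conv2d({filters: 32, kernelSize: 3, activation: 'relu'}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
// The final layers combine shapes into a decision, e.g. "face" vs "not face".
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 1, activation: 'sigmoid'}));

model.summary(); // prints the layer-by-layer structure to the console
```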

05:49 - So in summary here, we can see how these three terms are actually linked. Deep Learning is the algorithm you can use to drive the Machine Learning program, and this Machine Learning program gives us the illusion of Artificial Intelligence, if you will. And these concepts go back to the 1950s. They’re not new, but it’s only now that we’ve got the resources at a cheap enough cost, such as the RAM, the CPU, and the GPU, to make these ideas feasible. We’re now living in a truly exciting time and we’re at the start of a new wave for how we create smarter systems in the future. So the next question you might be wondering is how on earth do we train such systems? That’s a great question.

06:26 - Now I know we all work in many different industries, so feel free to adapt the following to your own area, but let’s do a thought experiment and pretend we’re trying to make a web-connected system for farmers who are trying to classify apples and oranges to speed up the delivery of picked fruits that are currently mixed together and need to be sent to the right destinations. Now, the first thing we need to identify are the features or attributes of the fruits that we could measure. Let’s take color and weight as an example. Both are easy to measure: we can use digital weighing scales and RGB values from a webcam to do this. So going back to our high school maths, if we were to sample some apples and oranges and plot these values on a scatter chart as shown, we can see that the red and green apples fall in the red and green spectrums of the x-axis of the graph, and tend to cluster together with similar weight variance in the y-axis.

07:13 - Now, your oranges, as they’re super juicy, tend to be heavier and sit higher up in the chart. Now, if we can draw a line that separates the apples from the oranges, we can, with some degree of certainty, decide what fruit something is simply by plotting its feature values on the chart. If it’s above the line, it’s most likely an orange, and if it’s below, it’s most likely an apple. We’ve essentially learned how to classify the fruits. So if we can get a piece of software to define the equation of this line by itself, we can then get a computer to learn how to classify fruits too.
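As a hypothetical sketch (this is not part of the talk, and every number in it is made up), asking TensorFlow.js, the library we introduce in a moment, to find that separating line itself might look roughly like this:

```js
// Two made-up features per fruit, both scaled to the 0-1 range: [redness, weight].
import * as tf from '@tensorflow/tfjs';

const xs = tf.tensor2d([
  [0.9, 0.45], [0.8, 0.40], [0.7, 0.50],  // apples
  [0.5, 0.75], [0.4, 0.80], [0.6, 0.78],  // oranges
]);
const ys = tf.tensor2d([[0], [0], [0], [1], [1], [1]]); // 0 = apple, 1 = orange

// A single sigmoid unit is exactly a learned separating line in 2D feature space.
const model = tf.sequential();
model.add(tf.layers.dense({inputShape: [2], units: 1, activation: 'sigmoid'}));
model.compile({optimizer: tf.train.adam(0.1), loss: 'binaryCrossentropy'});

async function run() {
  await model.fit(xs, ys, {epochs: 200});
  // Classify a previously unseen fruit: heavy and not very red.
  model.predict(tf.tensor2d([[0.45, 0.77]])).print(); // close to 1, so most likely an orange
}
run();
```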

07:46 - And this is the essence of what is going on behind the scenes for Machine Learning. Not so magic, right? Essentially, we’re just trying to figure out the best possible way to separate out the example data such that, for any new unseen example, we have a chance of classifying it correctly. But what if you had chosen bad features? Let’s take ripeness and number of seeds. Here, the plot is less useful to us. There’s no straight or even curved line that would allow us to separate the data points. We can’t really learn from this data alone, and you might be thinking, well, Jason, why would you choose such obviously bad features and attributes? And that’s a great question.

08:23 - Sure, with this trivial example, it’s clear this would be unwise, but what about those medical scans we spoke about at the beginning of the presentation that are just RGB image pixels? How do you define the features for that? It’s not always so obvious. And what if we had more than two features? Previously we had just two features, so we used a two-dimensional chart to separate the data. If we had three features, we’d need a 3D chart to do so, as shown here. Now here we add weight to our previously unsuitable features, and we can now use a plane, or a rectangle in 3D space if you will, to separate the oranges from the apples. Hopefully in this image you can see the oranges are now further back along the y-axis, making them more separable from the apples, but it turns out that three dimensions are typically not enough for most ML problems.

09:07 - It’s not unusual to have tens, hundreds, thousands, or even millions of features, as in the case of images, where each pixel is a feature. And as humans, we struggle to visualize anything higher than 3D. I tried and failed miserably, so you’ll have to trust me on that. However, for a computer, the mathematics works out much the same, and it’s perfectly capable of doing such calculations. Instead of using a plane, we use something called a hyperplane, which simply has one dimension fewer than the number of dimensions we have, allowing us to split the data just like we did here, but with more features and attributes, which can sometimes give us better data separation. Okay, so back to TensorFlow.js.

09:44 - Now that we understand what’s going on behind the scenes, you’ll be pleased to know that TensorFlow.js does a lot of the hard stuff for us. TensorFlow.js is a Machine Learning library written for JavaScript. Doing Machine Learning in the browser has several advantages, such as lower latency, as no server is involved; user privacy, as the data stays on device; and super easy deployment, because anyone with a web browser can use it. And that means you can use Machine Learning anywhere JavaScript can run.

10:10 - And that includes the web browser, server-side, desktop, mobile, and even IoT devices. And if we dive into each one of these stacks in more detail you can see many of the technologies we know and love. In fact, JavaScript is one of the only languages that can run across all of these devices without any extra plugins, giving you the ability to deploy and run anywhere with just one code base. And this is a great win for JS devs, as you can make scaled web applications powered by Machine Learning in all of these environments and even control hardware from the browser or standalone if you wish. Now with TensorFlow.js you can run, retrain via transfer learning, or write your own models completely from a blank canvas.

10:52 - And with this, you can use it for anything you might dream up, things like sound recognition, gesture-based interaction, sentiment analysis, conversational AI, and much, much more. Now there are a few ways you can use TensorFlow.js based on your familiarity with Machine Learning, JavaScript, or both. The first way is to use our pre-trained models. These are really easy-to-use JavaScript classes that cover many common use cases, and in many situations we do not need to train a brand new model from scratch and can instead leverage existing work. Let’s take a look at some of those now.

11:25 - So here you can see several popular pre-made models available with TensorFlow.js, things like object detection or body segmentation, which is the act of classifying each pixel in an image to determine if it belongs to a human body or not. How about pose estimation to understand where the joints and skeleton might be? We even have many natural language processing models to understand the human language too. In fact, our new question and answer model allows you to ask a question on any piece of text and it can automatically tell you which part of that text actually answers the question. Imagine using that on a really long webpage to automatically scroll to the information that’s relevant to what you want to know. You can actually do that right now.

12:06 - And we have many more models you can check out via the link shown on the slide. So let’s see some of these in action. First up is object detection. This model is using something known as COCO-SSD behind the scenes and is trained on 90 common objects. It can recognize those objects in images and provide us with the location of each object with a bounding box, as you can see in this image of the dogs on the right. Notice how it can detect multiple objects at the same time. This is different from image recognition, which understands that something might be in the image, but not where or how many, and that’s why COCO-SSD is super useful.
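For reference, here is roughly what using the pre-made COCO-SSD class looks like. This is a sketch rather than the talk’s own code, and the CDN paths, image file, and element id are just illustrative:

```html
<!-- Load TensorFlow.js and the pre-made COCO-SSD model from a CDN. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>

<img id="dogs" src="dogs.jpg" alt="Two dogs near a bowl of treats" />

<script>
  // Load the model once, then find objects in the image.
  cocoSsd.load().then(async (model) => {
    const predictions = await model.detect(document.getElementById('dogs'));
    // Each prediction has a class name, a confidence score, and a bounding box.
    predictions.forEach((p) => console.log(p.class, p.score, p.bbox));
  });
</script>
```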

12:42 - So let’s see it in action with a live demo in the browser. So here you can see us running COCO-SSD live on a webpage, and if I click on any one of these images here, you can see that it highlights the objects found within them, even if they’re different classes. You can see here that this dog is very close to a bowl of treats, which might be useful to know if you want to send yourself an alert. Now we can go even better than this by enabling our webcam, and if we do that you can see me live speaking to you right now, and it’s classifying me in real time at a high number of frames per second. And what’s really cool here is that this is all running locally in my web browser.

13:18 - None of the webcam imagery is being sent to a remote server for classification, so your privacy is preserved as well. Next up we’ve got Face Mesh. This model is just three megabytes in size and has the ability to recognize 468 landmarks on the human face. Now, not only does this work super robustly, we’re starting to see real-world use cases of people using this in production too, as we’ll see in just a bit. So let’s see Face Mesh in action, as it’s a pretty cool model. So here is Face Mesh running live in my web browser.

13:47 - Notice how my face is being tracked whilst I’m talking to you right now, and on the left, we can show you the mesh of my face in real time, but because this is JavaScript, not only are we doing the Machine Learning, we’re also rendering a 3D point cloud on the right-hand side in WebGL that’s fully interactive too. So let me show you that. You can see me moving the 3D points right now live in the browser, and it’s super smooth too. Now you’ll notice at the top left I’m getting about 22 frames per second, but that’s because I’m live streaming right now, and of course, if I wasn’t live streaming, we’d get closer to 30 or 35 frames per second at least on my current system. Now, JavaScript has a very rich ecosystem for 3D graphics and other charting, which is far more mature than in other environments, which makes it super fun to prototype new ideas in the browser with Machine Learning models. And even better, we can choose what backend to execute on, such as the CPU or GPU, if we want to do so.
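Programmatically, switching backends is just a couple of calls; here is a quick sketch (it assumes an async context, and the WASM backend needs its own package to be loaded first):

```js
import * as tf from '@tensorflow/tfjs';

// Ask TensorFlow.js to use a specific backend, then wait until it's ready.
await tf.setBackend('webgl');   // or 'cpu', or 'wasm' if the WASM backend is loaded
await tf.ready();
console.log('Running on backend:', tf.getBackend());
```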

14:41 - And you can do this in this demo by clicking on the dropdown at the top right here, and that will give you a higher number of frames per second depending on the type of device that you’re running on. And here we see a demo by ModiFace, part of the L’Oreal group, for AR Makeup Try On. It should be noted that the lady is not wearing any lipstick. Here, our Face Mesh model is combined with WebGL shaders to augment the chosen color onto the person’s lips in real time in the browser. Next up, we’ve got body segmentation. This model can distinguish 24 body areas across multiple bodies, all in real time.

15:13 - Now, this is hard to demo live, as I need more space, but notice from the image how the bodies of each person are correctly segmented with different colors representing different body parts. Even better, we get the pose estimation too, those lines in blue, to estimate where the skeleton is so we can do things like gesture recognition and much, much more. And with a little bit of imagination, we can actually emulate some of the superpowers we were promised from the sci-fi movies. First up, invisibility. This is a more advanced demo than simply replacing the background with a static image. For that, you wouldn’t even need Machine Learning, but notice here how, when I go on the bed, the bed still deforms in the image on the right as I move around or how the laptop screen still plays.

15:53 - This prototype uses BodyPix, which we just saw, to calculate where the body is not, so that it can gradually learn the whole background and keep updating the parts where it’s safe to do so. Even better, this was made in just one day and runs entirely in the web browser. No background in Machine Learning is required to run this code. Simply click a link and it just works, and no images are sent to a server for classification, leading to real-time results. Next up, lasers. Another member of the community from the USA combined his love for WebGL shaders with TensorFlow.js to enable him to shoot lasers

16:25 - from his eyes and mouth just like Iron Man, and this uses the Face Mesh model we previously saw to run in real time in the browser without any issues. Now, whilst this is a fun demo, you can imagine using this for a movie launch to amplify your reach by building a one-click creative experience for fans to drive excitement. Or how about teleportation? By combining TensorFlow.js with other emerging web tech, we can now create a digital teleportation of ourselves anywhere in the world in real time. Here, I segment myself from the bedroom using BodyPix.

16:57 - I transmit my segmentation anywhere in the world with WebRTC and then recreate myself in the real-world environment with WebXR and Three.js. Remember, all of this is running in the web browser. No app install is even required, leading to a frictionless experience for the end user. And having tried this myself, it really feels more personal than a regular video call, as you can actually walk up to the person and hear the audio from the correct direction, as if they really are there. In fact, maybe next time when I’m presenting to you at a future event, I might be able to do so in your own room just like this, as if I was standing right in front of you.

17:31 - And of course, there are many other delightful creations we can make beyond superpowers. How about this clothing size estimator? Here, I created a tool that can estimate your clothing size in under 15 seconds in the web browser to automatically select the correct size of clothing on a website. Now, I don’t know about you, but I can never remember my sizes for clothing. And with this tool, I simply enter my height, stand facing the camera and then to the side, and it can automatically choose for me the correct size at checkout. And of course this means fewer returns and less time wasted.

18:03 - This was created in just two days and can potentially be used by anyone with a single click at the point of checkout on any website. And finally we have one more example from the community. Here, someone’s managed to bring an image of a model from a magazine to life using WebXR and WebGL. Note that even with these fancy particle effects and Machine Learning running in the background, this is running on a two-year-old Android device and the performance is still great. Now, the second way to use TensorFlow.js is via transfer learning. This basically means re-training existing models to work with your own custom data.

18:37 - Now, if you’re familiar with Machine Learning, you can, of course, do this programmatically in code, but today I want to show you two easy ways to get started. First up is Teachable Machine. This is super easy to use and runs entirely in the web browser, both for training and for inference, which is the act of using the model to classify something. The best way to explain this is with a demo, so let’s try it out. So if we head over to TeachableMachine.withgoogle.com, we’re presented with a screen like this.

19:04 - We can see three types of projects we can use: image, audio, or pose. We’re going to use image today because we want to do image recognition. So click on that and you then see a screen like this. On the left-hand side are the classes that you want to recognize. You can add more than two if you wish by clicking add a class, but today we’re just going to recognize my face or a deck of playing cards. So let’s go ahead and give them more meaningful names. For class one, I’m going to call this Jason, and for class two, I’m going to call this cards. Now, all we need to do is click on the webcam button, allow access to our cam, and there you can see a live preview. We can use this to sample each object. The first one is my face. I’m just going to move my face around and take some samples by clicking this button below. And notice how I moved my head around to get some variety, so it learns what my face looks like from the sides and different angles.

19:55 - I’ve got about 36 images here, and I’m going to try and do the same with a deck of cards by clicking the webcam button below and try and get the same number of images. Otherwise, I’ll have a bias in my system. So let’s bring the cards nice and close and try and get 36 images of that as well. 35, that’s good enough. So now I click on Train Model, and what’s going to happen is, behind the scenes, we’re going to retrain the top layers of the model to distinguish the difference between my face or a deck of playing cards. And you can see in the time it took me to say that it’s already finished training and we’ve got a live preview on the right-hand side. Currently it predicts Jason with 100%, which is correct. My face is indeed in view.

20:31 - If I bring the deck of cards into view, we can see it now predicts cards with 100%. Jason, cards, Jason, cards, and you can see how responsive that is, too. Now, this is great for prototyping, and if this is good enough for your needs, you can click on export model at the top right there, click on download, and now you can download the model.json file that you need to run in the web browser. You can then host this on your own website or CDN and use that in any way you wish with a nice graphical user interface and user experience.
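Loading that downloaded model yourself might look roughly like the sketch below. It assumes the exported files are hosted at ./my-model/ and that the model expects the usual MobileNet-style 224x224 input scaled to [-1, 1]; the export screen gives you an exact snippet to copy, so treat this only as an outline:

```js
import * as tf from '@tensorflow/tfjs';

// Load the model.json (plus its weight files) exported from Teachable Machine.
// (Assumes an ES module, where top-level await is available.)
const model = await tf.loadLayersModel('./my-model/model.json');

// Classify a frame from an <img>, <canvas>, or <video> element on the page.
function classify(element) {
  return tf.tidy(() => {
    const input = tf.image
      .resizeBilinear(tf.browser.fromPixels(element), [224, 224]) // assumed input size
      .toFloat()
      .div(127.5).sub(1)   // assumed [-1, 1] normalisation
      .expandDims(0);      // add a batch dimension
    return model.predict(input); // one probability per class, e.g. [Jason, cards]
  });
}

classify(document.getElementById('webcam')).print();
```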

21:01 - Now, Teachable Machine is great for prototypes, but if you want to launch a production model with gigabytes of training data, then Cloud AutoML can be used for this, and it supports exporting to TensorFlow.js too. In this example, we see someone trying to classify flowers. All they’ve done is uploaded photos of flowers to Google Cloud Storage, and then we can move on to the next step of the training process. You can now select whether you want to train for higher accuracy or faster prediction times. Of course, there’s usually a tradeoff between the two. Once complete, you’ll be able to export to TensorFlow.js as shown, and you can simply download the files and host them on your website or CDN. Now, some of you might be wondering, how hard is it to use that resulting model in JavaScript? Well, actually it’s pretty easy. In fact, it’s so easy, it fits on a single slide. So let me walk you through it right now. First, we have two HTML script imports. The first one is for TensorFlow.js itself,

21:54 - and the second one is for the Cloud AutoML library. We then have an image tag for a new image that we want to classify. In this case, I just grabbed an image of a daisy from the internet, but it could be anything. It could even be an image from a webcam stream if you wanted. And then finally, we’ve got the actual JavaScript code, which is just three lines of JavaScript to do the hard work. On the first line, we simply call await tf.automl.loadImageClassification and pass to it the location of the Machine Learning model that we just trained. In this case, it’s called model.json and is located in the same directory. This is the file we downloaded in the previous step, and it’s simply hosted somewhere on your web server. We use the await keyword here because the model load is asynchronous, meaning it will take some time to complete. This allows us to wait for it to finish before continuing sequentially.

22:46 - Now, once the model’s loaded, we can then grab a reference to the image we want to classify by using document.getElementById and passing it the ID of the element we wish to use. In this case, it’s daisy, which refers to the image tag above. Finally, we can call await model.classify and pass to it the image we want to classify. Again, depending on the model, this can take several milliseconds to execute, so this uses the await keyword, too.
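Putting the description above back together, the whole slide amounts to roughly the following. The CDN paths and the daisy image are illustrative stand-ins rather than the exact slide contents:

```html
<!-- TensorFlow.js itself, then the Cloud AutoML helper library. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-automl"></script>

<img id="daisy" src="daisy.jpg" alt="A daisy to classify" />

<script>
  async function run() {
    // Load the model.json we exported and hosted next to this page.
    const model = await tf.automl.loadImageClassification('model.json');
    const image = document.getElementById('daisy');
    const predictions = await model.classify(image);
    console.log(predictions); // e.g. [{label: 'daisy', prob: 0.97}, ...]
  }
  run();
</script>
```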

23:12 - You’ll then get a JSON object returned to your predictions constant, which you can then loop through to print the results or do something more useful. It should also be noted that you can call model.classify as many times as you like with different images once the model itself has been loaded, and that’s how we can achieve real-time performance on the webcam. So we’ve seen a lot of great demos, but what are the core benefits of doing Machine Learning in JavaScript? Well, first let’s start by explaining the TensorFlow.js architecture. We’ve got two APIs, and the first one is a high-level API known as the Layers API, which is very similar to Keras if you’re using Python already.

23:49 - Next we’ve got a low-level API, known as the Ops API, which is more mathematical in nature and allows you to do things like linear algebra, should you wish to work at that level. So let’s see how these all come together. Here you can see how our pre-made models sit upon the Layers API, which itself sits upon the Core, or Ops, API. Now this lower-level API can speak to different environments, such as the client side, which includes things like the web browser, for example, and each one of these environments can execute on a number of different backends. For example, the CPU, which is always available; WebGL for GPU acceleration, if supported; or WebAssembly, WASM for short, for more efficient performance on CPUs, if supported. And there’s a similar story for the server side, too, such as in Node.js. Now, note here that our Node.js implementation

24:35 - can talk to the same bindings that Python TensorFlow talks to, so performance is just as good or sometimes even better than Python, due to the just-in-time compiler of JavaScript. The other thing to note is that, if you’re working with a Machine Learning research team who want to deploy their research to the web, there’s a good chance it might be coded in the Python flavor of TensorFlow, and with Node.js, you can execute the saved models they produce without any conversion required, making it super easy to integrate. However, if you want to run a Python saved model in the web browser, then we’ve got a command-line tool that helps you do that, which will convert the saved model to the JSON format required to use the model on the client side in the web browser.
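As a rough sketch of that Node.js path (the model path and input shape below are placeholders for whatever the research team hands you):

```js
// Node.js: execute a TensorFlow SavedModel directly, with no conversion step.
const tf = require('@tensorflow/tfjs-node');

async function run() {
  const model = await tf.node.loadSavedModel('./my_saved_model');
  // Placeholder input: assumes a single image-shaped input and a single output tensor.
  const output = model.predict(tf.zeros([1, 224, 224, 3]));
  output.print();
}
run();
```

The command-line tool mentioned for the browser route is the tensorflowjs_converter utility from the tensorflowjs Python package, which takes a SavedModel directory and writes out the model.json plus binary weight files.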

25:13 - So to wrap up this section, there are five client-side benefits of doing Machine Learning in the browser that are worth pointing out. The first is privacy. As inference is performed on the client side, no data is ever sent to a third-party server, and that means we can maintain data privacy for the user. This is particularly important for the medical and legal industries, where it might be a requirement not to transfer data to a third party, not to mention that there are growing concerns around privacy these days, and here, you get it for free. Next up is lower latency. As JavaScript has direct access to the sensors on the device, such as the microphone, camera, accelerometer, and much more, there’s no round-trip time to the server to analyze that data. You can see that a round trip to the server could be close to a hundred milliseconds on a mobile connection, but with TensorFlow.js running on device

25:58 - we can go much faster than that. Next, cost. If no data is being sent to a server, then less bandwidth and lower hardware costs are required, as no CPU, GPU, or RAM needs to be hired to run 24/7 for inference. You just have to pay for hosting of the website assets and the model files, which is far cheaper. Next up, interactivity. Web tech has been great ever since the very start and has evolved to handle ever richer formats: WebXR, WebGL, and so on. And I encourage you all to see how you can push Machine Learning models further when combined with the rich ecosystem that JavaScript has to offer us. And then finally, reach and scale. Zero installation is required.

26:39 - Anyone can click on a link and load a webpage, and the Machine Learning will just work. That’s all you need to do to run a Machine Learning demo in the browser. So finally we’ll wrap up with some resources to get started, if you wish to continue your TensorFlow.js journey. If there’s only one slide you bookmark and share with folk, let it be this one here. This slide has all the resources you need to get started. For example, our website and API docs are available at tensorflow.org/js, and our models are also available to use. Today, we just touched on three of them, but there are many, many more to check out too. We’re fully open source, so check us out on GitHub, and we welcome contributions. And if you’ve got more technical questions, check out our Google Group.

27:19 - We’ve also got some boilerplate code showing how to use pre-made models in minutes over on CodePen and Glitch. Now, if you’re looking for an all-in-one book, Deep Learning with JavaScript, published by Manning, was written by folks on our team and takes you from zero Machine Learning knowledge to implementing more advanced techniques. Familiarity with JavaScript is the only requirement; no Machine Learning background is needed. Or check out one of our many Codelabs if you prefer a more hands-on approach: learn how to make your own smart webcam, just like a Nest Cam, in minutes, create custom models, or learn how we made Teachable Machine possible. Also, a quick shoutout to our community: check out the #madewithTFJS hashtag on Twitter or LinkedIn to see many more amazing examples that people have been creating.

28:03 - New content is coming out every single week, and it’s a great way to get inspired. If you make something using TensorFlow.js, be sure to use the hashtag for a chance to be featured at future events, such as our show and tells here on YouTube, and even in our blog posts. And finally, the only question left is: what will you make? This final example comes from a community member in Tokyo, Japan. By day, he’s a dancer, but he’s managed to use TensorFlow.js to create this amazing hip-hop video with some awesome visual effects using BodyPix.

28:31 - The reason I show you this is that Machine Learning really is now for everyone, and we’re super excited to see how TensorFlow.js will enable many more people to start their journey with Machine Learning. Creatives, artists, musicians: no matter what your background, you can use models in ways never even dreamt up by the original model creator, as you saw from just a few of the demos today, and we’re super excited to see what you create. Please do use the #madewithTFJS hashtag so we can find your work. And with that, feel free to stay in touch or reach out with any questions.

29:00 - You can add me on Twitter @Jason_Mayes, and I’d love to hear from you. Thank you for listening, and see you next time. (bright music)