Turn Any PDF into an Audiobook | Saturday, 11AM

May 15, 2021 07:00 · 3489 words · 17 minute read

Hi, my name is PK Gulati. I’m the founder of the Assembly. If you’re here, you’re probably watching an Assembly workshop. We do these workshops every week, and they are prepared by the Assembly team in Dubai. These workshops cover ideas from data science, hardware design, automation, robotics, drones, and all the other exponential technologies that you can think of. The idea is for us to learn more than what the curriculum teaches us, and we are trying to get people to start working with their own hands on these technologies, which have the capacity to change the world.

So welcome to this workshop, and learn more about the new wonders you can build.

Hello everyone, a very good morning and welcome back to another session. Today’s session is a very interesting one, where we use Python to turn any PDF into an audiobook. Now, there may have been times when you wanted to buy an audiobook but did not because it was way too expensive. Well, fear not: today we’ll be showing you how you can use Python to turn almost any PDF into an audiobook for free.

So let’s get started. So before we begin, let’s talk a little bit about the Assembly. The Assembly is basically a smart lab based out of in5 since December 2014 and over the course of around 7 years, we have done over 300 free workshops. These workshops are divided into three categories: Hack, Code, and Data Science workshops. The hack workshops are the workshops that deal with hardware like Arduino, Raspberry Pi, and IoT workshops—all these are Hack workshops.

Code workshops, on the other hand, are solely software-based like the workshop that we’re doing today and they involve the use of various APIs, related to gaming, etc. Finally, we have the Data Science category, which is quite self-explanatory and it deals with advanced topics relating to machine learning, AI, big data, etc.

02:25 - Our target audiences are students, entrepreneurs, and professionals, and we focus on smart technology and its practical application. You can join our forum, which is members.theassembly.ae, and don’t forget to tag us on social media: on Facebook, Instagram, and Twitter at the handle @makesmartthings. You can also follow us on YouTube on our channel, The Assembly.

02:47 - So to begin with, what is an audiobook? An audiobook is a recording of a book or other work being read out loud, so instead of the conventional way of reading the book, the book is actually read out as it would be by a person reading a book to you. And the reading of a complete text is described as unabridged, while readings of the shorter version are an abridgment.

03:11 - Spoken audio like this has been available in schools and public libraries and, to a lesser extent, in music shops since the 1930s. Today we’ll be converting any normal PDF into an audiobook that you can listen to.

03:26 - So what will we be using today? We’ll be using a Python library called Pyttsx3, which is a text-to-speech conversion library in Python. There are other alternatives to do that in Python as well, but the advantage of using this particular library is that it works offline and it is compatible with both Python 2 and Python 3, so that is a big advantage of using this library.

03:52 - The second library that we’ll be using is called PyPDF2 and it is a pure Python library that is built as a PDF toolkit and has a lot of functionalities, like extracting text from the PDF, splitting documents page by page, cropping pages, etc. It has a lot of functionalities and by being pure Python, it should run on any Python platform without any dependencies and external libraries.

04:18 - And it also means that it can work entirely on StringIO objects rather than file streams, which allows PDF manipulation in memory. So it is also a very useful tool for websites that manage or manipulate PDFs.

04:33 - So without further ado, let’s get started. First off, we open Visual Studio Code and create a new file. In my case, I have called it “audiobook.py”, but you can name it anything.

04:47 - And what we’re essentially trying to do is we are trying to use the two libraries—the PyPDF2 and Pyttsx3—to create an audiobook. So we’ll use PyPDF2 to split the PDF page by page, and then we will read the text on each of these pages.

05:07 - And then, the text from these pages will be sent to Pyttsx3 library, which will, in turn, read out the text to the user.

05:17 - We’ll also let the user save the output of the audiobook, so that they have an mp3 file they can listen to any time.

05:27 - So yeah, let’s get started. First, we’ll need to import the libraries, and before you import them, you need to install them. So go to the terminal and install these libraries by typing “pip install pypdf2” and “pip install pyttsx3”. Once you have done that, we can go ahead and import them, so “import pyttsx3”.

05:55 - And then “import PyPDF2”. Make sure that you capitalize the P at the beginning and the PDF.
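Before importing, it can help to confirm that both packages are visible to the interpreter you are actually running. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, not something from the workshop.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two libraries installed with pip in this workshop.
missing = missing_packages(["PyPDF2", "pyttsx3"])
if missing:
    print("Install these first with pip:", ", ".join(missing))
else:
    print("Both libraries are ready to import.")
```

If a package shows up as missing right after installing it, the terminal's pip probably belongs to a different Python than the one your editor runs.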

06:06 - Okay. So now that we have imported both libraries, we need a PDF file to work with. In our case, we have a simple one-page PDF file just for demo purposes, and we’ve named it “demo.pdf”. Now we’re going to open that PDF, so we’re just naming it “book =” and then opening the file.

06:30 - And obviously, we’ll pass in the name of the file, which in our case is “demo.pdf”.

06:38 - And since we are reading the file, we’ll pass “rb”, which means we’re reading bytes.

06:45 - Next, we need to initialize the PyPDF2 reader. To do that, we create a variable to store it, so “pdf_reader = PyPDF2.PdfFileReader”, and then we pass in our book, so we’ll just say “book” here.

07:13 - So that’s how simple it is to create a file reader object, and now we can get the number of pages from the file. The library has an attribute that automatically gives us the number of pages in that particular PDF, and we’ll use it later in the workshop. So we’ll say “num_pages =” and then whatever we named the reader, so “pdf_reader.numPages”. It’s that simple. This will give us the number of pages in our PDF, and we can print the number of pages to see if it actually worked.
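The reader calls described here match the PyPDF2 releases that were current when this workshop was recorded. PyPDF2 3.x renamed them (`PdfReader` for the class, `len(reader.pages)` for the page count), so the sketch below tries the newer names first and falls back to the ones from the workshop. `open_reader`, `count_pages`, and `main` are hypothetical helper names. The import sits inside the function so the helpers can be loaded even where PyPDF2 is not installed.

```python
def open_reader(stream):
    """Wrap an open binary file in whichever reader class this PyPDF2 provides."""
    import PyPDF2  # imported here so count_pages works without PyPDF2

    # PdfReader is the 3.x name; PdfFileReader is the name used in the workshop.
    reader_cls = getattr(PyPDF2, "PdfReader", None) or PyPDF2.PdfFileReader
    return reader_cls(stream)


def count_pages(reader):
    """Return the page count, preferring `pages` and falling back to `numPages`."""
    pages = getattr(reader, "pages", None)
    if pages is not None:
        return len(pages)
    return reader.numPages


def main(path="demo.pdf"):
    # "rb" opens the file for reading in binary mode, as in the workshop.
    with open(path, "rb") as book:
        pdf_reader = open_reader(book)
        print(count_pages(pdf_reader))
```

Calling `main()` next to the one-page demo file should print the same page count the workshop reports.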

So let’s just run our code quickly. Yeah. As you can see, it ran the code and printed 1, because our PDF has only one page; it will display however many pages your PDF has. So far, it’s working perfectly. Now we also need to initialize our text-to-speech library. To do that, we say “engine = pyttsx3.init()”. This is how we initialize our text-to-speech engine.

08:38 - And we’re pretty much done with the setup. Just to show how this works, let me type in “engine.say()”. Whatever text you give this function, it will speak it out for you: it converts that text into speech and says it out loud. This way, you can check whether everything is working properly or something is wrong. All right, so we can pass in any text that we want.

For now we’ll just say “Hello world”. Okay, we need to spell that right.

09:20 - And we can say “engine.runAndWait()”. This will run the command for us, and finally we can finish it off by saying “engine.stop()”. So, if everything goes well, this should print the number of pages and then say “hello world”. Let’s run the code and see if it’s working.
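The say-then-run sequence can be sketched as a small function. `speak` and `main` are hypothetical names; the engine calls (`say`, `runAndWait`, `stop`) are the ones named in the workshop. The pyttsx3 import is kept inside `main` so `speak` can be reused with any engine-like object.

```python
def speak(engine, text):
    """Queue `text` on a pyttsx3-style engine and block until it has been spoken."""
    engine.say(text)      # only queues the utterance; nothing is heard yet
    engine.runAndWait()   # processes the queue and returns when it is empty


def main():
    import pyttsx3  # imported here so `speak` can be tested on its own

    engine = pyttsx3.init()
    speak(engine, "Hello world")
    engine.stop()
```

Nothing is spoken until `runAndWait()` is called, which is why forgetting that line gives a silent script.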

09:47 - “Hello world” Right, so as you heard, it said “hello world”. This means that if we can get “hello world” as output, then we can also get output from our PDF, and that is what we’ll be doing next. You can comment this out or leave it there, but let us just remove this part because we don’t want it to say “hello world” anymore. Before we go to the next part, let me show you something: the good thing is that this library allows you to customize some of its features, so you can change how fast the person is speaking and you can also change the voice.

So let’s see how those things can be done. We can give it a bigger string, “Hello World, it is me again”, and this will let us try out the different settings that we make. First, we can change the rate at which the person is speaking. The way we do it is we create a variable called “rate” to store the rate, and then we call our engine with “.getProperty”. This method allows us to get a property, and in our case we are looking for the property called “rate”.

Now this will get the rate, and if you want to see the rate, you can print it with “print(rate)”. If you want to reset the rate, you do it the same way, but instead of “.getProperty” you use “engine.setProperty”. You need to pass in the name of the property, which is “rate”, as well as the new value. And what is the new value? We’ll just give it 200.

11:56 - Okay, 200 so let’s see how this goes. Okay, so let us run the code.

12:11 - “Hello world, it is me again” So you notice that it’s much faster this time compared to before. We can decrease the speed as well by going back to 100, saving our code, and trying it again, so let’s see.

12:29 - “Hello world, it is me again” Now this was definitely much slower, but I mean you can play around with this and see what works for you and whatnot, so maybe 150 will be a good choice, so we can try that out.

12:50 - “Hello world, it is me again” Yeah, this sounds good.

12:58 - So this is how you can change the rate. You can play around with this value and find whatever suits your needs.
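The get-then-set pattern for the rate can be sketched as below. pyttsx3 documents `rate` as an integer in words per minute. `set_rate` and `main` are hypothetical names, and the three values are simply the ones tried in the workshop.

```python
def set_rate(engine, new_rate):
    """Change the speaking rate and return the rate that was set before."""
    old_rate = engine.getProperty("rate")
    engine.setProperty("rate", new_rate)
    return old_rate


def main():
    import pyttsx3  # imported here so `set_rate` can be tested on its own

    engine = pyttsx3.init()
    for rate in (200, 100, 150):  # the values compared in the workshop
        previous = set_rate(engine, rate)
        print("rate changed from", previous, "to", rate)
        engine.say("Hello World, it is me again")
        engine.runAndWait()
    engine.stop()
```

Returning the old rate makes it easy to restore the original setting after experimenting.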

13:06 - We can also change the volume. Obviously you can change the volume from your system settings, but the library also gives us a way to do it. Again, it is very similar to how we changed the rate: we create a variable to store the volume, and then we call “engine.getProperty”, and this time we want the “volume” property.

13:34 - Again, we can print it. Now, one thing to note before we reset the volume is that the minimum volume level is 0 and the maximum volume level is 1, so the value should be between 0 and 1. We set the new value by saying “engine.setProperty” and passing in the two parameters. The first one is the property we want to change, which in our case is “volume”, and the second one is the new value that we want to assign to it. We want the volume to always be at its max, so we’ll just keep it at 1.

If you want to change it, go ahead and change it, but I don’t think anyone would want to decrease the volume from here rather than the system settings. But yeah, that’s just a way that you can change the volume from here.
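Since the volume has to stay between 0 and 1, one option is to clamp whatever value is passed in before handing it to the engine. This is a sketch under that assumption; `set_volume` is a hypothetical helper, and the clamping is an addition, not something the workshop code does.

```python
def set_volume(engine, level):
    """Set the volume, clamping `level` into the 0.0 to 1.0 range first."""
    clamped = max(0.0, min(1.0, float(level)))
    engine.setProperty("volume", clamped)
    return clamped


def main():
    import pyttsx3  # imported here so `set_volume` can be tested on its own

    engine = pyttsx3.init()
    print("current volume:", engine.getProperty("volume"))
    set_volume(engine, 1)  # keep it at the maximum, as in the workshop
    engine.say("Hello World, it is me again")
    engine.runAndWait()
    engine.stop()
```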

14:31 - Now the final property, which is also a very interesting one, is the voice: you can have different voices. Currently, it has two different voices, so let’s see how we can change the voice of the person who is speaking. To do that, it is very similar: we say “voices = engine.getProperty”, and this time we are looking for a property called “voices”. We can print voices if we want, but it won’t be very useful to us because it just prints the voice objects rather than anything readable.

15:11 - And finally, to change the voice, we say “engine.setProperty” and pass in our two parameters. In this case, the first parameter is “voice”. Notice that when we got the property, we called it “voices”, and when we are setting it, it’s just “voice”. So “voices” is the list that contains both voices, and “voice” is the property that controls which voice is being played.

15:47 - The second parameter is which voice we want. From the voices, we want the first one, so index 0, and then we say “.id”. The ID of the first voice is what we pass in, which basically means: speak as the first voice. Now we can hear what the two voices sound like, so we can just run the code. (voice: “Hello world, it is me again”) So this is basically the same voice that we were hearing before. Now we can change the index to “1”, run the code, and see what happens.

16:30 - “Hello world, it is me again” So you notice that the second one has a much deeper voice. The two voices are different, and you can use whichever voice you prefer. In our case, we’ll just go with the second one.
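The number of voices depends on what is installed on your machine, so the two heard in the workshop are not guaranteed everywhere. The sketch below lists what is available and selects one by index, falling back to the last voice if the index is too large. `choose_voice` and `main` are hypothetical names; each pyttsx3 voice object carries an `id` and a `name`.

```python
def choose_voice(engine, index):
    """Select the voice at `index` and return its id, or None if there are no voices."""
    voices = engine.getProperty("voices")  # a list of voice objects
    if not voices:
        return None
    # Fall back to the last voice when `index` is past the end of the list.
    chosen = voices[min(index, len(voices) - 1)]
    engine.setProperty("voice", chosen.id)  # note: "voice", not "voices"
    return chosen.id


def main():
    import pyttsx3  # imported here so `choose_voice` can be tested on its own

    engine = pyttsx3.init()
    for number, voice in enumerate(engine.getProperty("voices")):
        print(number, voice.name, voice.id)
    choose_voice(engine, 1)  # the second voice, as picked in the workshop
    engine.say("Hello World, it is me again")
    engine.runAndWait()
    engine.stop()
```

Printing `voice.name` for each entry gives a readable list to pick from, which printing the whole `voices` list does not.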

16:52 - And yeah, we can close this. Next we create a for loop. Basically, we want the for loop to run once per page, and for every page, extract the text, read it out, and then stop. So we write “for num in range()”. We want it to run for however many pages there are in the PDF. For example, our current PDF has one page, so it will run only once, but if your PDF has a thousand or a hundred thousand pages, it will keep running and reading the pages out loud to you.

So our range will start from 0 and end at the number of pages, which means it will run only one time in our case. But if you change your PDF, it will run for however many pages the PDF has. Now you can get the page by saying “page = pdf_reader.getPage()”.

17:58 - And you pass in the page number that you want. So this is how simple it is to get a page: you just call your PDF reader, then “.getPage”, and you pass in the page number. What we’ll do next is... oh sorry, I forgot a step. This is just the page that we have gotten from the PDF. We did not actually extract the text yet, so to extract the text, we call another method, which is “page.extractText()”, and we store the result in “data”.

Now we can print it out and see the text that was extracted. This should get us the text from each page. In our case, we have only one page, so it will get all the text from that one page and print it out. If you have lots of pages, you will get the text extracted from all those pages. So let’s run it and see. Yeah, this is our PDF content here: “Welcome to SmallPDF”.

This is just a random PDF that I downloaded, and all this text here is what is in our PDF.
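The page loop and text extraction can be sketched as below. The workshop's `getPage`, `numPages`, and `extractText` belong to the older PyPDF2 API; PyPDF2 3.x uses `reader.pages` and `extract_text()` instead, so this version checks for the newer names first. `page_text` and `all_page_texts` are hypothetical helper names.

```python
def page_text(page):
    """Extract one page's text with whichever method this PyPDF2 version exposes."""
    if hasattr(page, "extract_text"):  # newer PyPDF2 name
        return page.extract_text()
    return page.extractText()          # name used in the workshop


def all_page_texts(reader):
    """Return a list with the extracted text of every page, in page order."""
    pages = getattr(reader, "pages", None)
    if pages is None:
        # Older readers: page numbers start at 0 and stop before numPages.
        pages = [reader.getPage(num) for num in range(reader.numPages)]
    return [page_text(page) for page in pages]
```

Assuming `pdf_reader` is the reader object created earlier, `for data in all_page_texts(pdf_reader): print(data)` would print each page's text in turn.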

19:23 - So yeah, that works pretty well. What we actually want to do next is turn it into an audiobook. The way we do that is very simple: you just say “engine.say”. This is very similar to how we said “hello world”, but instead of passing in the text ourselves, this time we pass in our data. Then we say “engine.runAndWait()” and finally “engine.stop()”. This should take the data on the page and read it out loud, so let’s run our program and see what happens.

20:14 - “Welcome to Small PDF digital documents all in one place. Access files anytime, anywhere Enhance documents in one click. Collaborate with others with the new Small PDF experience. You can” All right, so it was a long text, so I just stopped it partway. But you get the idea. It reads the text that is in your PDF, so it is turning the PDF into an audiobook: the text that was previously in PDF form is now being read out to you.
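Putting the pieces together, reading a whole PDF aloud can be sketched as follows. `read_aloud` and `main` are hypothetical names. Skipping empty pages is an addition of this sketch (scanned or image-only pages often yield no extractable text), and the reader lookup tries the PyPDF2 3.x names before the ones used in the workshop.

```python
def read_aloud(engine, texts):
    """Speak each page's text in order and return how many pages were spoken."""
    spoken = 0
    for text in texts:
        if not text or not text.strip():
            continue             # nothing to say for this page
        engine.say(text)
        engine.runAndWait()      # finish this page before moving to the next
        spoken += 1
    return spoken


def main(path="demo.pdf"):
    import PyPDF2   # imported here so `read_aloud` can be tested on its own
    import pyttsx3

    engine = pyttsx3.init()
    with open(path, "rb") as book:
        # PdfReader is the 3.x name; PdfFileReader is the one from the workshop.
        reader_cls = getattr(PyPDF2, "PdfReader", None) or PyPDF2.PdfFileReader
        texts = [
            page.extract_text() if hasattr(page, "extract_text") else page.extractText()
            for page in reader_cls(book).pages
        ]
    read_aloud(engine, texts)
    engine.stop()
```

The text is extracted while the file is still open, because the reader loads page content from the file on demand.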

Now, the last thing that we want to do is save the audio output so that we can hear it later, and pause and play it according to our preferences. The way we do that is very simple: it’s just a couple of statements that we need to write. Here we can say “engine.save_to_file()”.

21:22 - Now, this method will automatically save the file to your current working directory, whether that is your desktop or somewhere else. The next thing we need to do is pass in the data that needs to be saved. We can actually move the call into our loop over here. We pass in “data”, and then we name the file “test.mp3”.

And we can try it out. “Welcome to small digital documents all… ” All right, so let’s head to our folder and see. As you can see over here, it has a file called “test.mp3”, which is what we just created. One thing that we need to do if we want to save the output is comment out the “engine.say” line, because the writing to the file will only happen once it has completely said all the content of the PDF, which is a lengthy procedure.

So we can just comment that line out, and then get the mp3 file. Here we have our mp3 file. Let’s check it out and see if it worked. “Welcome to Small PDF digital documents all in one place. Access files anytime, anywhere Enhance documents in one click” All right, so as you heard, it clearly saved the PDF’s text as an mp3 file, and this is how you can use Python to turn your normal PDFs into audiobooks. So, I hope you enjoyed it, and that’s it for today.
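The saving step can be sketched as below. Two choices here are assumptions of this sketch rather than the workshop's code: it joins the text of all pages and saves once, on the assumption that saving page by page to the same file name would leave only the last page, and it leaves out `engine.say` so nothing is read aloud while saving. The audio encoding actually written may also depend on the speech driver of your operating system, so the “.mp3” extension alone may not guarantee MP3 data. `save_audiobook` and `main` are hypothetical names.

```python
def save_audiobook(engine, texts, filename="test.mp3"):
    """Join the page texts, queue them for saving, and write `filename`."""
    data = "\n".join(text for text in texts if text)
    engine.save_to_file(data, filename)  # only queues the job
    engine.runAndWait()                  # the file is written during this call
    return data


def main():
    import pyttsx3  # imported here so `save_audiobook` can be tested on its own

    engine = pyttsx3.init()
    # In the full program, this list would hold the text extracted from each page.
    save_audiobook(engine, ["Welcome to the demo."], "test.mp3")
    engine.stop()
```

The file appears in the current working directory unless `filename` includes a path.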

23:48 - All right so that’s it for today’s video, I hope you guys enjoyed it. If you did, be sure to hit the like button, subscribe to our channel if you are interested in our content. And this is me Ammar, signing off.

23:59 - I’ll catch you in the next one. Buh-bye.