WARNING! Do not open an eBook without making sure that the PDF file is clean.
This article focuses on the danger of free PDF files that float around the internet. I’ll describe in detail how I check for malicious content that might (or might not) be embedded in a PDF file. Before I go any further, I’d like to state that the information presented here is based on my “notes” and is for informational purposes only. I cannot guarantee that the methods I describe here will work for you as they do for me.
The short answer is that I have to check the file before I open it. I do this with the help of PDF Tools. Please note that this article assumes that you use a computer that runs the powerful and robust Linux operating system. Any distribution can do what I describe here.
Let’s get the pdfid.py Python script
A quick search points me in the right direction, which is the code author’s website.
Once I open that web page, I press Ctrl + F to access the web browser’s search feature and enter this into the search field without the quotes: “pdfid_v0_2_5.zip”
Next, I extract the zip file in my Downloads directory and change into the extracted directory by entering this into the Linux terminal: cd /home/youruserid/Downloads/pdfid-master (press Enter).
Then I type “ls -l” to list the files inside the pdfid-master directory. This will reveal the following content:
pdfid-master]$ ls -l
drwxr-xr-x 2 me me 4096 May 27 2016 img
drwxr-xr-x 2 me me 4096 Dec 1 15:21 pdfid
-rw-r--r-- 1 me me 3487 May 27 2016 README.md
-rw-r--r-- 1 me me 311 May 27 2016 setup.py
Inside the pdfid directory are a few more files and the one that’s needed is called pdfid.py which is the actual script.
Before I go any further I need to specify where my PDF eBooks are because the path to the PDF files needs to be entered correctly. To help me find the correct path, I open a file browser like Dolphin or Thunar and navigate to the ebooks directory. The browser will display the correct path which I then use during the next step.
OK, back to the Python script. To execute the pdfid.py script, I change into the pdfid directory and type this command into the terminal: python pdfid.py /home/myid/Downloads/ebookdir/ebooktitle.pdf
Then I press enter and wait until the terminal displays the result shown below.
PDF Header: %PDF-1.4
/Colors > 2^24 0
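The report is essentially a count of PDF names that tend to show up when a file carries active content, such as /JS, /JavaScript, /OpenAction and /Launch. To get a feel for what is being counted, here is a minimal Python sketch of the same idea. It is not pdfid's own code: the function names and the short list of names are my own assumptions, and unlike pdfid it does not try to handle obfuscated (hex-encoded) names, so treat it as an illustration rather than a replacement.

```python
import re

# Illustrative subset of names that often indicate active content.
# This list is an assumption for the sketch, not pdfid's full list.
SUSPICIOUS_NAMES = [
    b"/JS",
    b"/JavaScript",
    b"/AA",
    b"/OpenAction",
    b"/Launch",
    b"/EmbeddedFile",
]


def count_suspicious_names(data: bytes) -> dict:
    """Count how often each suspicious name appears in raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops "/AA" from also matching a longer name like "/AAB".
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


def scan_file(path: str) -> dict:
    """Read a PDF from disk and count the suspicious names in it."""
    with open(path, "rb") as handle:
        return count_suspicious_names(handle.read())
```

Calling scan_file("/home/myid/Downloads/ebookdir/ebooktitle.pdf") returns a dictionary of counts. Any non-zero count is a reason to look closer, not proof that the file is malicious.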
What to do with a PDF file that looks suspicious
pdf2ps NameTitleOfBook.pdf - | ps2pdf - NewNameTitleOfBook.pdf
I then press enter and wait a bit. Depending on how fast the computer is, this process can take a few minutes. Once the conversion is finished, the new file’s icon changes from a blank white placeholder to the actual eBook cover. I then compare the size of the new file with the original to see if the conversion removed a few megabytes. If yes, then we probably dodged a bullet. Now it’s time to open the new file and make sure that everything looks right, and if it does, I delete the old original PDF eBook file.
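The file size comparison can also be done with a few lines of Python instead of eyeballing it in the file browser. This is only a small sketch; the function name size_change is my own and the file names in the usage note are the placeholders from the command above.

```python
from pathlib import Path


def size_change(original, converted):
    """Return (bytes before, bytes after, bytes saved, percent saved)."""
    before = Path(original).stat().st_size
    after = Path(converted).stat().st_size
    saved = before - after
    # Guard against an empty original so we never divide by zero.
    percent = saved * 100 / before if before else 0.0
    return before, after, saved, percent
```

For example, size_change("NameTitleOfBook.pdf", "NewNameTitleOfBook.pdf") returns the two sizes in bytes, the difference, and the difference as a percentage of the original.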
I do this to all of my eBooks and delete the originals. For extra security, I quickly reboot my computer and read the virus-free eBooks.
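Doing this by hand for a whole eBook directory gets tedious, so the conversion can be scripted. The sketch below assumes that pdf2ps and ps2pdf (both part of Ghostscript) are installed and on the PATH. The "-clean" suffix and the function names are hypothetical choices of mine. It deliberately does not delete anything, because I still want to open each new file and check it before removing the original.

```python
import subprocess
from pathlib import Path


def cleaned_name(src: Path) -> Path:
    """Hypothetical naming scheme: Title.pdf becomes Title-clean.pdf."""
    return src.with_name(src.stem + "-clean" + src.suffix)


def sanitize(src: Path) -> Path:
    """Run the equivalent of: pdf2ps src - | ps2pdf - dst"""
    dst = cleaned_name(src)
    to_ps = subprocess.Popen(["pdf2ps", str(src), "-"], stdout=subprocess.PIPE)
    to_pdf = subprocess.Popen(["ps2pdf", "-", str(dst)], stdin=to_ps.stdout)
    to_ps.stdout.close()  # let pdf2ps notice if ps2pdf exits early
    to_pdf.wait()
    to_ps.wait()
    if to_ps.returncode != 0 or to_pdf.returncode != 0:
        raise RuntimeError(f"conversion failed for {src}")
    return dst


def pending(directory: Path) -> list:
    """List PDFs in a directory that have not been converted yet."""
    return sorted(
        p for p in directory.glob("*.pdf") if not p.stem.endswith("-clean")
    )


def sanitize_all(directory: Path) -> list:
    """Convert every pending PDF and return the paths of the new files."""
    return [sanitize(p) for p in pending(directory)]
```

Calling sanitize_all(Path("/home/myid/Downloads/ebookdir")) writes a "-clean" copy next to every PDF in that directory.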
A word of warning
Live booting from a USB stick goes a long way when reading PDF files. Alternatively, if you have VirtualBox installed, a throw-away Linux install that serves the purpose of “looking/testing something” will do as well. You have been warned.
Last but not least, remember that privacy is dead and has been for many years. Got questions or comments? Fire away.
I wish that I knew this 5 years ago. Had lots of magazines but deleted them.
You did the right thing. Thanks for commenting. 🙂
I assume the command after the pipe should be ps2pdf instead of pdf2pdf ?