

Parsing is a process that can be interesting in itself, but it is rarely the end objective of a piece of software. You build or use a parser to accomplish something else. The purpose of parsing is to free the data from its original format so that it can be used for something useful. In this article we are going to see how to extract tables trapped in PDF files and put them in Excel files. This way you can easily work with the data: you can process it, analyze it, and use it to make decisions. We are going to see that this process does not require developers, only sysadmins. Thanks to competent and knowledgeable sysadmins you will be able to reliably extract tables from textual PDFs, but you will get mediocre results, at best, with PDFs made of images. In the companion repository you will find the basic script we created for this article, together with the example PDFs we used.

The code for this article is on GitHub: PDFToExcel
