
How to convert PDF files to Excel files using Python?
Python has a large set of libraries for handling many kinds of tasks. In this article, we will see how to convert the tables in a PDF file into a file that Excel can open. Several Python packages can convert PDF to CSV; here we use the tabula-py module. tabula-py is a Python wrapper around tabula-java, a Java library that reads the PDF and extracts its tables. tabula-py receives the extracted tables from the Java side as JSON and converts them into pandas DataFrames.
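The hand-off between the Java extractor and Python goes through JSON. The sketch below illustrates only the Python side of that idea. It assumes a simplified JSON shape modeled loosely on tabula-java's output (a list of tables, each with a `"data"` list of rows, each row a list of cells carrying `"text"`); the real output has extra fields such as coordinates, and `json_to_tables` is a hypothetical helper, not part of tabula-py.

```python
import json


def json_to_tables(raw_json):
    """Turn tabula-java-style JSON into a list of (header, rows) pairs.

    Assumed, simplified input shape: a list of tables, each with a "data"
    key holding rows, where each row is a list of cells with a "text" key.
    """
    tables = []
    for table in json.loads(raw_json):
        # Pull the plain cell text out of every row
        rows = [[cell["text"] for cell in row] for row in table["data"]]
        if not rows:
            continue  # skip tables that contain no rows at all
        # Use the first row as column names, the remaining rows as data
        tables.append((rows[0], rows[1:]))
    return tables


sample = '[{"data": [[{"text": "Match"}, {"text": "Venue"}], [{"text": "1"}, {"text": "Mumbai"}]]}]'
print(json_to_tables(sample))
# [(['Match', 'Venue'], [['1', 'Mumbai']])]
```

With pandas installed, each pair could then become a DataFrame via `pd.DataFrame(rows, columns=header)`, which is the kind of object `read_pdf` hands back.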
To use tabula-py, Java must already be installed on the system and available on the PATH. To convert the PDF file to CSV, follow these steps:
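First, install the package by running `pip install tabula-py` in a command shell.
Next, read the tables with `tabula.read_pdf("file location", pages=number)`. In current versions of tabula-py this returns a list of DataFrames, one per table it detects, so index into the list to work with a single table.
Finally, write the tables straight from the PDF to a file with `tabula.convert_into("pdf-filename", "name_this_file.csv", output_format="csv", pages="all")`. Note that `convert_into` takes the PDF path rather than a DataFrame, and it writes CSV (JSON and TSV are the other supported formats), not a native .xlsx workbook. Excel can open the resulting CSV file directly.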
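Because every tabula-py call in these steps launches Java, a quick pre-flight check can turn a confusing runtime error into a clear message. tabula-py also offers `tabula.environment_info()` for printing its environment details; the sketch below is a hypothetical standard-library alternative (`java_version` is an illustrative name) that looks for `java` on the PATH and reports its version line.

```python
import shutil
import subprocess


def java_version():
    """Return the first line reported by `java -version`, or None if no java is on PATH."""
    java = shutil.which("java")
    if java is None:
        return None  # tabula-py would be unable to launch tabula-java here
    # `java -version` writes its banner to stderr rather than stdout
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    output = result.stderr or result.stdout
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = java_version()
    print(version if version is not None else "Java not found on PATH")
```

If this prints "Java not found on PATH", install a Java runtime and make sure its `bin` directory is on the PATH before calling `read_pdf`.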
Example
In this example, we convert an IPL match schedule PDF (IPLmatch.pdf) into a CSV file.
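```python
# Import the required module
import tabula

# Read every page of the PDF; read_pdf returns a list of DataFrames,
# so [0] selects the first table it found
df = tabula.read_pdf("IPLmatch.pdf", pages="all")[0]

# Convert the tables in the PDF straight into a CSV file
tabula.convert_into("IPLmatch.pdf", "iplmatch.csv", output_format="csv", pages="all")

print(df)
```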
Output
Running the above code prints the first extracted table and writes iplmatch.csv, a CSV file that Excel can open.
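If you need a native .xlsx workbook rather than a CSV file, one option is to let pandas write the DataFrames that `read_pdf` returns. This is a minimal sketch, not part of tabula-py: `tables_to_excel` is a hypothetical helper, the one-sheet-per-table layout and the `Table_N` sheet names are illustrative choices, and writing .xlsx assumes an engine such as openpyxl is installed (`pip install openpyxl`).

```python
import pandas as pd


def tables_to_excel(tables, xlsx_path):
    """Write each DataFrame in `tables` to its own sheet of one .xlsx workbook.

    `tables` is assumed to be the list returned by tabula.read_pdf(..., pages="all").
    Returns the number of sheets written.
    """
    if not tables:
        # An .xlsx workbook needs at least one sheet, so refuse an empty list
        raise ValueError("no tables to write")
    with pd.ExcelWriter(xlsx_path) as writer:
        for i, table in enumerate(tables, start=1):
            # Names like "Table_1" stay within Excel's 31-character sheet-name limit
            table.to_excel(writer, sheet_name=f"Table_{i}", index=False)
    return len(tables)
```

For the IPL example, `tables_to_excel(tabula.read_pdf("IPLmatch.pdf", pages="all"), "iplmatch.xlsx")` would place each detected table on its own sheet. Keeping the tables on separate sheets avoids forcing tables with different columns into one grid.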
