How to convert CSV File to PDF File using Python?


In today's world, data is generated at an unprecedented rate, and being able to effectively manage and present it is essential. CSV files are commonly used to store and transfer data between systems, but sometimes it is necessary to convert this data into a more readable format such as PDF.

Python, with its vast array of libraries, provides an easy and efficient way to convert CSV files to PDF files.

In this article, we will explore the steps involved in converting a CSV file to a PDF file using Python, and provide a sample code that you can use to get started. Whether you are a data analyst, scientist, or just looking to improve your data presentation skills, this guide will help you convert your CSV files into PDF files with ease.

We will convert a CSV file to a PDF file in Python in two steps.

  • Converting CSV file to HTML using Pandas − Pandas is a powerful Python library used for data manipulation and analysis. It provides a method called "read_csv" that allows us to read the contents of a CSV file into a pandas dataframe. We can then use the "to_html" method to convert this dataframe into an HTML table.

  • Converting HTML file to PDF using the pdfkit Python API − pdfkit is a Python wrapper for the wkhtmltopdf command-line utility. It converts HTML to PDF by invoking wkhtmltopdf from within our Python script.

Let's say that we have a CSV file named inputs.csv which contains the following data inside it.

inputs.csv

Name,Age,Occupation
John,32,Engineer
Jane,28,Teacher
Bob,45,Salesperson
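
If you would rather generate the sample file than type it by hand, the following is a minimal sketch using Python's built-in csv module. The helper name write_sample_csv is hypothetical and only used for illustration. csv.writer places no padding after the commas, which keeps the column names and values free of stray leading spaces when pandas reads the file.

```python
import csv
from pathlib import Path

# Sample rows taken from the inputs.csv listing above
ROWS = [
    ["Name", "Age", "Occupation"],
    ["John", 32, "Engineer"],
    ["Jane", 28, "Teacher"],
    ["Bob", 45, "Salesperson"],
]


def write_sample_csv(path):
    """Write the sample rows to `path` (hypothetical helper for this article)."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(ROWS)
    return Path(path)
```

Calling write_sample_csv("inputs.csv") creates the file in the current folder.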

Converting CSV file to HTML

Now let's focus on the first step where we will take the above CSV file as input and then convert it to an HTML file.

Example

Consider the code shown below.

main.py

# Import the pandas library
import pandas as pd

# Read the CSV file into a pandas dataframe
df = pd.read_csv('inputs.csv')

# Convert the dataframe to an HTML table
html_table = df.to_html()

# Print the HTML table to the console
print(html_table)

Explanation

  • import pandas as pd − This imports the Pandas library and gives it an alias "pd" for easier use later in the code.

  • df = pd.read_csv('inputs.csv') − This reads the contents of the "inputs.csv" file into a pandas dataframe called "df". The contents of the file are assumed to be separated by commas, which is the default separator for the "read_csv" method.

  • html_table = df.to_html() − This converts the pandas dataframe "df" into an HTML table and assigns the resulting HTML code to the variable "html_table". By default, this method includes the index column of the dataframe as the first column of the HTML table.

  • print(html_table) − This prints the HTML table to the console. Alternatively, you could save the HTML table to a file by opening the file with the "open" function and calling the "write" method of the returned file object.
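
The points above can be combined into a small variation, sketched here under a few assumptions: the function name csv_to_html_file and the output file name are hypothetical. It uses two documented pandas options. If a CSV file has a space after each comma, skipinitialspace=True tells read_csv to drop it, and index=False tells to_html to leave out the index column. The table is then saved to a file instead of being printed.

```python
import pandas as pd


def csv_to_html_file(csv_source, html_path):
    """Read a CSV, build an HTML table, save it, and return the markup.

    `csv_source` can be a file path or a file-like object.
    (Hypothetical helper; the name is not part of pandas.)
    """
    # skipinitialspace=True ignores the blank that may follow each comma
    df = pd.read_csv(csv_source, skipinitialspace=True)

    # index=False leaves the 0, 1, 2 index column out of the table
    html_table = df.to_html(index=False)

    # open() returns a file object; its write() method saves the markup
    with open(html_path, "w", encoding="utf-8") as handle:
        handle.write(html_table)
    return html_table
```

For example, csv_to_html_file("inputs.csv", "table.html") would write the table to table.html in the current folder.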

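To run the above code, we first need to install the pandas library on our machine, which we can do with the command shown below.

pip3 install pandas

Once pandas is installed successfully, we can run the script with the command shown below.

python3 main.py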

Output

The output of the above command is shown below.

<table border="1" class="dataframe">
   <thead>
      <tr style="text-align: right;">
         <th></th>
         <th>Name</th>
         <th>Age</th>
         <th>Occupation</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <th>0</th>
         <td>John</td>
         <td>32</td>
         <td>Engineer</td>
      </tr>
      <tr>
         <th>1</th>
         <td>Jane</td>
         <td>28</td>
         <td>Teacher</td>
      </tr>
      <tr>
         <th>2</th>
         <td>Bob</td>
         <td>45</td>
         <td>Salesperson</td>
      </tr>
   </tbody>
</table>

Converting the HTML to PDF

To create a PDF from the HTML, we first need wkhtmltopdf installed on our system. It can be downloaded from the URL shown below.

https://wkhtmltopdf.org/downloads.html

From the above URL, we can download the wkhtmltopdf build that matches our operating system.
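
After installing, it can help to confirm that the executable is discoverable before attempting a conversion. The sketch below uses shutil.which from the standard library, which searches the PATH environment variable. The helper name find_wkhtmltopdf is hypothetical. If wkhtmltopdf was installed to a folder that is not on PATH, this check reports nothing even though the tool is present, so treat it as a convenience rather than a definitive test.

```python
import shutil


def find_wkhtmltopdf():
    """Return the full path of wkhtmltopdf if it is on PATH, else None."""
    return shutil.which("wkhtmltopdf")


path = find_wkhtmltopdf()
if path is None:
    print("wkhtmltopdf was not found on PATH; note its full install path instead.")
else:
    print(f"wkhtmltopdf found at: {path}")
```

The path printed here is the value that can be handed to pdfkit when it needs to be told where the executable lives.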

Once it is installed, we can run the code shown below.

main.py

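import pdfkit
import pandas as pd

# Read the CSV file and convert it to an HTML table
df = pd.read_csv('inputs.csv')
html_table = df.to_html()

# Page options passed through to wkhtmltopdf
options = {
   'page-size': 'Letter',
   'margin-top': '0mm',
   'margin-right': '0mm',
   'margin-bottom': '0mm',
   'margin-left': '0mm'
}

# Tell pdfkit where the wkhtmltopdf executable is located
config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf')

# Convert the HTML string to a PDF file
pdfkit.from_string(html_table, 'outputs.pdf', options=options, configuration=config)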

Note − On my machine, wkhtmltopdf is installed at /usr/local/bin/wkhtmltopdf, so I passed that path. It may be different on your system.

To run the above code, we first need to install the pdfkit library on our machine, which we can do with the command shown below.

pip3 install pdfkit

Once pdfkit is installed successfully, we can run the command shown below.

python3 main.py

Once we run the above command in the terminal, a new file named outputs.pdf will be created in the same folder.

Opening the "outputs.pdf" file shows the CSV data rendered as a table.
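
The table produced by to_html is an HTML fragment with no styling of its own. One possible way to improve the look of the PDF is to wrap the fragment in a complete HTML page with a little CSS before converting it. The sketch below is an illustration only: the helper name build_html_document and the style rules are assumptions, and how the CSS is rendered may depend on the wkhtmltopdf version installed.

```python
import html


def build_html_document(table_html, title="CSV export"):
    """Wrap an HTML table fragment in a complete, lightly styled page.

    (Hypothetical helper; not part of pandas or pdfkit.)
    """
    # Escape the title so characters such as & or < stay valid HTML
    safe_title = html.escape(title)
    # Doubled braces are literal braces inside an f-string
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{safe_title}</title>
<style>
  table {{ border-collapse: collapse; font-family: sans-serif; }}
  th, td {{ border: 1px solid #999; padding: 4px 8px; }}
</style>
</head>
<body>
{table_html}
</body>
</html>"""
```

The returned string can be passed to pdfkit.from_string in place of the bare html_table. Declaring the character set in the page is also a common way to help non-ASCII text come through the conversion intact.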

Conclusion

In conclusion, converting CSV files to PDFs using Python can be done using the pandas and pdfkit libraries.

First, the CSV file is converted to an HTML table using pandas, and then the pdfkit library is used to convert the HTML to PDF. With this approach, it is easy to generate nicely formatted and printable PDF documents from CSV data.

Updated on: 18-Apr-2023
