How to save HTML Tables data to CSV in Python
One of the most challenging tasks for a data scientist is collecting data. Plenty of data is available on the web; the hard part is extracting it automatically.
I wanted to extract the basic operations data embedded in HTML tables at https://www.tutorialspoint.com/python/python_basic_operators.htm.
The data is scattered across many HTML tables. If there were only one table, I could simply copy and paste it into a .csv file.
However, with more than 5 tables on a single page, doing that by hand quickly becomes a pain.
How to do it..
1. First, a quick look at how to create a CSV file with the csv module.
import csv

# Open the file in write mode; it is created if it does not exist.
# newline='' stops the csv module from writing blank lines between rows on Windows.
File = open('test.csv', 'w', newline='')
Data = csv.writer(File)

# Write the header
Data.writerow(('Column1', 'Column2', 'Column3'))

# Write the data
for i in range(20):
    Data.writerow((i, i+1, i+2))

# Close the file
File.close()
When executed, the above code produces a test.csv file in the current working directory (usually the directory you run the script from).
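To check what was written, the file can be read back with csv.reader. This is a minimal sketch: it repeats the write step in a temporary directory (an assumption made here so the example does not touch your working folder) and then loads the rows again.

```python
import csv
import os
import tempfile

# Write a small file the same way as above, but into a temporary directory
# so the sketch does not touch the current folder.
path = os.path.join(tempfile.mkdtemp(), 'test.csv')
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(('Column1', 'Column2', 'Column3'))
    for i in range(20):
        writer.writerow((i, i + 1, i + 2))

# Read it back; csv.reader returns every value as a string.
with open(path, newline='') as f:
    rows = list(csv.reader(f))

print(rows[0])    # header row
print(rows[1])    # first data row
print(len(rows))  # header plus 20 data rows
```

Note that the numbers come back as strings, so convert them with int() if you need to compute with them.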
2. Let us now retrieve an HTML table from https://www.tutorialspoint.com/python/python_dictionary.htm and write it as a CSV file.
The first step is to import what we need and set the URL.
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.tutorialspoint.com/python/python_dictionary.htm'
Open the URL with urlopen, store the response in html, and parse it with BeautifulSoup.
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
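If the plain urlopen call hangs or is refused, a slightly more defensive fetch may help. The sketch below is an assumption, not part of the original recipe: fetch_html is a hypothetical helper name, and the timeout and User-Agent values are illustrative choices.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Some servers reject the default urllib User-Agent, so send a generic one.
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')
```

The returned string can be passed to BeautifulSoup in place of the response object, for example BeautifulSoup(fetch_html(url), 'html.parser').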
Find the tables in the parsed page and pull out their rows. For demonstration purposes, I will extract only the first table.
# find_all returns a list of every table; take the first one
table = soup.find_all('table')[0]
rows = table.find_all('tr')
[<tr>
<th style='text-align:center;width:5%'>Sr.No.</th>
<th style='text-align:center;width:95%'>Function with Description</th>
</tr>, <tr>
<td class='ts'>1</td>
<td><a href='/python/dictionary_cmp.htm'>cmp(dict1, dict2)</a>
<p>Compares elements of both dict.</p></td>
</tr>, <tr>
<td class='ts'>2</td>
<td><a href='/python/dictionary_len.htm'>len(dict)</a>
<p>Gives the total length of the dictionary. This would be equal to the number of items in the dictionary.</p></td>
</tr>, <tr>
<td class='ts'>3</td>
<td><a href='/python/dictionary_str.htm'>str(dict)</a>
<p>Produces a printable string representation of a dictionary</p></td>
</tr>, <tr>
<td class='ts'>4</td>
<td><a href='/python/dictionary_type.htm'>type(variable)</a>
<p>Returns the type of the passed variable. If passed variable is dictionary, then it would return a dictionary type.</p></td>
</tr>]
3. Now we will write the data to a CSV file.
File = open('my_html_data_to_csv.csv', 'wt+', newline='')
Data = csv.writer(File)
try:
    for row in rows:
        # Collect the text of every header and data cell in this row
        FilteredRow = []
        for cell in row.find_all(['td', 'th']):
            FilteredRow.append(cell.get_text())
        Data.writerow(FilteredRow)
finally:
    File.close()
4. The results are now saved in the my_html_data_to_csv.csv file.
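Cells that contain nested tags, such as a link followed by a paragraph, may come out of get_text() with newlines and runs of spaces between the pieces, and those end up inside the CSV field. A small helper can tidy this up before writing. The name clean_cell is hypothetical and the sample string is made up for illustration.

```python
def clean_cell(text):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return ' '.join(text.split())


# Illustrative raw cell text with a newline and extra spaces in the middle
raw = 'cmp(dict1, dict2)\n   Compares elements of both dict.  '
cleaned = clean_cell(raw)
print(cleaned)  # → cmp(dict1, dict2) Compares elements of both dict.
```

In the write loop this would be used as FilteredRow.append(clean_cell(cell.get_text())). BeautifulSoup's own cell.get_text(' ', strip=True) is a similar option.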
Now let us put everything explained above together.
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Set the URL
url = 'https://www.tutorialspoint.com/python/python_basic_syntax.htm'

# Open the URL and parse the HTML
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

# Extract the first table (find_all returns a list of all tables)
table = soup.find_all('table')[0]
rows = table.find_all('tr')

# Write the content to the file
File = open('my_html_data_to_csv.csv', 'wt+', newline='')
Data = csv.writer(File)
try:
    for row in rows:
        FilteredRow = []
        for cell in row.find_all(['td', 'th']):
            FilteredRow.append(cell.get_text())
        Data.writerow(FilteredRow)
finally:
    File.close()
Table in the html page.
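The script above saves only the first table, while the original problem was a page with many tables. The sketch below writes every table to its own CSV file. To keep it runnable without a network connection or extra packages, it swaps BeautifulSoup for the standard library's html.parser and works on a small made-up HTML string. TableParser, sample_html, and the table_N.csv file names are all assumptions for illustration, and nested tables are not handled.

```python
import csv
import os
import tempfile
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # Join the fragments and collapse whitespace inside the cell
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


# Made-up page with two tables, standing in for a downloaded document
sample_html = """
<table>
  <tr><th>Sr.No.</th><th>Function</th></tr>
  <tr><td>1</td><td><a href="#">len(dict)</a> <p>Gives the total length.</p></td></tr>
</table>
<table>
  <tr><th>Operator</th><th>Description</th></tr>
  <tr><td>+</td><td>Addition</td></tr>
</table>
"""

parser = TableParser()
parser.feed(sample_html)

# Write each table to its own file: table_1.csv, table_2.csv, ...
out_dir = tempfile.mkdtemp()
paths = []
for index, table in enumerate(parser.tables, start=1):
    path = os.path.join(out_dir, f'table_{index}.csv')
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(table)
    paths.append(path)

print(len(parser.tables), 'tables written to', out_dir)
```

With BeautifulSoup the same idea is a loop over soup.find_all('table'), opening one output file per table inside the loop.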