How to Convert HTML to Markdown in Python?


Markdown is a lightweight markup language that uses plain-text syntax, such as # for headings and * for list items, so formatted text stays easy to read even before it is rendered. HTML, by contrast, is the markup language used to structure and display content on the web. Converting HTML to Markdown is useful when you want to simplify content or make it easier to read and edit as plain text.

One way to convert HTML to Markdown in Python is the markdownify package, which provides a simple way to turn HTML text into Markdown. To use it, you first install the package in your Python environment. You can then import it and call its markdownify() function on your HTML text.

In this article, we will show how to install the markdownify package and then demonstrate, with two examples, how to use it to convert HTML to Markdown. By the end, you will know how to convert HTML to Markdown using Python and markdownify.

Installation

The markdownify module is not part of Python's standard library, so you need to install it separately. To install it, open a terminal and enter the following command:

pip3 install markdownify
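To confirm that the package is available to the Python interpreter you plan to use, you could run a small check that relies only on the standard library. This is a sketch; markdownify_status() is a hypothetical helper name, not part of markdownify.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def markdownify_status() -> str:
    """Hypothetical helper: report whether markdownify can be imported, and its version."""
    # find_spec() returns None when no importable module with this name exists
    if importlib.util.find_spec("markdownify") is None:
        return "markdownify is not installed; run: pip3 install markdownify"
    try:
        # Read the version from the installed package's metadata
        return f"markdownify {version('markdownify')} is installed"
    except PackageNotFoundError:
        # Importable (for example, from a local copy) but without installed metadata
        return "markdownify is importable, but its version could not be determined"


print(markdownify_status())
```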

Converting HTML text to Markdown with Python involves the following steps:

  • Import module: First, import the markdownify module into your Python script. It provides the markdownify() function that performs the conversion.

  • Create HTML text: Next, create the HTML text that you want to convert. You can type it directly into your script, read it from a file, or fetch it from a web page with a library such as requests.

  • Call markdownify(): Pass the HTML text to the markdownify() function. It takes the HTML text as input and returns the equivalent Markdown text as a string.

  • Display the Markdown text: Finally, print the Markdown text to the console with print(), or write it to a file using open().

Overall, the approach is to import the module, create the HTML text, pass it to markdownify() to obtain the equivalent Markdown text, and then display or save the result.

Example 1: Converting HTML to Markdown

Let's start with code that converts a simple HTML snippet to Markdown.

Consider the code shown below. It first imports the markdownify module and then creates some sample HTML text to convert: in this case, a simple HTML heading followed by a paragraph.

Next, it calls the markdownify() function, which takes the HTML text as input and returns the equivalent Markdown text.

Finally, it displays the converted Markdown text using the print() function.

Example

main.py

# Import markdownify module
import markdownify

# Create HTML text to be converted
html_text = "<h1>My HTML Title</h1><p>This is some sample HTML text.</p>"

# Use markdownify() function to convert HTML to Markdown
markdown_text = markdownify.markdownify(html_text)

# Display the converted Markdown text
print(markdown_text)

Output

On execution, we get output similar to the following (the exact number of surrounding blank lines can vary between markdownify versions):

My HTML Title
=============

This is some sample HTML text.

Example 2

Let's explore one more example with slightly more complex HTML. Consider the code shown below.

main.py

# Import markdownify module
import markdownify

# Create complex HTML text to be converted
html_text = """
<div class="article">
   <h1>My HTML Title</h1>
   <p>This is some sample HTML text.</p>
   <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
   </ul>
   <a href="https://www.tutorialspoint.com">Link to TutorialsPoint</a>
</div>
"""
# Use markdownify() function to convert HTML to Markdown
markdown_text = markdownify.markdownify(html_text)

# Display the converted Markdown text
print(markdown_text)

Output

On execution, we get output similar to the following. The <div> wrapper has no Markdown equivalent, so only its contents are kept; the list items become "*" bullets and the anchor becomes a Markdown link. Exact spacing can vary between markdownify versions:

My HTML Title
=============

This is some sample HTML text.

* Item 1
* Item 2
* Item 3

[Link to TutorialsPoint](https://www.tutorialspoint.com)

Conclusion

In conclusion, Python makes it straightforward to convert HTML to Markdown. The markdownify module handles the conversion with a single function call, turning HTML text into Markdown that is easier to read and edit.

Updated on: 18-Apr-2023
