Python - Working with .docx module
Word documents store formatted text in three levels of objects: Run objects at the lowest level, Paragraph objects in the middle, and a Document object at the top.
Because a .docx file is a zipped package of XML rather than plain text, it cannot be edited usefully in an ordinary text editor. The python-docx module lets you create and modify these documents from Python.
Installation
The first step is to install the third-party module python-docx. You can use pip:
pip install python-docx
Important: After installation, import docx, NOT python-docx. Use the docx.Document class to start working with a Word document.
Creating a Basic Word Document
Let's create a simple Word document with headings, paragraphs, and formatted text:
# import docx NOT python-docx
import docx
# create an instance of a word document
doc = docx.Document()
# add a heading of level 0 (largest heading)
doc.add_heading('Heading for the document', 0)
# add a paragraph and store the object in a variable
doc_para = doc.add_paragraph('Your paragraph goes here, ')
# add a run, i.e. a stretch of text with its own formatting (bold, italic, underline, etc.)
doc_para.add_run('hey there, bold here').bold = True
doc_para.add_run(', and ')
doc_para.add_run('these words are italic').italic = True
# add a page break to start a new page
doc.add_page_break()
# add a heading of level 2
doc.add_heading('Heading level 2', 2)
# pictures can also be added to the Word document;
# add_picture also accepts optional width and height arguments
doc.add_picture('path_to_picture')
# now save the document to a location
doc.save('path_to_document')
Working with Existing Documents
You can also open and modify existing Word documents:
import docx
# open an existing document
doc = docx.Document('existing_document.docx')
# add new content to the existing document
doc.add_paragraph('This is a new paragraph added to existing document.')
# save the modified document
doc.save('modified_document.docx')
Key Components
- Document: The top-level container for all document content
- Paragraph: A block-level element that can contain one or more runs
- Run: A contiguous sequence of characters with the same formatting
Common Methods
- add_heading(text, level): Add a heading (levels 0-9; level 0 is the document title)
- add_paragraph(text): Add a paragraph
- add_run(text): Add a formatted text run (called on a Paragraph, not on the Document)
- add_page_break(): Insert a page break
- add_picture(path): Insert an image
Conclusion
The python-docx module lets you create and modify Word documents programmatically. Remember to import docx after installing python-docx, and use the Document class as the starting point for all operations.
