Automating Microsoft Word Documents With Python-DOCX

Automating simple repetitive tasks is just awesome. It saves a lot of time, and time is one of the most precious things we have. We all perform different kinds of repetitive work in our professional or personal lives. Thanks to technology and programming tools, we can let computers do the heavy lifting. I have written in the past about automating stock and crypto market data, extracting data from PDF files, building Excel spreadsheets, etc. Until recently, I hadn't tried automating Microsoft Word documents and had no need to. I was sure there were easy-to-use solutions available for the top programming languages, especially Python. When I received a request to automate MS Word forms, I didn't have to think twice and was glad for the opportunity to experiment with automating documents.

Python-docx is one of the Python libraries that allow us to create and edit Microsoft Word files. Its documentation lets us get started and experiment with docx super quickly. However, it lacks detailed explanations for solving more complex problems. The Python community is big, and there are plenty of resources available that help with finding the right solutions.

The script/app I was writing had a very simple goal: to create a template from an existing document, let the user enter some of the data within an app, and create a final document with proper naming. Since most of the data entered would come from unchanging lists with limited options, and only some of the items would change on a daily basis, it made sense to automate this process. This would save hours and cut the time spent to seconds. It is interesting how many companies and organizations have so much bureaucracy involved and don't offer more efficient solutions to manage such processes.

Getting started with python-docx is super simple. The documentation provides the following sample code, which has been copied and used in many tutorials on the library. I too will share the code, just to demonstrate how easy it is to automate Microsoft Word documents.


from docx import Document
from docx.shared import Inches

document = Document()

document.add_heading('Document Title', 0)

p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph(
    'first item in ordered list', style='List Number'
)

document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id, desc in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()

document.save('demo.docx')

The code is self-explanatory. Once the document is created, we can open it and see which line of code creates which paragraph or part of the document. I normally prefer to share my own code in posts like this. However, since the script I created had to do with a specific task and had to be run as an app, it would not be able to show how easy it is to use docx. The sample code above creates a document and adds parts to it: a heading, paragraphs, runs within a paragraph, unordered and ordered lists, a table, and an image, applying styles along the way.

When it comes to styles, fonts, colors, and layouts, docx has multiple ways of achieving them. It is not clear right away what the standard or best practices are. In my situation, I had to experiment with different solutions to figure out a better way. While some actions are self-explanatory and answers are available in the documentation, solutions for more specific situations are not easy to find. For example, in my document I had to create borders around some paragraphs. I wasn't able to find a solution in the documentation. I found one elsewhere, shared by someone who had a similar issue. But it didn't involve using simple methods and properties. Rather, it involved lower-level manipulation of the document formatting. That part would perhaps require more studying and experimenting to take advantage of. But it also shows how powerful this tool is, and that solutions to very complex document tasks can be found with enough time and effort.

I prefer integrating Python scripts with Streamlit. This way scripts can be turned into apps and can easily be used by non-programmers. Streamlit apps are great for sharing automation solutions with teams, colleagues, friends, and clients. It gets the job done and doesn't require much in the way of web development skills.

Have you used python-docx or other automation tools? Feel free to share your thoughts, experiences, and tools in the comments.



17 comments

I'm not a Python person, but this piece of code is quite interesting.


I didn't know about Python, but after reading this, it caught my interest. I need to learn more about it.


My company has been trying to automate some repetitive work processes. So far it has been very successful and has taken a lot of work off the workers' hands. They may have used Python for this transition.


Very cool, and good for populating templates with individual data. I do have to write up documents, which is tedious as hell. With my newfound skills in Python I will likely use this library in my new role if they insist on documents (which nobody ever reads).


I heard Python is one of the easiest languages to learn. There are many apps that can assist you with coding. It's so interesting!


The biggest advantage of technology is that it saves us time, and if we use that time in the right place, we can achieve a lot of success. Computers are getting a lot of use in Pakistan now; every businessman, small or big, has set one up in his shop. The biggest advantage is that within minutes you know what is running short and what is well stocked in your store. We will also take advantage of this tool, since it is automatic. Thanks for sharing this with us.


This is really useful. I haven't seen this library before but am sure I can find a use for it now. Thanks for sharing.


Thanks for your contribution to the STEMsocial community. Feel free to join us on discord to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

You may also include @stemsocial as a beneficiary of the rewards of this post to get a stronger support. 
 


automation is really great.

Yes PLS!


This is awesome and a welcome development. Thank you for this update.


Excellent! I would like to learn some Python!
