Last modified: Nov 09, 2025 By Alexander Williams
Merge docx Files in Python Using python-docx
Working with Word documents is common in business and automation. You often need to combine multiple docx files into one. Python makes this easy.
The python-docx library provides powerful tools for document manipulation. It allows you to read, create, and modify Word documents programmatically.
Installing python-docx Library
First, you need to install the python-docx library. Use pip for installation. The process is straightforward and quick.
pip install python-docx
This command downloads and installs the latest version from PyPI, together with its main dependency, lxml. The library works on Windows, macOS, and Linux.
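To confirm the installation worked, a small check like the one below can help. Note that the package is installed as python-docx but imported as docx. The helper name installed_python_docx_version is hypothetical, chosen only for this sketch.

```python
from importlib import metadata


def installed_python_docx_version():
    """Return the installed python-docx version string, or None if absent."""
    try:
        # The distribution name on PyPI is "python-docx",
        # even though the import name is "docx".
        return metadata.version("python-docx")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_python_docx_version()
    if version is None:
        print("python-docx is not installed; run: pip install python-docx")
    else:
        print(f"python-docx {version} is installed")
```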
Basic Approach to Merge docx Files
Merging docx files involves reading content from source documents. Then you add that content to a target document. How much formatting survives depends on how much of each element you copy.
You need to handle different document elements separately. This includes paragraphs, tables, and sections. Each requires specific handling methods.
Our approach will use the Document class from python-docx. We'll create a new document. Then we'll append content from each source file.
Complete Merge Function Code
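Here is a function that merges multiple docx files. It copies paragraph text, paragraph styles, and table text. Images, headers, and footers are not carried over, and each file's tables are appended after its paragraphs rather than in their original position. The code is commented for understanding.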
from docx import Document


def merge_docx_files(output_path, input_paths):
    """
    Merge multiple docx files into a single document

    Args:
        output_path (str): Path for the merged output file
        input_paths (list): List of input file paths to merge
    """
    # Create a new document
    merged_doc = Document()

    for i, file_path in enumerate(input_paths):
        # Open each source document
        source_doc = Document(file_path)

        # Add all paragraphs from source document
        for paragraph in source_doc.paragraphs:
            new_paragraph = merged_doc.add_paragraph()
            # Copy paragraph text and style
            new_paragraph.text = paragraph.text
            if paragraph.style:
                new_paragraph.style = paragraph.style

        # Add all tables from source document
        for table in source_doc.tables:
            # Create new table with same dimensions
            new_table = merged_doc.add_table(
                rows=len(table.rows),
                cols=len(table.columns)
            )
            # Copy table content cell by cell
            # (row_idx/col_idx keep the outer loop's i intact)
            for row_idx, row in enumerate(table.rows):
                for col_idx, cell in enumerate(row.cells):
                    new_table.cell(row_idx, col_idx).text = cell.text

        # Add page break between documents (except after last one)
        if i < len(input_paths) - 1:
            merged_doc.add_page_break()

    # Save the merged document
    merged_doc.save(output_path)
    print(f"Merged {len(input_paths)} files into {output_path}")
Using the Merge Function
Now let's see how to use our merge function. Create a list of file paths to merge. Then call the function with the output path and that list.
# List of docx files to merge
files_to_merge = [
    "document1.docx",
    "document2.docx",
    "document3.docx"
]

# Merge files and save as combined.docx
merge_docx_files("combined.docx", files_to_merge)
Merged 3 files into combined.docx
The function creates a new file called combined.docx. It contains the paragraphs and table text from the three source files. A page break separates each document from the next.
Handling Complex Document Elements
Real-world documents often contain complex elements. These include images, headers, footers, and styled text. The basic function skips images, headers, and footers, and it drops run-level formatting such as bold or italic.
For advanced formatting, you might need our Python-docx Text Styling Guide. It covers text formatting in detail.
Headers and footers require special handling. They don't copy automatically with basic methods. You need to manually recreate them.
Enhanced Merge Function with Formatting
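Here's an improved version that preserves more formatting. It copies each run's bold, italic, underline, and font size settings, and it separates documents with section breaks instead of page breaks. It still does not copy paragraph styles, images, headers, or footers.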
def merge_docx_advanced(output_path, input_paths):
    """
    Enhanced merge function with better formatting preservation
    """
    merged_doc = Document()

    for i, file_path in enumerate(input_paths):
        source_doc = Document(file_path)

        # Copy paragraphs with runs (text with formatting)
        for paragraph in source_doc.paragraphs:
            new_para = merged_doc.add_paragraph()
            # Copy each run to preserve formatting
            for run in paragraph.runs:
                new_run = new_para.add_run(run.text)
                # Copy run formatting
                new_run.bold = run.bold
                new_run.italic = run.italic
                new_run.underline = run.underline
                if run.font.size:
                    new_run.font.size = run.font.size

        # Copy tables with basic structure
        for table in source_doc.tables:
            copy_table(merged_doc, table)

        # Add section break between documents
        if i < len(input_paths) - 1:
            merged_doc.add_section()

    merged_doc.save(output_path)


def copy_table(target_doc, source_table):
    """Helper function to copy tables between documents"""
    new_table = target_doc.add_table(
        rows=len(source_table.rows),
        cols=len(source_table.columns)
    )
    for i, row in enumerate(source_table.rows):
        for j, cell in enumerate(row.cells):
            new_table.cell(i, j).text = cell.text
Page Setup and Layout Considerations
When merging documents, page layout is important. Different source files might have different page setups. You need to standardize these.
Our Python-docx Page Setup guide covers layout customization. It helps you set consistent margins and orientation.
Page breaks control document flow. Use them strategically between merged sections. This keeps content organized and readable.
Batch Processing Multiple Files
For large-scale operations, batch processing is essential. You can automate merging dozens or hundreds of files. This saves significant time.
Check our Batch Generate docx Files in Python tutorial. It shows efficient batch processing techniques.
Use Python's file handling to automatically find docx files. Then process them in batches. This approach scales well for large projects.
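Finding the files and splitting them into batches needs only the standard library. The following is a small sketch; the function names find_docx_files and in_batches are assumptions made for this example.

```python
from pathlib import Path


def find_docx_files(folder):
    """Return .docx paths in `folder`, sorted by name.

    Files starting with "~$" are skipped: Word creates these
    temporary lock files while a document is open.
    """
    return sorted(
        path for path in Path(folder).glob("*.docx")
        if not path.name.startswith("~$")
    )


def in_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each batch can then be passed to a merge function, with one output file per batch.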
Common Issues and Solutions
File not found errors are common. Always check if files exist before processing. Use try-except blocks for error handling.
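The existence check and the try-except block can be combined in a thin wrapper around any merge function. This is a sketch under the assumption that the merge function takes (output_path, input_paths); the names split_existing and safe_merge are hypothetical.

```python
from pathlib import Path


def split_existing(paths):
    """Split `paths` into (existing, missing) lists, keeping order."""
    existing, missing = [], []
    for path in paths:
        (existing if Path(path).is_file() else missing).append(path)
    return existing, missing


def safe_merge(merge_func, output_path, input_paths):
    """Run merge_func on the files that exist; return True on success."""
    existing, missing = split_existing(input_paths)
    for path in missing:
        print(f"Skipping missing file: {path}")
    if not existing:
        print("No input files found; nothing to merge")
        return False
    try:
        merge_func(output_path, existing)
    except Exception as exc:  # broad on purpose, e.g. corrupt files
        print(f"Merge failed: {exc}")
        return False
    return True
```

For example, safe_merge(merge_docx_files, "combined.docx", files_to_merge) skips missing inputs instead of stopping at the first one.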
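Memory issues can occur with very large documents, because python-docx loads each document fully into memory. Process files in smaller batches. The library has no streaming mode, so for huge inputs consider splitting the work across several output documents.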
Formatting inconsistencies might appear. Test with sample files first. Adjust the code based on your specific document structure.
Best Practices for docx Merging
Always back up original files before processing. This prevents data loss if something goes wrong during merging.
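A backup step can be as simple as copying the inputs into a separate folder first. The sketch below uses only the standard library; the function name backup_files and the folder layout are assumptions.

```python
import shutil
from pathlib import Path


def backup_files(paths, backup_dir):
    """Copy each file into `backup_dir` and return the copies' paths.

    Files that share a name overwrite each other in the backup folder,
    so this sketch assumes unique file names.
    """
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copies = []
    for path in paths:
        destination = target / Path(path).name
        # copy2 also tries to preserve timestamps and other metadata
        shutil.copy2(path, destination)
        copies.append(destination)
    return copies
```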
Test with small files first. Verify the output looks correct. Then scale up to larger production files.
Use descriptive output filenames. Include timestamps or version numbers. This helps track different merged versions.
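A timestamped filename can be generated with the standard library. This is a minimal sketch; the function name timestamped_name and the YYYYMMDD_HHMMSS format are illustrative choices.

```python
from datetime import datetime
from pathlib import Path


def timestamped_name(base_path, now=None):
    """Insert a YYYYMMDD_HHMMSS stamp before the file extension."""
    path = Path(base_path)
    # `now` can be passed in for reproducible names, e.g. in tests
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return str(path.with_name(f"{path.stem}_{stamp}{path.suffix}"))
```

The result can be used directly as the output path of a merge, so repeated runs do not overwrite earlier merged files.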
Conclusion
Merging docx files with python-docx is flexible. The library exposes paragraphs, runs, tables, and sections, which is enough to automate many document combination tasks.
The basic approach works for simple documents. The enhanced version keeps run-level formatting such as bold and italic. Choose the right approach for your needs.
Python makes document automation accessible. With python-docx, you can streamline Word document workflows. This saves time and reduces manual errors.
Start with the basic merge function. Then enhance it based on your specific requirements. The possibilities for document automation are extensive.