Last modified: Nov 25, 2025 By Alexander Williams
Compare Spreadsheets Row by Row Python pyexcel
Data comparison is a common task. You often need to compare Excel files. This helps find differences between versions. It also helps validate data changes.
Python's pyexcel library makes this easy. It provides simple tools for spreadsheet operations. You can compare data row by row efficiently.
Installing pyexcel and Dependencies
First, install the required packages. Use pip to install pyexcel and its plugins. These handle Excel file formats.
# Install pyexcel and Excel support
pip install pyexcel pyexcel-xlsx pyexcel-xls
These packages provide Excel file support. They let you read and write spreadsheet data. The installation is quick and straightforward.
Loading Spreadsheet Data
Start by loading your Excel files. Use pyexcel.get_array() function. It reads data into Python lists.
import pyexcel
# Load first spreadsheet
file1_data = pyexcel.get_array(file_name="spreadsheet1.xlsx")
# Load second spreadsheet
file2_data = pyexcel.get_array(file_name="spreadsheet2.xlsx")
print(f"File 1 rows: {len(file1_data)}")
print(f"File 2 rows: {len(file2_data)}")
File 1 rows: 5
File 2 rows: 5
This code loads both spreadsheets into memory. It shows the row count for each file. This confirms successful loading.
For more complex data loading scenarios, you might want to load Excel files into pandas with Python pyexcel for additional analysis capabilities.
Basic Row Comparison Function
Create a function to compare rows. It should identify differences between corresponding rows. This includes added, removed, or changed data.
def compare_rows(row1, row2, row_number):
"""Compare two rows and return differences"""
differences = []
# Check if rows have same length
if len(row1) != len(row2):
differences.append(f"Row {row_number}: Column count mismatch")
return differences
# Compare each cell in the row
for col_index, (cell1, cell2) in enumerate(zip(row1, row2)):
if cell1 != cell2:
differences.append(
f"Row {row_number}, Column {col_index + 1}: "
f"'{cell1}' vs '{cell2}'"
)
return differences
This function compares each cell in two rows. It returns a list of differences. The output shows exactly where values differ.
Complete Spreadsheet Comparison
Now implement the full comparison logic. Handle different row counts and compare all data. This gives a complete difference report.
def compare_spreadsheets(data1, data2):
"""Compare two spreadsheets and return all differences"""
all_differences = []
max_rows = max(len(data1), len(data2))
for row_index in range(max_rows):
# Handle different row counts
if row_index >= len(data1):
all_differences.append(f"Row {row_index + 1}: Added in second file")
continue
if row_index >= len(data2):
all_differences.append(f"Row {row_index + 1}: Removed in second file")
continue
# Compare current row
row1 = data1[row_index]
row2 = data2[row_index]
row_differences = compare_rows(row1, row2, row_index + 1)
all_differences.extend(row_differences)
return all_differences
This function handles files with different row counts. It identifies added and removed rows. It also finds cell-level differences.
Practical Example with Sample Data
Let's test with sample Excel files. Create two spreadsheets with slight differences. Then run the comparison to see results.
# Sample data for testing
sample_data1 = [
["Name", "Age", "City"],
["Alice", 25, "New York"],
["Bob", 30, "Boston"],
["Charlie", 35, "Chicago"]
]
sample_data2 = [
["Name", "Age", "City"],
["Alice", 26, "New York"], # Age changed
["Bob", 30, "Boston"],
["David", 40, "Seattle"] # New row
]
# Compare the sample data
differences = compare_spreadsheets(sample_data1, sample_data2)
print("Comparison Results:")
for diff in differences:
print(f"- {diff}")
Comparison Results:
- Row 2, Column 2: '25' vs '26'
- Row 4: Added in second file
The comparison found two differences. Alice's age changed from 25 to 26. A new row for David was added in the second file.
Handling Large Spreadsheets
For large files, consider performance optimizations. Process data in chunks if memory is limited. This prevents system slowdowns.
You can also batch process Excel files with Python pyexcel when working with multiple comparisons or large datasets.
def efficient_comparison(file1_path, file2_path, chunk_size=1000):
"""Compare large spreadsheets in chunks"""
differences = []
# Process files in chunks
for chunk_num, (chunk1, chunk2) in enumerate(
zip(pyexcel.get_sheet(file_name=file1_path, start_row=0),
pyexcel.get_sheet(file_name=file2_path, start_row=0))
):
chunk_differences = compare_spreadsheets(
chunk1.to_array(),
chunk2.to_array()
)
differences.extend(chunk_differences)
return differences
This approach processes data in manageable chunks. It reduces memory usage for large files. The comparison remains accurate.
Exporting Comparison Results
Save comparison results to a new spreadsheet. This creates an audit trail. It helps document the differences found.
def save_comparison_report(differences, output_file):
"""Save differences to a new Excel file"""
report_data = [["Row", "Column", "Difference Description"]]
for diff in differences:
# Parse difference description
if "Column" in diff:
parts = diff.split(":")
location = parts[0].strip()
description = parts[1].strip()
report_data.append([location, "", description])
else:
report_data.append([diff, "", "Row operation"])
pyexcel.save_as(array=report_data, dest_file_name=output_file)
print(f"Comparison report saved to {output_file}")
# Save our sample comparison results
save_comparison_report(differences, "comparison_report.xlsx")
Comparison report saved to comparison_report.xlsx
The report file shows all differences clearly. Each row documents one difference. This format is easy to review and share.
For more advanced reporting needs, you can generate Excel reports from JSON data using Python pyexcel to create comprehensive documentation.
Error Handling and Validation
Add proper error handling to your comparison script. This makes it more robust. It handles unexpected situations gracefully.
def safe_spreadsheet_comparison(file1, file2):
"""Compare spreadsheets with error handling"""
try:
data1 = pyexcel.get_array(file_name=file1)
data2 = pyexcel.get_array(file_name=file2)
if not data1 or not data2:
return ["Error: One or both files are empty"]
return compare_spreadsheets(data1, data2)
except FileNotFoundError as e:
return [f"Error: File not found - {e}"]
except Exception as e:
return [f"Error during comparison - {e}"]
This version includes basic error checking. It handles missing files and empty data. The function returns helpful error messages.
Conclusion
Comparing spreadsheets row by row is straightforward with pyexcel. The library provides simple data loading and manipulation. You can identify differences quickly and accurately.
The techniques shown work for various comparison needs. They handle different file sizes and data types. The approach is flexible and extensible.
Remember to always test your comparison logic with sample data first. This ensures it works as expected before processing important files.
Python pyexcel makes spreadsheet comparison accessible to all skill levels. With these examples, you can start comparing your Excel files today.