Last modified: Nov 19, 2025 By Alexander Williams
Compare Excel Sheets in Python Using xlrd
Excel file comparison is a common task in data analysis. Python's xlrd library makes this process efficient.
This guide shows you how to compare Excel sheets programmatically. You will learn practical techniques for data validation.
Installing xlrd Library
First, ensure xlrd is installed in your Python environment. Use pip for installation.
pip install xlrd
Xlrd supports older Excel formats (.xls). For newer .xlsx files, consider alternative libraries.
If you need installation help, check our guide on installing xlrd and xlwt in Python.
Loading Excel Workbooks
Start by importing xlrd and loading your Excel files. Use the open_workbook function.
import xlrd
# Load both workbooks for comparison
workbook1 = xlrd.open_workbook('file1.xls')
workbook2 = xlrd.open_workbook('file2.xls')
print(f"Workbook 1 sheets: {workbook1.sheet_names()}")
print(f"Workbook 2 sheets: {workbook2.sheet_names()}")
Workbook 1 sheets: ['Sheet1', 'Data']
Workbook 2 sheets: ['Sheet1', 'Reports']
This code loads two Excel files and displays their sheet names. You can now access specific sheets for comparison.
Accessing Specific Sheets
Select the sheets you want to compare. Use sheet_by_name or sheet_by_index methods.
# Access sheets by name
sheet1 = workbook1.sheet_by_name('Sheet1')
sheet2 = workbook2.sheet_by_name('Sheet1')
# Or access by index (first sheet)
# sheet1 = workbook1.sheet_by_index(0)
# sheet2 = workbook2.sheet_by_index(0)
print(f"Sheet1 dimensions: {sheet1.nrows} rows, {sheet1.ncols} columns")
print(f"Sheet2 dimensions: {sheet2.nrows} rows, {sheet2.ncols} columns")
Sheet1 dimensions: 50 rows, 5 columns
Sheet2 dimensions: 50 rows, 5 columns
Check sheet dimensions first. This ensures you're comparing compatible data structures.
Basic Sheet Comparison
Compare two sheets cell by cell. This method works for identical sheet structures.
def compare_sheets_basic(sheet1, sheet2):
differences = []
# Check if dimensions match
if sheet1.nrows != sheet2.nrows or sheet1.ncols != sheet2.ncols:
return ["Sheet dimensions don't match"]
# Compare each cell
for row in range(sheet1.nrows):
for col in range(sheet1.ncols):
cell1 = sheet1.cell_value(row, col)
cell2 = sheet2.cell_value(row, col)
if cell1 != cell2:
differences.append(f"Difference at ({row},{col}): '{cell1}' vs '{cell2}'")
return differences
# Run comparison
diff_results = compare_sheets_basic(sheet1, sheet2)
if diff_results:
print("Differences found:")
for diff in diff_results[:5]: # Show first 5 differences
print(diff)
else:
print("Sheets are identical")
Differences found:
Difference at (2,1): 'Product A' vs 'Product B'
Difference at (5,3): '150' vs '200'
This basic comparison identifies exact cell differences. It's useful for quality control and data validation.
Advanced Comparison with Data Types
Excel stores different data types. Xlrd preserves these types during comparison.
def compare_with_types(sheet1, sheet2):
differences = []
for row in range(min(sheet1.nrows, sheet2.nrows)):
for col in range(min(sheet1.ncols, sheet2.ncols)):
cell1 = sheet1.cell(row, col)
cell2 = sheet2.cell(row, col)
# Compare both value and type
if cell1.value != cell2.value or cell1.ctype != cell2.ctype:
type1 = xlrd.sheet.ctype_text.get(cell1.ctype, 'unknown')
type2 = xlrd.sheet.ctype_text.get(cell2.ctype, 'unknown')
differences.append(
f"Row {row}, Col {col}: "
f"Value '{cell1.value}' ({type1}) vs "
f"'{cell2.value}' ({type2})"
)
return differences
diff_results = compare_with_types(sheet1, sheet2)
for diff in diff_results[:3]:
print(diff)
Row 2, Col 1: Value '25' (number) vs '25.0' (text)
Row 5, Col 3: Value '2023-01-15' (date) vs '2023-01-16' (date)
This advanced method catches type mismatches. The same value with different types can cause issues.
For date handling specifics, see our guide on handling Excel dates and times with Python xlrd.
Comparing Specific Ranges
Sometimes you only need to compare specific data ranges. This saves processing time.
def compare_ranges(sheet1, sheet2, start_row, end_row, start_col, end_col):
differences = []
for row in range(start_row, end_row + 1):
for col in range(start_col, end_col + 1):
try:
val1 = sheet1.cell_value(row, col)
val2 = sheet2.cell_value(row, col)
if val1 != val2:
differences.append(f"({row},{col}): {val1} != {val2}")
except IndexError:
differences.append(f"Cell ({row},{col}) out of range in one sheet")
return differences
# Compare only rows 1-10 and columns 0-2
range_diffs = compare_ranges(sheet1, sheet2, 1, 10, 0, 2)
print(f"Found {len(range_diffs)} differences in specified range")
Found 3 differences in specified range
Range comparison is efficient for large files. Focus only on relevant data sections.
Learn more about reading specific rows and columns with Python xlrd for better performance.
Handling Large Excel Files
Large Excel files require efficient processing. Optimize your comparison code for performance.
def efficient_comparison(sheet1, sheet2, batch_size=1000):
differences = []
total_cells = sheet1.nrows * sheet1.ncols
for row_batch in range(0, sheet1.nrows, batch_size):
end_row = min(row_batch + batch_size, sheet1.nrows)
for row in range(row_batch, end_row):
for col in range(sheet1.ncols):
if sheet1.cell_value(row, col) != sheet2.cell_value(row, col):
differences.append((row, col))
progress = (end_row / sheet1.nrows) * 100
print(f"Progress: {progress:.1f}%")
return differences
# Use smaller batch size for memory management
diffs = efficient_comparison(sheet1, sheet2, batch_size=500)
print(f"Total differences: {len(diffs)}")
Progress: 20.0%
Progress: 40.0%
Progress: 60.0%
Progress: 80.0%
Progress: 100.0%
Total differences: 45
Batch processing prevents memory issues. It's essential for large datasets.
Generating Comparison Reports
Create meaningful reports from your comparison results. This helps with data analysis.
def generate_report(sheet1, sheet2, differences):
report = []
report.append("EXCEL SHEET COMPARISON REPORT")
report.append("=" * 40)
report.append(f"Sheet 1: {sheet1.nrows} rows, {sheet1.ncols} columns")
report.append(f"Sheet 2: {sheet2.nrows} rows, {sheet2.ncols} columns")
report.append(f"Total differences found: {len(differences)}")
report.append("")
report.append("SAMPLE DIFFERENCES:")
report.append("-" * 20)
for i, diff in enumerate(differences[:10]):
if isinstance(diff, tuple):
row, col = diff
val1 = sheet1.cell_value(row, col)
val2 = sheet2.cell_value(row, col)
report.append(f"{i+1}. Cell ({row},{col}): '{val1}' vs '{val2}'")
else:
report.append(f"{i+1}. {diff}")
return "\n".join(report)
# Generate and save report
report_text = generate_report(sheet1, sheet2, diffs)
print(report_text)
# Save to file
with open('comparison_report.txt', 'w') as f:
f.write(report_text)
EXCEL SHEET COMPARISON REPORT
========================================
Sheet 1: 50 rows, 5 columns
Sheet 2: 50 rows, 5 columns
Total differences found: 45
SAMPLE DIFFERENCES:
--------------------
1. Cell (2,1): 'Product A' vs 'Product B'
2. Cell (5,3): '150' vs '200'
Reports document your findings clearly. They're valuable for team communication and audits.
Conclusion
Excel sheet comparison with xlrd is powerful for data validation. You can automate quality checks and find discrepancies efficiently.
The techniques shown work for various scenarios. From basic cell comparisons to advanced type checking.
Remember that xlrd works best with .xls files. For newer Excel formats, consider pandas or openpyxl.
Start with simple comparisons and add complexity as needed. Your data quality will improve significantly.