Last modified: Nov 19, 2025 by Alexander Williams
Read Large Excel Files Efficiently with Python xlrd
Working with large Excel files can be challenging, but Python's xlrd library offers practical solutions. This guide shows techniques for reading them efficiently.
Understanding xlrd for Large Files
xlrd is a Python library for reading Excel spreadsheets. With the right techniques it handles large datasets well.
Memory management is crucial for big files. By default, xlrd loads an entire workbook into memory, which can cause problems with large datasets.
Version choice matters. xlrd 2.0 and later read only the legacy .xls format; 1.2.0 is the last release that also opens .xlsx files, which is why this guide pins that version.
Install xlrd Library
First, install the library with the pip package manager. The command is simple.
# Install xlrd using pip
pip install xlrd==1.2.0
Successfully installed xlrd-1.2.0
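You can optionally confirm which release is installed before opening anything large. This quick check is an addition to the original steps.
# Confirm the installed version
pip show xlrd
Among other details, the output should include the line Version: 1.2.0.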
Basic File Loading
Start by loading your Excel file with the open_workbook function. It returns a workbook object.
import xlrd
# Open Excel file
workbook = xlrd.open_workbook('large_file.xls')
print(f"Number of sheets: {workbook.nsheets}")
Number of sheets: 3
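The workbook object also exposes sheet names and dimensions, which helps you size up a file before reading any cells. A small sketch, reusing the workbook opened above:
# Inspect sheet names and dimensions before reading cell data
print(workbook.sheet_names())
first_sheet = workbook.sheet_by_index(0)
print(f"Sheet '{first_sheet.name}': {first_sheet.nrows} rows x {first_sheet.ncols} columns")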
Memory Efficient Reading
Large files need careful handling. Read only the necessary data instead of loading everything at once.
The on_demand=True parameter tells xlrd to load sheets only when they are accessed. This saves memory significantly.
# Memory efficient loading
workbook = xlrd.open_workbook('large_file.xls', on_demand=True)
# Access specific sheet
sheet = workbook.sheet_by_name('Data')
print(f"Sheet loaded: {sheet.name}")
Sheet loaded: Data
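With on_demand=True you can also unload a sheet once you have finished with it, so only one sheet's cells occupy memory at a time. A minimal sketch of that pattern:
# Process sheets one at a time, freeing each before moving on
workbook = xlrd.open_workbook('large_file.xls', on_demand=True)
for sheet_name in workbook.sheet_names():
    sheet = workbook.sheet_by_name(sheet_name)
    print(f"{sheet_name}: {sheet.nrows} rows")
    workbook.unload_sheet(sheet_name)  # drop this sheet's cells from memory
workbook.release_resources()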
Read Specific Rows and Columns
Don't read entire sheets when you only need part of the data. Targeting specific row and column ranges with indexing reduces memory usage dramatically and is faster.
# Read specific rows and columns
def read_data_range(filename, sheet_name, start_row, end_row, columns):
    workbook = xlrd.open_workbook(filename, on_demand=True)
    sheet = workbook.sheet_by_name(sheet_name)
    data = []
    for row_idx in range(start_row, min(end_row, sheet.nrows)):
        row_data = []
        for col_idx in columns:
            row_data.append(sheet.cell_value(row_idx, col_idx))
        data.append(row_data)
    workbook.release_resources()
    return data
# Example usage
sample_data = read_data_range('large_file.xls', 'Sales', 1, 100, [0, 2, 4])
print(f"Retrieved {len(sample_data)} rows")
Retrieved 99 rows
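When the columns you want are contiguous, xlrd's row_values method can slice them in a single call, avoiding the inner cell-by-cell loop. A minimal sketch, using the same illustrative file and sheet as above:
# row_values reads a contiguous column range of one row in a single call
workbook = xlrd.open_workbook('large_file.xls', on_demand=True)
sheet = workbook.sheet_by_name('Sales')
first_three_cols = sheet.row_values(1, start_colx=0, end_colx=3)  # columns 0-2 of row 1
print(first_three_cols)
workbook.release_resources()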
Handle Excel Dates and Times
Excel stores dates as serial numbers. xlrd converts them with the xldate_as_datetime function, which uses the workbook's datemode to distinguish the two date systems Excel files can use.
This ensures dates come through as proper Python datetime objects, so your analysis stays accurate.
import xlrd
# Convert an Excel serial date to a Python datetime
def convert_excel_date(excel_date, datemode):
    return xlrd.xldate.xldate_as_datetime(excel_date, datemode)
# Example date conversion
workbook = xlrd.open_workbook('dates_file.xls')
sheet = workbook.sheet_by_index(0)
excel_date = sheet.cell_value(1, 0)  # assuming a date in cell A2
python_date = convert_excel_date(excel_date, workbook.datemode)
print(f"Converted date: {python_date}")
Converted date: 2023-10-15 00:00:00
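Not every cell in a date column is guaranteed to hold a date, and converting a plain number this way would silently produce a wrong datetime. Checking the cell's ctype first avoids that; a short sketch, reusing the sheet and workbook from above:
# Convert only cells Excel actually stored as dates
cell = sheet.cell(1, 0)
if cell.ctype == xlrd.XL_CELL_DATE:
    value = xlrd.xldate.xldate_as_datetime(cell.value, workbook.datemode)
else:
    value = cell.value  # leave text and plain numbers untouched
print(value)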
Filter and Search Data
Filter data as you read it rather than loading everything first. Processing in a single pass prevents memory overload.
Apply search conditions as early as possible. The sooner the dataset shrinks, the faster your code runs.
# Filter data while reading
def filter_large_data(filename, condition_column, condition_value):
    workbook = xlrd.open_workbook(filename, on_demand=True)
    sheet = workbook.sheet_by_index(0)
    filtered_data = []
    for row_idx in range(1, sheet.nrows):  # skip header row
        if sheet.cell_value(row_idx, condition_column) == condition_value:
            row_data = [sheet.cell_value(row_idx, col) for col in range(sheet.ncols)]
            filtered_data.append(row_data)
    workbook.release_resources()
    return filtered_data
# Filter for specific category
high_sales = filter_large_data('sales_data.xls', 3, 'High')
print(f"Found {len(high_sales)} high sales records")
Found 245 high sales records
Performance Optimization Tips
Close workbooks properly by calling release_resources(). This frees memory immediately.
Process data in batches instead of keeping everything in memory, and write processed results to disk as you go (see the sketch below).
Convert numbers and dates to appropriate Python types early. This keeps downstream processing fast.
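One way to put the batching advice into practice is a generator that yields fixed-size chunks of rows, writing each chunk out before reading the next. This is a sketch of the pattern, not part of xlrd itself; the filenames and batch size are illustrative.
import csv
import xlrd
# Stream a sheet in fixed-size batches so only one batch is in memory
def iter_batches(filename, batch_size=1000):
    workbook = xlrd.open_workbook(filename, on_demand=True)
    try:
        sheet = workbook.sheet_by_index(0)
        for start in range(0, sheet.nrows, batch_size):
            end = min(start + batch_size, sheet.nrows)
            yield [sheet.row_values(r) for r in range(start, end)]
    finally:
        workbook.release_resources()  # runs even if the caller stops early
# Write each batch to disk instead of accumulating rows in memory
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for batch in iter_batches('large_file.xls'):
        writer.writerows(batch)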
Error Handling
Large files can be corrupt, encrypted, or truncated. Implement proper error handling so your code stays robust.
# Safe file reading with error handling
def safe_read_large_file(filename):
    try:
        workbook = xlrd.open_workbook(filename, on_demand=True)
        # Processing code here
        return workbook
    except xlrd.XLRDError as e:
        print(f"Excel file error: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None
# Usage with error handling
workbook = safe_read_large_file('corrupt_file.xls')
if workbook:
print("File loaded successfully")
else:
print("Failed to load file")
Excel file error: File is encrypted
Failed to load file
Best Practices Summary
Use on_demand=True for large files. This is the most important optimization.
Read only the necessary data. Be selective instead of processing entire sheets you don't need.
Release resources promptly. Call release_resources() when done. This prevents memory leaks.
Process data in chunks. Break large operations into smaller ones. This maintains performance.
Conclusion
Reading large Excel files efficiently is achievable with xlrd. Use memory optimization techniques.
Focus on selective data reading. Implement proper resource management. Your applications will perform better.
Remember these key points. They will help you handle most large Excel files, and your data processing will be smooth and efficient.