Last modified: Nov 19, 2025 By Alexander Williams

Integrate Python xlrd with pandas for Data Analysis

Data analysis often starts with Excel files, because many businesses store their data in spreadsheets. Python offers strong tools for working with that data, and two key libraries are xlrd and pandas.

xlrd specializes in reading Excel files, while pandas provides data manipulation. Combined, they give analysts an efficient path from raw workbook cells to analysis-ready DataFrames.

Understanding xlrd and pandas

xlrd is a Python library for reading Excel files. Since version 2.0 it supports only the legacy .xls format; .xlsx files need a different reader such as openpyxl. xlrd extracts data from worksheets and cells, and it handles dates, numbers, and text values.

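pandas is a data analysis library built around the DataFrame, a table-like structure for labeled data. Its read_excel function can load spreadsheets directly, and for .xls files it uses xlrd as the reading engine. Working with xlrd yourself gives finer control over complex files: cell types, date modes, and loading one sheet at a time.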
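Because pandas delegates .xls parsing to xlrd, a practical pattern is to choose the read_excel engine from the file extension. The sketch below is a hypothetical helper (`excel_engine_for` is not part of pandas) and assumes openpyxl is installed for .xlsx and .xlsm files.

```python
from pathlib import Path


def excel_engine_for(path):
    """Hypothetical helper: choose a pandas.read_excel engine from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xls":
        # Legacy binary workbooks: xlrd 2.x reads only this format
        return "xlrd"
    if suffix in (".xlsx", ".xlsm"):
        # Office Open XML workbooks need openpyxl instead
        return "openpyxl"
    raise ValueError(f"Unsupported Excel file extension: {suffix!r}")


# Usage with pandas (the file and the chosen engine must be available):
# df = pd.read_excel("sales_data.xls", engine=excel_engine_for("sales_data.xls"))
```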

If xlrd is not installed yet, see the guide on how to install xlrd and xlwt in Python. This article needs only xlrd for reading; xlwt is its companion library for writing .xls files.

Setting Up Your Environment

First, install the required libraries with pip. The command below installs both packages.


pip install xlrd pandas

Verify the installation by importing both libraries. If no errors appear, the installation succeeded.


import xlrd
import pandas as pd
print("Libraries imported successfully")

Libraries imported successfully
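A successful import shows the packages are present; checking versions also tells you whether you have xlrd 2.x, the .xls-only line. A minimal sketch using the standard library's importlib.metadata; `installed_versions` is a hypothetical name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("xlrd", "pandas")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```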

Reading Excel Files with xlrd

xlrd opens Excel workbooks directly. The open_workbook function loads a file and returns a Book object, from which you can access sheets and cells.

The same approach is the starting point when you need to read large Excel files efficiently with Python xlrd. On its own, open_workbook parses every sheet up front; the on_demand option described below loads sheets only when you access them.


# Open an Excel workbook
workbook = xlrd.open_workbook('sales_data.xls')

# Get sheet names
sheet_names = workbook.sheet_names()
print("Sheet names:", sheet_names)

# Access first sheet
sheet = workbook.sheet_by_index(0)

# Read cell value
cell_value = sheet.cell_value(0, 0)
print("First cell value:", cell_value)

Sheet names: ['Sales', 'Customers', 'Products']
First cell value: Sales Report
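cell_value returns only the raw Python value, so a date cell and an ordinary number look the same. sheet.cell_type(row, col) reports what Excel actually stored. The sketch below pairs each value in a row with a readable type label; `describe_row` and the labels are our own, and TYPE_NAMES copies the integer values of xlrd's XL_CELL_* constants (0 to 6) so the example runs without xlrd installed.

```python
# Readable labels for xlrd's cell-type codes (xlrd.XL_CELL_EMPTY == 0 ... xlrd.XL_CELL_BLANK == 6)
TYPE_NAMES = {
    0: "empty",
    1: "text",
    2: "number",
    3: "date",
    4: "boolean",
    5: "error",
    6: "blank",
}


def describe_row(sheet, row_idx):
    """Return (value, type label) pairs for one row of an xlrd Sheet."""
    return [
        (sheet.cell_value(row_idx, col_idx),
         TYPE_NAMES.get(sheet.cell_type(row_idx, col_idx), "unknown"))
        for col_idx in range(sheet.ncols)
    ]


# Usage with the sheet opened above:
# print(describe_row(sheet, 1))
```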

Converting xlrd Data to pandas DataFrame

After reading data with xlrd, convert it to a pandas DataFrame so that pandas' analysis tools become available. The process involves extracting the sheet's rows and columns.

The function below reads every row into a list, uses the first row as column headers, and builds a DataFrame from the remaining rows.


def sheet_to_dataframe(sheet):
    """Convert an xlrd sheet to a pandas DataFrame, using row 0 as headers."""
    if sheet.nrows == 0:
        # An empty sheet has no header row to use
        return pd.DataFrame()

    # row_values returns all cell values of one row as a list
    data = [sheet.row_values(row_idx) for row_idx in range(sheet.nrows)]

    # Use first row as headers
    headers = data[0]
    data_rows = data[1:]

    # xlrd returns every number as a float; convert_dtypes turns
    # whole-number columns such as OrderID back into integers
    return pd.DataFrame(data_rows, columns=headers).convert_dtypes()

# Convert the sheet to DataFrame
df = sheet_to_dataframe(sheet)
print("DataFrame shape:", df.shape)
print(df.head(3))

DataFrame shape: (99, 5)
   OrderID  Customer   Product  Quantity  Price
0      101      John    Laptop         1   1200
1      102      Mary     Mouse         2     25
2      103      Mike  Keyboard         1     75

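sheet_to_dataframe always treats row 0 as the header. When a sheet opens with a title row (the first cell of the Sales sheet above reads "Sales Report"), the column names sit further down. A sketch, assuming you pass the rows as lists such as those returned by sheet.row_values; `rows_to_dataframe` and its `header_row` parameter are hypothetical names.

```python
import pandas as pd


def rows_to_dataframe(rows, header_row=0):
    """Build a DataFrame from row lists, using rows[header_row] as column names.

    Rows above header_row (titles, notes) are skipped.
    """
    if header_row >= len(rows):
        raise ValueError(f"header_row={header_row} but only {len(rows)} rows given")
    headers = [str(name).strip() for name in rows[header_row]]
    return pd.DataFrame(rows[header_row + 1:], columns=headers)


# Usage with an xlrd sheet whose headers are on the second row:
# rows = [sheet.row_values(i) for i in range(sheet.nrows)]
# df = rows_to_dataframe(rows, header_row=1)
```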
Handling Different Data Types

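Excel files contain various data types. xlrd records each cell's type, available through sheet.cell_type(row, col), but the values it returns are plain Python objects: text as str, numbers as float, and booleans as 1 or 0. pandas then infers column dtypes from those values when it builds the DataFrame.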

Dates require special handling. xlrd returns them as float serial numbers (days counted from a fixed epoch), so you need to convert them to Python datetime objects yourself.
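The epoch depends on the workbook's date system, which xlrd exposes as workbook.datemode: 0 for the 1900 system, 1 for the 1904 system. The sketch below does the arithmetic with the standard library only, to show where a value like 44123.65432 comes from. It is illustrative, not a replacement for xlrd.xldate_as_tuple, which also rounds to whole seconds and validates more edge cases; `serial_to_datetime` is a hypothetical name.

```python
from datetime import datetime, timedelta

# Day zero for each Excel date system (workbook.datemode in xlrd)
EPOCHS = {
    0: datetime(1899, 12, 30),  # 1900 system; valid for serials >= 61
    1: datetime(1904, 1, 1),    # 1904 system (used by some older Mac workbooks)
}


def serial_to_datetime(serial, datemode=0):
    """Convert an Excel date serial number to a datetime (illustrative sketch)."""
    if datemode not in EPOCHS:
        raise ValueError(f"unknown datemode: {datemode!r}")
    if datemode == 0 and serial < 61:
        # Excel's fictitious 1900-02-29 makes early-1900 serials ambiguous
        raise ValueError("serials below 61 are ambiguous in the 1900 date system")
    return EPOCHS[datemode] + timedelta(days=serial)


print(serial_to_datetime(44123.65432))
```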

Learn more about how to handle Excel dates and times with Python xlrd for proper date conversion in your analysis workflows.


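from datetime import datetime

import xlrd


def convert_xlrd_date(workbook, cell_value):
    """Convert an xlrd date serial number (a float) to a Python datetime.

    Values that are not valid date serials are returned unchanged.
    """
    if isinstance(cell_value, float):
        try:
            # datemode selects the workbook's 1900 or 1904 date system
            date_tuple = xlrd.xldate_as_tuple(cell_value, workbook.datemode)
            return datetime(*date_tuple)
        except (xlrd.xldate.XLDateError, ValueError):
            # XLDateError: negative, too large, or ambiguous serials;
            # ValueError: time-only values, which have no calendar date
            return cell_value
    return cell_value

# Example with date conversion (assumes row 1, column 4 holds a date)
date_cell = sheet.cell_value(1, 4)
converted_date = convert_xlrd_date(workbook, date_cell)
print("Original value:", date_cell)
print("Converted date:", converted_date)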

Original value: 44123.65432
Converted date: 2020-10-19 15:42:13
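Converting one cell at a time works, but once a date column is in a DataFrame, pandas can convert it in a single call with pd.to_datetime and its unit and origin parameters. A sketch, assuming 1900-system serials (workbook.datemode == 0) in a hypothetical OrderDate column; for 1904-system workbooks, use origin="1904-01-01" instead.

```python
import pandas as pd

# Hypothetical column of Excel date serials as read by xlrd (1900 date system)
orders = pd.DataFrame({"OrderDate": [44123.65432, 44124.0, 44130.5]})

# Treat the numbers as day counts from Excel's 1900-system epoch
orders["OrderDate"] = pd.to_datetime(orders["OrderDate"], unit="D", origin="1899-12-30")
print(orders)
```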

Working with Multiple Sheets

Excel workbooks often contain multiple sheets. xlrd can iterate through all of them, and you can convert each one to a separate DataFrame.

This approach helps when you need to compare Excel sheets in Python using xlrd for data validation and consistency checks.


# Process all sheets in workbook
dataframes = {}

for sheet_name in workbook.sheet_names():
    sheet = workbook.sheet_by_name(sheet_name)
    df = sheet_to_dataframe(sheet)
    dataframes[sheet_name] = df
    print(f"Sheet '{sheet_name}': {df.shape}")

# Access specific sheet DataFrame
sales_df = dataframes['Sales']
print("\nSales DataFrame columns:", list(sales_df.columns))

Sheet 'Sales': (99, 5)
Sheet 'Customers': (50, 3)
Sheet 'Products': (25, 4)

Sales DataFrame columns: ['OrderID', 'Customer', 'Product', 'Quantity', 'Price']
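For validation and consistency checks, a simple first step is to confirm that each DataFrame has the columns your analysis expects. A sketch; `find_missing_columns` is a hypothetical helper, and the sample sheets and rules (including the City and Email columns) are illustrative, not taken from a real workbook.

```python
import pandas as pd


def find_missing_columns(dataframes, expected):
    """Report required columns that are missing, per sheet.

    dataframes maps sheet names to DataFrames; expected maps sheet names to
    the columns each sheet must contain. A sheet that is absent from
    dataframes is reported with all of its expected columns missing.
    """
    problems = {}
    for sheet_name, required in expected.items():
        df = dataframes.get(sheet_name)
        present = set(df.columns) if df is not None else set()
        missing = sorted(set(required) - present)
        if missing:
            problems[sheet_name] = missing
    return problems


# Illustrative data: empty DataFrames that only carry column names
sample = {
    "Sales": pd.DataFrame(columns=["OrderID", "Customer", "Product", "Quantity", "Price"]),
    "Customers": pd.DataFrame(columns=["Customer", "City"]),
}
rules = {"Sales": ["OrderID", "Price"], "Customers": ["Customer", "Email"], "Products": ["Product"]}
print(find_missing_columns(sample, rules))  # {'Customers': ['Email'], 'Products': ['Product']}
```

With real data, pass the dataframes dictionary built above instead of sample.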

Advanced Data Processing

Once data is in pandas, you can perform advanced analysis. Calculate statistics, filter data, and create visualizations.

Combine xlrd's precise reading with pandas' analysis power. This creates a robust data processing pipeline.


# Analyze the Sales DataFrame (after the loop above, df holds the last sheet, Products)
total_sales = (sales_df['Quantity'] * sales_df['Price']).sum()
average_price = sales_df['Price'].mean()
top_products = sales_df.groupby('Product')['Quantity'].sum().sort_values(ascending=False)

print(f"Total Sales: ${total_sales:,.2f}")
print(f"Average Price: ${average_price:.2f}")
print("\nTop Selling Products:")
print(top_products.head())

Total Sales: $184,525.00
Average Price: $245.67

Top Selling Products:
Product
Laptop     45
Mouse      32
Keyboard   28
Monitor    15
Tablet     12
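The same building blocks extend to per-product summaries. The sketch below adds a revenue column with assign and aggregates it with pandas named aggregation; `revenue_by_product` is a hypothetical helper that assumes the Product, Quantity, and Price columns shown above.

```python
import pandas as pd


def revenue_by_product(sales):
    """Summarize units sold and revenue per product, highest revenue first."""
    return (
        sales.assign(Revenue=sales["Quantity"] * sales["Price"])
        .groupby("Product")
        .agg(Units=("Quantity", "sum"), Revenue=("Revenue", "sum"))
        .sort_values("Revenue", ascending=False)
    )


# Usage with the Sales DataFrame from the multi-sheet example:
# print(revenue_by_product(sales_df).head())
```

Sorting by revenue rather than units keeps a few expensive items (such as laptops) from being ranked below many cheap ones.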

Error Handling and Best Practices

Always include error handling when working with files. Files might be corrupted or have unexpected formats.

Use try-except blocks around file operations. This prevents crashes and provides helpful error messages.


try:
    workbook = xlrd.open_workbook('data.xls')
    sheet = workbook.sheet_by_index(0)
    df = sheet_to_dataframe(sheet)
    print("File processed successfully")
except FileNotFoundError:
    print("Error: File not found")
except xlrd.XLRDError:
    print("Error: Cannot read Excel file")
except Exception as e:
    print(f"Unexpected error: {e}")

File processed successfully
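An xlrd.XLRDError often means the file is not a legacy .xls workbook at all, for example an .xlsx file renamed to .xls. Checking the first bytes before calling open_workbook can give a clearer message. Legacy .xls files are OLE2 compound documents that begin with the bytes D0 CF 11 E0 A1 B1 1A E1, while .xlsx files are ZIP archives that begin with PK\x03\x04. The OLE2 signature is shared by other legacy Office formats such as .doc, so a match only means "possibly .xls". `sniff_excel_format` is a hypothetical helper.

```python
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (and other OLE2 files)
ZIP_SIGNATURE = b"PK\x03\x04"  # ZIP container, most likely .xlsx/.xlsm


def sniff_excel_format(path):
    """Guess a workbook's container format from its first bytes."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if header.startswith(OLE2_SIGNATURE):
        return "xls"
    if header.startswith(ZIP_SIGNATURE):
        return "xlsx"
    return "unknown"


# Usage before handing the file to xlrd:
# if sniff_excel_format("data.xls") != "xls":
#     print("Error: not a legacy .xls workbook; try pandas with engine='openpyxl'")
```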

Memory Management for Large Files

Large Excel files can consume significant memory, because by default open_workbook parses every sheet up front.

Use the on_demand parameter for large workbooks. This loads sheets only when accessed, and unload_sheet frees each one once you are done with it.


# Memory-efficient reading for large files: sheets load only when accessed
workbook = xlrd.open_workbook('large_data.xls', on_demand=True)
try:
    # Process sheets one by one
    for sheet_name in workbook.sheet_names():
        sheet = workbook.sheet_by_name(sheet_name)
        df = sheet_to_dataframe(sheet)
        # Process DataFrame
        print(f"Processed {len(df)} rows from {sheet_name}")
        # Free this sheet before loading the next one
        workbook.unload_sheet(sheet_name)
finally:
    # on_demand mode keeps the file open; release it when finished
    workbook.release_resources()

Processed 10000 rows from Sheet1
Processed 5000 rows from Sheet2
Processed 7500 rows from Sheet3
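on_demand limits how many sheets are in memory at once, but each sheet you open is still loaded whole. If the DataFrame copy is what strains memory, you can build it in row chunks and process each chunk before moving on. A sketch that uses only Sheet.nrows and Sheet.row_values; `iter_sheet_chunks` and `chunk_size` are hypothetical names, and it assumes the header is in row 0.

```python
import pandas as pd


def iter_sheet_chunks(sheet, chunk_size=1000):
    """Yield DataFrames of up to chunk_size data rows from an xlrd Sheet.

    Row 0 is used as the header for every chunk.
    """
    if sheet.nrows == 0:
        return
    headers = sheet.row_values(0)
    for start in range(1, sheet.nrows, chunk_size):
        stop = min(start + chunk_size, sheet.nrows)
        rows = [sheet.row_values(i) for i in range(start, stop)]
        yield pd.DataFrame(rows, columns=headers)


# Usage inside the on_demand loop above:
# total_rows = sum(len(chunk) for chunk in iter_sheet_chunks(sheet, chunk_size=2000))
```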

Exporting Processed Data

After analysis, you often need to save results. pandas can export DataFrames to various formats.

Common formats include CSV, Excel, and JSON. Choose based on your needs and downstream applications.


# Export the Sales DataFrame to different formats
sales_df.to_csv('processed_sales.csv', index=False)
sales_df.to_excel('analysis_results.xlsx', index=False)  # .xlsx output needs openpyxl or XlsxWriter
sales_df.to_json('data.json', orient='records')

print("Data exported to multiple formats")

Data exported to multiple formats
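When the workbook produced several DataFrames, as in the multi-sheet example, you may want one file per sheet. A sketch that writes CSV files with pathlib; `export_sheets_to_csv` and the output folder name are hypothetical. Sheet names may contain spaces and punctuation, so the helper replaces anything other than letters, digits, underscores, and dashes.

```python
import re
from pathlib import Path


def export_sheets_to_csv(dataframes, out_dir):
    """Write each DataFrame to <out_dir>/<sheet name>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for sheet_name, df in dataframes.items():
        # Replace characters that are awkward in file names
        safe_name = re.sub(r"[^\w\-]+", "_", sheet_name)
        path = out_dir / f"{safe_name}.csv"
        df.to_csv(path, index=False)
        written.append(path)
    return written


# Usage with the dict built in the multi-sheet example:
# export_sheets_to_csv(dataframes, "exported_sheets")
```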

Conclusion

Integrating xlrd with pandas creates a powerful Excel data analysis workflow. xlrd provides reliable Excel file reading. pandas offers robust data manipulation capabilities.

This combination handles legacy .xls workbooks with multiple sheets and mixed data types. With on_demand loading, it works for both small files and large multi-sheet workbooks. The approach gives you fine-grained control over how cells become DataFrame columns.

Start with simple file reading. Progress to complex multi-sheet analysis. Always include error handling for production applications.

The integration empowers Python developers to efficiently process Excel data. It bridges the gap between spreadsheet data and advanced analysis.