Last modified: Nov 24, 2025 By Alexander Williams
Filter Search Spreadsheet Rows Python pyexcel
Working with spreadsheet data is common in data analysis. Python's pyexcel library makes this task easy. It provides simple tools for data manipulation.
This guide focuses on filtering and searching rows. You will learn practical techniques. These skills are essential for efficient data processing.
Installing pyexcel
First, ensure pyexcel is installed. Use pip for installation. This command installs the latest version.
pip install pyexcel pyexcel-xlsx
The second package adds Excel support. You might need other plugins for different formats. Check our format handling guide for details.
Reading Spreadsheet Data
Start by reading your spreadsheet file. Pyexcel supports various formats. Use the get_array function to load data.
import pyexcel as pe
# Read data from Excel file
data = pe.get_array(file_name="employees.xlsx")
print("Original Data:")
for row in data:
print(row)
Original Data:
['Name', 'Department', 'Salary', 'Join Date']
['John Doe', 'IT', 75000, '2020-03-15']
['Jane Smith', 'HR', 65000, '2019-07-22']
['Bob Johnson', 'IT', 80000, '2018-11-30']
['Alice Brown', 'Finance', 70000, '2021-01-10']
Basic Row Filtering
Filtering helps extract specific rows. Use list comprehension for simple filters. This method is Pythonic and efficient.
# Filter IT department employees
it_employees = [row for row in data if row[1] == 'IT']
print("\nIT Department Employees:")
for employee in it_employees:
print(employee)
IT Department Employees:
['John Doe', 'IT', 75000, '2020-03-15']
['Bob Johnson', 'IT', 80000, '2018-11-30']
Advanced Filtering with Multiple Conditions
Real-world scenarios often need multiple conditions. Combine criteria using logical operators. This provides precise control.
# Filter high-salary IT employees
high_salary_it = [row for row in data
if row[1] == 'IT' and row[2] > 70000]
print("\nHigh Salary IT Employees:")
for employee in high_salary_it:
print(employee)
High Salary IT Employees:
['Bob Johnson', 'IT', 80000, '2018-11-30']
Searching for Specific Values
Searching finds rows containing specific values. This is useful for text-based searches. Use the in operator for flexibility.
# Search for employees with 'John' in name
john_employees = [row for row in data
if 'John' in row[0]]
print("\nEmployees with 'John' in name:")
for employee in john_employees:
print(employee)
Employees with 'John' in name:
['John Doe', 'IT', 75000, '2020-03-15']
['Bob Johnson', 'IT', 80000, '2018-11-30']
Working with Column Headers
Column headers make code more readable. Use dictionaries instead of numeric indices. This approach is more maintainable.
# Get data as dictionary
data_dict = pe.get_dict(file_name="employees.xlsx")
print("Data as dictionary:")
print(data_dict)
# Access specific column
names = data_dict['Name']
print(f"\nEmployee names: {names}")
Data as dictionary:
{'Name': ['John Doe', 'Jane Smith', 'Bob Johnson', 'Alice Brown'], 'Department': ['IT', 'HR', 'IT', 'Finance'], 'Salary': [75000, 65000, 80000, 70000], 'Join Date': ['2020-03-15', '2019-07-22', '2018-11-30', '2021-01-10']}
Employee names: ['John Doe', 'Jane Smith', 'Bob Johnson', 'Alice Brown']
Filtering with pandas-like Syntax
Pyexcel can work with pandas-like syntax. This is familiar to many Python users. Use the get_records function.
# Get data as records
records = pe.get_records(file_name="employees.xlsx")
# Filter using record syntax
finance_employees = [record for record in records
if record['Department'] == 'Finance']
print("\nFinance Department Employees:")
for employee in finance_employees:
print(employee)
Finance Department Employees:
{'Name': 'Alice Brown', 'Department': 'Finance', 'Salary': 70000, 'Join Date': '2021-01-10'}
Saving Filtered Results
After filtering, you often need to save results. Pyexcel makes this straightforward. Use the save_as function.
# Save filtered IT employees to new file
pe.save_as(array=it_employees, dest_file_name="it_employees.xlsx")
print("Filtered data saved to it_employees.xlsx")
This creates a new Excel file. It contains only the filtered rows. You can use various output formats.
Handling Large Datasets
Large datasets need efficient processing. Pyexcel handles this well. Use the iget_records function for memory efficiency.
# Process large files efficiently
for record in pe.iget_records(file_name="large_dataset.xlsx"):
if record['Department'] == 'IT' and record['Salary'] > 50000:
print(f"Found: {record['Name']}")
This method processes one record at a time. It uses minimal memory. This is crucial for large files.
Data Cleaning Before Filtering
Clean data ensures accurate filtering. Remove duplicates and handle missing values. Our data cleaning guide covers this in detail.
# Example: Remove empty rows before filtering
clean_data = [row for row in data if any(row)]
print(f"Original rows: {len(data)}")
print(f"Clean rows: {len(clean_data)}")
Combining with Other pyexcel Features
Pyexcel offers many complementary features. You can merge sheets before filtering. Convert between formats after processing.
Check our merge and split guide for advanced operations. These techniques work well together.
Common Issues and Solutions
Beginners might face some challenges. Here are common issues and solutions.
File not found errors: Check file paths carefully. Use absolute paths for reliability.
Encoding problems: Specify encoding when reading files. UTF-8 is usually safe.
Memory errors with large files: Use iterator functions. Process data in chunks.
Best Practices
Follow these best practices for better code.
Always handle exceptions. Use try-except blocks for file operations.
Validate data before processing. Check for expected columns and data types.
Use meaningful variable names. This makes code easier to understand.
Conclusion
Filtering and searching spreadsheet rows is essential. Pyexcel makes this task simple and efficient.
You learned basic and advanced techniques. These skills will save time in data analysis projects.
Remember to clean data first. Combine filtering with other pyexcel features. This creates powerful data processing pipelines.
Start with simple filters. Gradually move to complex conditions. Your data manipulation skills will grow quickly.