Last modified: Nov 10, 2024 By Alexander Williams
Python Guide: Filter CSV Rows Efficiently with Code Examples
Working with CSV files often requires filtering data based on specific conditions. Python offers multiple approaches to filter CSV rows efficiently, whether you're using the built-in csv
module or more advanced libraries.
Using the CSV Module for Basic Filtering
Let's start with the basic approach using Python's built-in csv
module. For more details about CSV handling basics, check out our guide on Python CSV File Handling.
import csv
with open('data.csv', 'r') as file:
reader = csv.reader(file)
header = next(reader) # Skip header row
filtered_data = [row for row in reader if int(row[1]) > 25] # Filter ages > 25
print("Filtered Data:")
for row in filtered_data:
print(row)
Using DictReader for More Readable Filtering
The DictReader
makes filtering more intuitive by using column names. Learn more about its advantages in our article about CSV Reader vs DictReader.
import csv
with open('data.csv', 'r') as file:
reader = csv.DictReader(file)
filtered_data = [row for row in reader if row['city'] == 'New York']
print("Filtered Data:")
for row in filtered_data:
print(row)
Using Pandas for Advanced Filtering
For complex filtering operations, pandas provides a more powerful approach. It's especially useful when dealing with large datasets or multiple filtering conditions.
import pandas as pd
# Read CSV file
df = pd.read_csv('data.csv')
# Apply multiple filters
filtered_df = df[(df['age'] > 25) & (df['salary'] >= 50000)]
print(filtered_df)
Saving Filtered Results
After filtering, you might want to save the results to a new CSV file. This process is straightforward, and you can learn more about it in our guide about appending data to CSV files.
# Using CSV module
with open('filtered_data.csv', 'w', newline='') as file:
writer = csv.writer(file)
writer.writerow(header)
writer.writerows(filtered_data)
# Using Pandas
filtered_df.to_csv('filtered_data.csv', index=False)
Handling Large CSV Files
When working with large CSV files, memory management becomes crucial. For handling large files, check out our article on handling large CSV fields.
def filter_large_csv(input_file, output_file, condition):
with open(input_file, 'r') as infile, open(output_file, 'w', newline='') as outfile:
reader = csv.reader(infile)
writer = csv.writer(outfile)
header = next(reader)
writer.writerow(header)
for row in reader:
if condition(row):
writer.writerow(row)
Conclusion
Filtering CSV files in Python can be accomplished through various methods, each with its own advantages. The csv module provides basic functionality, while pandas offers more advanced features.
Remember to choose the appropriate method based on your specific needs: file size, filtering complexity, and performance requirements. Always consider memory usage when working with large datasets.