Last modified: Nov 10, 2024 By Alexander Williams

Python Guide: Filter CSV Rows Efficiently with Code Examples

Working with CSV files often requires filtering data based on specific conditions. Python offers multiple approaches to filter CSV rows efficiently, whether you're using the built-in csv module or more advanced libraries.

Using the CSV Module for Basic Filtering

Let's start with the basic approach using Python's built-in csv module. For more details about CSV handling basics, check out our guide on Python CSV File Handling.


import csv

with open('data.csv', 'r') as file:
    reader = csv.reader(file)
    header = next(reader)  # Skip header row
    filtered_data = [row for row in reader if int(row[1]) > 25]  # Filter ages > 25

print("Filtered Data:")
for row in filtered_data:
    print(row)

Using DictReader for More Readable Filtering

The DictReader makes filtering more intuitive by using column names. Learn more about its advantages in our article about CSV Reader vs DictReader.


import csv

with open('data.csv', 'r') as file:
    reader = csv.DictReader(file)
    filtered_data = [row for row in reader if row['city'] == 'New York']

print("Filtered Data:")
for row in filtered_data:
    print(row)

Using Pandas for Advanced Filtering

For complex filtering operations, pandas provides a more powerful approach. It's especially useful when dealing with large datasets or multiple filtering conditions.


import pandas as pd

# Read CSV file
df = pd.read_csv('data.csv')

# Apply multiple filters
filtered_df = df[(df['age'] > 25) & (df['salary'] >= 50000)]
print(filtered_df)

Saving Filtered Results

After filtering, you might want to save the results to a new CSV file. This process is straightforward, and you can learn more about it in our guide about appending data to CSV files.


# Using CSV module
with open('filtered_data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)
    writer.writerows(filtered_data)

# Using Pandas
filtered_df.to_csv('filtered_data.csv', index=False)

Handling Large CSV Files

When working with large CSV files, memory management becomes crucial. For handling large files, check out our article on handling large CSV fields.


def filter_large_csv(input_file, output_file, condition):
    with open(input_file, 'r') as infile, open(output_file, 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        header = next(reader)
        writer.writerow(header)
        
        for row in reader:
            if condition(row):
                writer.writerow(row)

Conclusion

Filtering CSV files in Python can be accomplished through various methods, each with its own advantages. The csv module provides basic functionality, while pandas offers more advanced features.

Remember to choose the appropriate method based on your specific needs: file size, filtering complexity, and performance requirements. Always consider memory usage when working with large datasets.