Last modified: Nov 10, 2024 By Alexander Williams
Python: Sort CSV Data by Column - Complete Guide
Sorting CSV data is a common task in data processing. In this guide, we'll explore different methods to sort CSV data by column in Python, from basic to advanced approaches.
Using Python's Built-in CSV Module
Let's start with a simple approach using Python's csv module. This method works well for basic sorting needs and small to medium-sized files. For complex scenarios, consider using pandas.
Basic CSV Sorting Example
Consider this sample CSV file (data.csv):
name,age,city
John,25,New York
Alice,30,London
Bob,22,Paris
Here's how to sort by age:
import csv

# Read the CSV and sort the rows by age (converted to int for numeric order)
with open('data.csv', 'r', newline='') as file:
    csvreader = csv.DictReader(file)
    fieldnames = csvreader.fieldnames
    data = list(csvreader)

sorted_data = sorted(data, key=lambda x: int(x['age']))

# Write the sorted data
with open('sorted_data.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(sorted_data)
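If you sort by different columns often, the same idea can be wrapped in a small helper. The sketch below is only an illustration: sort_rows and its convert and descending parameters are names made up for this example, and the sample data is read from an in-memory string instead of data.csv so the snippet runs on its own.

```python
import csv
import io


def sort_rows(rows, column, convert=str, descending=False):
    """Return a new list of row dicts sorted by a single column.

    `convert` turns the raw string value into something comparable,
    for example int for numeric columns.
    """
    return sorted(rows, key=lambda row: convert(row[column]), reverse=descending)


# Same content as the sample data.csv, held in memory for the example
sample = io.StringIO(
    "name,age,city\n"
    "John,25,New York\n"
    "Alice,30,London\n"
    "Bob,22,Paris\n"
)
rows = list(csv.DictReader(sample))

oldest_first = sort_rows(rows, "age", convert=int, descending=True)
by_name = sort_rows(rows, "name")

print([row["name"] for row in oldest_first])  # ['Alice', 'John', 'Bob']
print([row["name"] for row in by_name])       # ['Alice', 'Bob', 'John']
```

Passing convert=int matters here: without it, the ages would be compared as strings, so "100" would sort before "22".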
Sorting with Pandas Library
For more complex sorting operations, pandas provides powerful functionality. It's especially useful when dealing with missing data.
import pandas as pd
# Read CSV
df = pd.read_csv('data.csv')
# Sort by age
sorted_df = df.sort_values('age')
# Save sorted data
sorted_df.to_csv('sorted_data.csv', index=False)
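To make the point about missing data concrete, here is a minimal sketch of how sort_values places empty values. The sample rows are hypothetical (Alice's age is left blank on purpose) and are read from an in-memory string so the snippet is self-contained.

```python
import io

import pandas as pd

# Hypothetical sample: Alice has no age, so pandas reads it as NaN
csv_text = (
    "name,age,city\n"
    "John,25,New York\n"
    "Alice,,London\n"
    "Bob,22,Paris\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# By default, rows with a missing sort value go to the end
nan_last = df.sort_values("age")

# na_position="first" moves them to the top instead
nan_first = df.sort_values("age", na_position="first")

names_last = nan_last["name"].tolist()
names_first = nan_first["name"].tolist()
print(names_last)   # ['Bob', 'John', 'Alice']
print(names_first)  # ['Alice', 'Bob', 'John']
```

With the csv module, the same file would make int('') raise a ValueError, so you would need to handle the blank value yourself.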
Multiple Column Sorting
Sometimes you need to sort by multiple columns. Here's how to do it with pandas:
import pandas as pd
df = pd.read_csv('data.csv')
sorted_df = df.sort_values(['city', 'age'], ascending=[True, False])
sorted_df.to_csv('multi_sorted.csv', index=False)
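If you'd rather stay with the csv module, a similar result can be had by sorting on a tuple key. This is a sketch, not a drop-in replacement for the pandas call: negating the age only works because age is numeric, and the sample rows below are made up so that two people share each city.

```python
import csv
import io

# Hypothetical sample with repeated cities so the second sort key matters
sample = io.StringIO(
    "name,age,city\n"
    "John,25,London\n"
    "Alice,30,London\n"
    "Bob,22,Paris\n"
    "Eve,35,Paris\n"
)
rows = list(csv.DictReader(sample))

# City ascending, then age descending (negated so larger ages come first)
sorted_rows = sorted(rows, key=lambda row: (row["city"], -int(row["age"])))

for row in sorted_rows:
    print(row["city"], row["age"], row["name"])
# London 30 Alice
# London 25 John
# Paris 35 Eve
# Paris 22 Bob
```

For a descending sort on a text column, negation is not available; one option is to sort twice, first by the secondary key and then by the primary key, relying on the fact that Python's sort is stable.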
Error Handling
When sorting CSV data, it's important to handle errors properly. Check out our guide on CSV module error handling for detailed information.
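import pandas as pd

try:
    df = pd.read_csv('data.csv')
    sorted_df = df.sort_values('age')
    sorted_df.to_csv('sorted_data.csv', index=False)
except FileNotFoundError:
    print("File not found!")
except pd.errors.EmptyDataError:
    print("Empty CSV file!")
except KeyError:
    # Raised by sort_values when the column does not exist
    print("Column 'age' not found!")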
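Bad values inside a column are another common source of errors, especially with the csv module, where every field arrives as a string. The sketch below shows one possible way to cope: rows whose age cannot be converted are pushed to the end instead of crashing the sort. The age_key helper and the sample rows are assumptions made for this example.

```python
import csv
import io


def age_key(row):
    """Sort key: rows with a valid age first (numeric order), the rest last."""
    try:
        return (0, int(row["age"]))
    except (KeyError, ValueError, TypeError):
        # Missing column, non-numeric text, or a None value from a short row
        return (1, 0)


# Hypothetical sample with one non-numeric and one empty age
sample = io.StringIO(
    "name,age,city\n"
    "John,25,New York\n"
    "Alice,unknown,London\n"
    "Bob,22,Paris\n"
    "Eve,,Rome\n"
)
rows = list(csv.DictReader(sample))

sorted_rows = sorted(rows, key=age_key)
print([row["name"] for row in sorted_rows])  # ['Bob', 'John', 'Alice', 'Eve']
```

Whether invalid rows should be kept, dropped, or reported depends on your data, so treat this as a starting point.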
Performance Considerations
For large CSV files, consider using pandas with appropriate settings, such as loading only the columns you need (usecols) or declaring column types up front (dtype). Read more about efficient processing of large CSV files.
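As a rough illustration of what such settings can look like, the sketch below uses two read_csv options, usecols and dtype, to load less data before sorting. The column choices and the int32 type are assumptions for this tiny sample, and how much this helps depends on your file and available memory.

```python
import io

import pandas as pd

# Same content as the sample data.csv, held in memory for the example
csv_text = (
    "name,age,city\n"
    "John,25,New York\n"
    "Alice,30,London\n"
    "Bob,22,Paris\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["name", "age"],   # skip columns that are not needed
    dtype={"age": "int32"},    # smaller integer type than the default int64
)

sorted_df = df.sort_values("age")
print(sorted_df["name"].tolist())  # ['Bob', 'John', 'Alice']
```

Note that sorting still needs the selected columns in memory all at once, so these options reduce the footprint but do not remove the limit.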
Conclusion
Sorting CSV data in Python can be accomplished using either the built-in csv module or pandas. Choose the method that best fits your needs based on file size and the complexity of your sorting requirements.
For more advanced CSV operations, explore our guides on filtering CSV rows and CSV file handling.