Last modified: Nov 10, 2024 By Alexander Williams

Python: Sort CSV Data by Column - Complete Guide

Sorting CSV data is a common task in data processing. In this guide, we'll explore different methods to sort CSV data by column in Python, from basic to advanced approaches.

Using Python's Built-in CSV Module

Let's start with a simple approach using the Python csv module. This method works well for basic sorting needs and small to medium-sized files. For complex scenarios, consider using pandas.

Basic CSV Sorting Example

Consider this sample CSV file (data.csv):


name,age,city
John,25,New York
Alice,30,London
Bob,22,Paris

Here's how to sort by age:


import csv

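# Read all rows into memory; DictReader maps each row to a dict keyed by the header
with open('data.csv', 'r', newline='') as file:
    csvreader = csv.DictReader(file)
    fieldnames = csvreader.fieldnames  # capture the header while the file is open
    data = list(csvreader)

# CSV values are read as strings, so convert age to int for a numeric sort
sorted_data = sorted(data, key=lambda x: int(x['age']))

# Write the header and the sorted rows
with open('sorted_data.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(sorted_data)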
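
The same pattern can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the csv module: the function name sort_rows and its parameters are hypothetical, and the sample is held in an in-memory string so the example runs without data.csv.

```python
import csv
import io

# Sample data matching data.csv, held in memory for the demo
SAMPLE = "name,age,city\nJohn,25,New York\nAlice,30,London\nBob,22,Paris\n"


def sort_rows(source, column, numeric=False, reverse=False):
    """Return (fieldnames, rows) with rows sorted by one column.

    sort_rows is a hypothetical helper, not a csv module function.
    """
    reader = csv.DictReader(source)
    rows = list(reader)
    # Convert to float for numeric columns so '9' sorts before '10'
    key = (lambda r: float(r[column])) if numeric else (lambda r: r[column])
    return reader.fieldnames, sorted(rows, key=key, reverse=reverse)


fieldnames, rows = sort_rows(io.StringIO(SAMPLE), "age", numeric=True, reverse=True)
names_by_age_desc = [row["name"] for row in rows]
print(names_by_age_desc)  # ['Alice', 'John', 'Bob']
```

Passing an open file object instead of io.StringIO should work the same way, since DictReader accepts any iterable of lines.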

Sorting with Pandas Library

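For more complex sorting operations, pandas provides powerful functionality. It infers numeric column types when reading, so no manual conversion is needed before sorting. It's especially useful when dealing with missing data: sort_values places missing values (NaN) at the end by default, and its na_position parameter can move them to the front.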


import pandas as pd

# Read CSV
df = pd.read_csv('data.csv')

# Sort by age
sorted_df = df.sort_values('age')

# Save sorted data
sorted_df.to_csv('sorted_data.csv', index=False)
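
To illustrate descending order and missing values, here is a short sketch. The sample data is an assumption: it mirrors data.csv but leaves Bob's age blank, and it is read from an in-memory string rather than a file.

```python
import io

import pandas as pd

# Assumed sample: same columns as data.csv, with Bob's age missing
SAMPLE = "name,age,city\nJohn,25,New York\nAlice,30,London\nBob,,Paris\n"

df = pd.read_csv(io.StringIO(SAMPLE))

# Descending order; rows with a missing age still go last by default
oldest_first = df.sort_values("age", ascending=False)

# na_position='first' moves rows with a missing age to the top instead
missing_first = df.sort_values("age", na_position="first")

print(oldest_first["name"].tolist())   # ['Alice', 'John', 'Bob']
print(missing_first["name"].tolist())  # ['Bob', 'John', 'Alice']
```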

Multiple Column Sorting

Sometimes you need to sort by multiple columns. Here's how to do it with pandas:


import pandas as pd

df = pd.read_csv('data.csv')
sorted_df = df.sort_values(['city', 'age'], ascending=[True, False])
sorted_df.to_csv('multi_sorted.csv', index=False)
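
For comparison, a similar multi-column sort is possible with only the csv module by using a tuple as the sort key. This is a sketch under assumptions: the sample adds a hypothetical fourth row (Carol) so that two rows share a city and the second key matters.

```python
import csv
import io

# Assumed sample with a repeated city so the second sort key has an effect
SAMPLE = (
    "name,age,city\n"
    "John,25,New York\n"
    "Alice,30,London\n"
    "Bob,22,Paris\n"
    "Carol,41,London\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Tuple key: city ascending, then age descending (negating the number flips its order)
sorted_rows = sorted(rows, key=lambda r: (r["city"], -int(r["age"])))

city_name_pairs = [(r["city"], r["name"]) for r in sorted_rows]
print(city_name_pairs)
# [('London', 'Carol'), ('London', 'Alice'), ('New York', 'John'), ('Paris', 'Bob')]
```

The negation trick only works for numeric keys. For a descending text column, one option is to sort twice, least significant key first, relying on the fact that Python's sort is stable.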

Error Handling

When sorting CSV data, it's important to handle errors properly. Check out our guide on CSV module error handling for detailed information.


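import pandas as pd

try:
    df = pd.read_csv('data.csv')
    sorted_df = df.sort_values('age')
    sorted_df.to_csv('sorted_data.csv', index=False)
except FileNotFoundError:
    print("File not found!")
except pd.errors.EmptyDataError:
    print("Empty CSV file!")
except KeyError:
    # sort_values raises KeyError when the column does not exist
    print("Column 'age' not found!")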
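
With the csv module, the usual failure points are an empty file, a missing column, and values that cannot be converted to numbers. The sketch below shows one way to surface these clearly; the helper name sort_by_numeric_column and its error messages are assumptions made for illustration.

```python
import csv
import io


def sort_by_numeric_column(source, column):
    """Sort rows by a numeric column, raising clear errors for bad input.

    Hypothetical helper for illustration; name and messages are assumptions.
    """
    reader = csv.DictReader(source)
    rows = list(reader)
    if reader.fieldnames is None:
        raise ValueError("CSV input is empty")
    if column not in reader.fieldnames:
        raise KeyError(f"Column {column!r} not found; available: {reader.fieldnames}")
    try:
        return sorted(rows, key=lambda r: float(r[column]))
    except (TypeError, ValueError) as exc:
        # float() fails on blanks, short rows, or text such as 'n/a'
        raise ValueError(f"Column {column!r} contains a non-numeric value") from exc


good = "name,age\nJohn,25\nBob,22\n"
bad = "name,age\nJohn,25\nBob,n/a\n"

result = [r["name"] for r in sort_by_numeric_column(io.StringIO(good), "age")]
print(result)  # ['Bob', 'John']

try:
    sort_by_numeric_column(io.StringIO(bad), "age")
except ValueError as err:
    error_message = str(err)
    print(error_message)  # Column 'age' contains a non-numeric value
```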

Performance Considerations

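For large CSV files, pandas is usually the more practical choice, and read_csv options such as usecols (load only the columns you need) and dtype (use compact column types) can reduce memory use. Read more about efficient processing of large CSV files.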
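
As a rough sketch of what such settings might look like, the example below loads only two columns and stores age as a 32-bit integer. The column choice and the int32 type are assumptions for illustration; whether they help depends on the actual file. The sample is read from an in-memory string, but a file path works the same way.

```python
import io

import pandas as pd

SAMPLE = "name,age,city\nJohn,25,New York\nAlice,30,London\nBob,22,Paris\n"

df = pd.read_csv(
    io.StringIO(SAMPLE),
    usecols=["name", "age"],  # skip columns the output does not need
    dtype={"age": "int32"},   # smaller than the default int64
)

# ignore_index=True renumbers the sorted rows 0..n-1
sorted_df = df.sort_values("age", ignore_index=True)
print(sorted_df["name"].tolist())  # ['Bob', 'John', 'Alice']
```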

Conclusion

Sorting CSV data in Python can be accomplished using either the built-in csv module or pandas. Choose the method that best fits your needs based on file size and the complexity of your sorting requirements.

For more advanced CSV operations, explore our guides on filtering CSV rows and CSV file handling.