Last modified: Nov 06, 2024 By Alexander Williams

Convert Excel XLSX to JSON in Python

Converting Excel files to JSON format is a common task in data processing. Python provides powerful libraries like pandas that make this conversion straightforward and efficient.

Required Libraries

First, you'll need to install the necessary libraries. The main library we'll use is pandas, which excels at handling spreadsheet data. Install it using pip:


pip install pandas

Basic XLSX to JSON Conversion

Here's a simple example of how to convert an Excel file to JSON. This method uses read_excel() from pandas and to_json() for conversion.


import pandas as pd

# Read the Excel file
df = pd.read_excel('data.xlsx')

# Convert to JSON
json_data = df.to_json('output.json')

Advanced Conversion Options

For more control over the JSON output format, you can use additional parameters. The following example shows how to create a more readable JSON format with proper formatting.


import pandas as pd
import json

# Read Excel file
df = pd.read_excel('data.xlsx')

# Convert to JSON with custom options
json_data = df.to_json(orient='records', indent=4)

# Write to file with proper formatting
with open('output.json', 'w') as f:
    f.write(json_data)

Handling Multiple Sheets

Excel files often contain multiple sheets. Here's how to handle all sheets in an Excel file and convert them to JSON:


import pandas as pd

# Read all sheets into a dictionary
excel_file = pd.ExcelFile('data.xlsx')
sheets_dict = {}

for sheet_name in excel_file.sheet_names:
    df = pd.read_excel(excel_file, sheet_name=sheet_name)
    sheets_dict[sheet_name] = json.loads(df.to_json(orient='records'))

# Save as JSON
with open('output.json', 'w') as f:
    json.dump(sheets_dict, f, indent=4)

Data Cleaning and Preprocessing

Before converting to JSON, you might need to clean your data. Here's an example that handles common issues like NaN values and data type conversion.


import pandas as pd
import numpy as np

# Read Excel file
df = pd.read_excel('data.xlsx')

# Replace NaN values
df = df.replace({np.nan: None})

# Convert specific columns to desired types
df['date_column'] = pd.to_datetime(df['date_column']).dt.strftime('%Y-%m-%d')

# Convert to JSON
json_data = df.to_json(orient='records', date_format='iso')

Working with Large Files

For large Excel files, you might want to process the data in chunks. Here's how to handle large files efficiently:


import pandas as pd
import json

# Read Excel in chunks
chunk_size = 1000
json_list = []

for chunk in pd.read_excel('large_file.xlsx', chunksize=chunk_size):
    json_list.extend(json.loads(chunk.to_json(orient='records')))

# Save the complete JSON
with open('output.json', 'w') as f:
    json.dump(json_list, f, indent=4)

For more information about handling JSON in Python, you might want to check out how to use JSON dumps or learn about pretty printing JSON.

Conclusion

Converting Excel files to JSON in Python is straightforward with pandas. Remember to consider factors like file size, data cleaning, and output format when designing your conversion process.

For more advanced JSON handling, you might want to explore JSON parsing in Python or learn how to convert Python dictionaries to JSON.