Last modified: Dec 04, 2024 By Alexander Williams

Python Pandas concat(): Merging DataFrames in Python

The concat() function in Python's Pandas library combines DataFrames by stacking them along either rows or columns. It's one of the most commonly used tools for combining data in data analysis and data manipulation tasks. This article explains how to use concat(), its parameters, and how it works with practical examples.

Table Of Contents

What is the Pandas concat() Method?
Concatenating DataFrames Vertically (Along Rows)
Concatenating DataFrames Horizontally (Along Columns)
Dealing with Different Indexes
Using Keys for Multi-Level Indexing
Conclusion

What is the Pandas concat() Method?

concat() is a top-level Pandas function, called as pd.concat(), that concatenates two or more DataFrames (or Series) along a specified axis (either rows or columns). It is the usual way to stack datasets that share a structure or to place related datasets side by side.

Syntax of concat() method:


import pandas as pd

pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None)

The method takes several parameters:

objs: A sequence (typically a list) of DataFrames or Series to concatenate.
axis: The axis to concatenate along. Use 0 for rows and 1 for columns (default is 0).
join: Determines how labels on the other axis (the one you are not concatenating along) are handled. Use 'outer' (default) to take the union of those labels or 'inner' to take their intersection.
ignore_index: If True, the new index will be labeled 0, 1, 2,... and the original indexes will be ignored.
keys: Used to create a hierarchical index in the resulting DataFrame.

Concatenating DataFrames Vertically (Along Rows)

To concatenate DataFrames vertically (stack rows on top of each other), set the axis parameter to 0 (the default).


import pandas as pd

# Creating DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenating vertically
result = pd.concat([df1, df2], axis=0)

print(result)

The output of this code will look like this:


   A  B
0  1  3
1  2  4
0  5  7
1  6  8

The indexes are preserved from the original DataFrames, resulting in duplicate index values (0, 1, 0, 1). To get a fresh 0, 1, 2,... index, pass ignore_index=True to concat() or call reset_index(drop=True) on the result.
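As a minimal sketch of the two ways to get a clean index after a vertical concat, the snippet below rebuilds the same df1 and df2 used in this section. The variable names fresh and reset are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# ignore_index=True discards the original labels and numbers the rows 0..n-1
fresh = pd.concat([df1, df2], ignore_index=True)

# Same result in two steps: concatenate, then drop the old index
reset = pd.concat([df1, df2]).reset_index(drop=True)

print(fresh)
```

Both approaches should give rows labeled 0 through 3. Passing ignore_index=True is usually the shorter choice when the original labels carry no meaning.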

Concatenating DataFrames Horizontally (Along Columns)

If you want to concatenate DataFrames side by side (merge columns), set the axis parameter to 1.


# Concatenating horizontally
result = pd.concat([df1, df2], axis=1)

print(result)

Here's the output for the horizontal concatenation:


   A  B  A  B
0  1  3  5  7
1  2  4  6  8

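As you can see, the DataFrames are now joined column-wise, and the duplicate column names A and B are kept as they are. The rows line up because both DataFrames share the same index labels (0 and 1). concat() aligns on index labels, not on row position, so a label present in only one DataFrame would produce NaN in the other's columns.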
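To illustrate how a horizontal concat behaves when the index labels only partly overlap, here is a small sketch. The frames left and right and their index values are hypothetical and chosen just to show the alignment.

```python
import pandas as pd

# Two frames whose index labels overlap only at label 1
left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'B': [3, 4]}, index=[1, 2])

# Default join='outer': union of the row labels, gaps are filled with NaN
outer = pd.concat([left, right], axis=1)

# join='inner': only row labels present in both frames are kept
inner = pd.concat([left, right], axis=1, join='inner')

print(outer)
print(inner)
```

With the outer join the result has rows 0, 1 and 2, and only row 1 has values in both columns. With the inner join only row 1 remains.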

Dealing with Different Indexes

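When concatenating DataFrames whose labels differ, concat() aligns the data by label rather than by position. The join parameter controls what happens on the axis you are not concatenating along: with axis=0 it decides which columns are kept, and with axis=1 it decides which rows are kept.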


df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'B': [3, 4]}, index=[2, 3])

result = pd.concat([df1, df2], axis=0, join='outer')
print(result)

Here’s the output of this concatenation using the 'outer' join:


     A    B
0  1.0  NaN
1  2.0  NaN
2  NaN  3.0
3  NaN  4.0

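Notice that the missing data is filled with NaN values, which also converts the integer columns to floats. With axis=0, the join parameter applies to the columns, so an inner join keeps only the columns that appear in both DataFrames:


result = pd.concat([df1, df2], axis=0, join='inner')
print(result)

Output for an 'inner' join:


Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]

Because df1 and df2 have no column in common, the result keeps all four row labels but no columns.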
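A less degenerate case may make the inner join easier to read. The sketch below uses two hypothetical frames, df_ab and df_ac, that share column A but each have one column the other lacks.

```python
import pandas as pd

# Hypothetical frames: both have column 'A', only one has 'B', only one has 'C'
df_ab = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_ac = pd.DataFrame({'A': [5, 6], 'C': [7, 8]}, index=[2, 3])

# Along axis=0, join='inner' keeps only the columns common to every input
inner_rows = pd.concat([df_ab, df_ac], axis=0, join='inner')

print(inner_rows)
```

Here all four rows are kept, but columns B and C are dropped because neither appears in both inputs.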

Using Keys for Multi-Level Indexing

The keys parameter can be used to create a multi-level index. This is especially helpful when concatenating multiple DataFrames, making it easier to track which DataFrame each row belongs to.


df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

result = pd.concat([df1, df2], keys=['df1', 'df2'])
print(result)

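The resulting output will show the multi-level index:


       A
df1 0  1
    1  2
df2 0  3
    1  4

The outer level holds the key ('df1' or 'df2') and the inner level holds each row's original index label.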
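Once keys are in place, the rows that came from one input can be pulled back out by key. The following is a minimal sketch of that lookup. It also uses concat()'s names parameter to label the index levels, and the level names 'source' and 'row' are illustrative choices, not something the example above defines.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# keys tags each input; names labels the two index levels (names are illustrative)
result = pd.concat([df1, df2], keys=['df1', 'df2'], names=['source', 'row'])

# Selecting on the outer level returns only the rows that came from df2
from_df2 = result.loc['df2']

print(from_df2)
```

Selecting with .loc on the outer key drops that level, so from_df2 is indexed by the original row labels again.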

Conclusion

In this article, we explored the concat() function in Pandas: concatenating DataFrames vertically and horizontally, controlling how mismatched labels are handled with the join parameter, and using keys to build a multi-level index that records which DataFrame each row came from.

For additional techniques in managing DataFrame indexes, you might find it helpful to explore set_index() and reset_index().