Last modified: May 10, 2026 By Alexander Williams

Polars DataFrame Joins & Merges Guide

Combining data from multiple DataFrames is a core data analysis task. Polars provides fast, expressive methods for joining and merging datasets. This guide covers the key join types with clear examples.

Polars uses the join method for most merging operations. It is designed for speed and clarity. You will learn how to perform inner, left, outer, cross, and anti joins. We'll also explore merging on multiple keys and handling overlapping column names.

Why Joins Matter in Polars

Joins let you combine data from different sources. For example, you might have a customer table and an orders table. A join connects them using a common key, like a customer ID. This unlocks richer analysis.

Polars joins are lazy by nature. This means they optimize the query plan before execution. The result is faster performance, especially with large datasets. For more on structuring your data, see our guide on Polars Columns: Add, Rename & Drop.

Preparing Sample DataFrames

Let's create two simple DataFrames to demonstrate joins. We'll use a customer list and an order list.


import polars as pl

# Customer DataFrame
customers = pl.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"]
})

# Orders DataFrame
orders = pl.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 2, 2, 5],
    "amount": [250, 150, 300, 400]
})

print("Customers:")
print(customers)
print("\nOrders:")
print(orders)

Customers:
shape: (4, 2)
┌─────────────┬─────────┐
│ customer_id ┆ name    │
│ ---         ┆ ---     │
│ i64         ┆ str     │
╞═════════════╪═════════╡
│ 1           ┆ Alice   │
│ 2           ┆ Bob     │
│ 3           ┆ Charlie │
│ 4           ┆ David   │
└─────────────┴─────────┘

Orders:
shape: (4, 3)
┌──────────┬─────────────┬────────┐
│ order_id ┆ customer_id ┆ amount │
│ ---      ┆ ---         ┆ ---    │
│ i64      ┆ i64         ┆ i64    │
╞══════════╪═════════════╪════════╡
│ 101      ┆ 1           ┆ 250    │
│ 102      ┆ 2           ┆ 150    │
│ 103      ┆ 2           ┆ 300    │
│ 104      ┆ 5           ┆ 400    │
└──────────┴─────────────┴────────┘

Inner Join

An inner join returns only rows with matching keys in both DataFrames. It is the default join type in Polars.


# Inner join on customer_id
inner_join = customers.join(orders, on="customer_id")
print("Inner Join:")
print(inner_join)

Inner Join:
shape: (3, 4)
┌─────────────┬───────┬──────────┬────────┐
│ customer_id ┆ name  ┆ order_id ┆ amount │
│ ---         ┆ ---   ┆ ---      ┆ ---    │
│ i64         ┆ str   ┆ i64      ┆ i64    │
╞═════════════╪═══════╪══════════╪════════╡
│ 1           ┆ Alice ┆ 101      ┆ 250    │
│ 2           ┆ Bob   ┆ 102      ┆ 150    │
│ 2           ┆ Bob   ┆ 103      ┆ 300    │
└─────────────┴───────┴──────────┴────────┘

Notice that customer 3 (Charlie) and customer 4 (David) are missing. They have no orders. Customer 5 from orders is also absent. Only matching keys remain.

Left Join

A left join keeps all rows from the left DataFrame. Missing matches from the right side become null.


# Left join: keep all customers
left_join = customers.join(orders, on="customer_id", how="left")
print("Left Join:")
print(left_join)

Left Join:
shape: (4, 4)
┌─────────────┬─────────┬──────────┬────────┐
│ customer_id ┆ name    ┆ order_id ┆ amount │
│ ---         ┆ ---     ┆ ---      ┆ ---    │
│ i64         ┆ str     ┆ i64      ┆ i64    │
╞═════════════╪═════════╪══════════╪════════╡
│ 1           ┆ Alice   ┆ 101      ┆ 250    │
│ 2           ┆ Bob     ┆ 102      ┆ 150    │
│ 2           ┆ Bob     ┆ 103      ┆ 300    │
│ 3           ┆ Charlie ┆ null     ┆ null   │
│ 4           ┆ David   ┆ null     ┆ null   │
└─────────────┴─────────┴──────────┴────────┘

All customers are present. Charlie and David have null values for order data. This is useful when you need a complete list from one table.

Outer Join

An outer join (or full join) returns all rows from both DataFrames. Missing values are filled with null.


# Outer join
outer_join = customers.join(orders, on="customer_id", how="outer")
print("Outer Join:")
print(outer_join)

Outer Join:
shape: (5, 4)
┌─────────────┬─────────┬──────────┬────────┐
│ customer_id ┆ name    ┆ order_id ┆ amount │
│ ---         ┆ ---     ┆ ---      ┆ ---    │
│ i64         ┆ str     ┆ i64      ┆ i64    │
╞═════════════╪═════════╪══════════╪════════╡
│ 1           ┆ Alice   ┆ 101      ┆ 250    │
│ 2           ┆ Bob     ┆ 102      ┆ 150    │
│ 2           ┆ Bob     ┆ 103      ┆ 300    │
│ 3           ┆ Charlie ┆ null     ┆ null   │
│ 4           ┆ David   ┆ null     ┆ null   │
│ 5           ┆ null    ┆ 104      ┆ 400    │
└─────────────┴─────────┴──────────┴────────┘

Now customer 5 appears, even though they are not in the customers table. Their name is null. This join is useful for finding discrepancies between datasets.

Cross Join

A cross join creates the Cartesian product of both DataFrames. Every row from the left is paired with every row from the right. Use it with caution on large data.


# Cross join
cross_join = customers.join(orders, how="cross")
print("Cross Join (first 5 rows):")
print(cross_join.head(5))

Cross Join (first 5 rows):
shape: (5, 5)
┌─────────────┬───────┬──────────┬─────────────┬────────┐
│ customer_id ┆ name  ┆ order_id ┆ customer_id ┆ amount │
│ ---         ┆ ---   ┆ ---      ┆ ---         ┆ ---    │
│ i64         ┆ str   ┆ i64      ┆ i64         ┆ i64    │
╞═════════════╪═══════╪══════════╪═════════════╪════════╡
│ 1           ┆ Alice ┆ 101      ┆ 1           ┆ 250    │
│ 1           ┆ Alice ┆ 102      ┆ 2           ┆ 150    │
│ 1           ┆ Alice ┆ 103      ┆ 2           ┆ 300    │
│ 1           ┆ Alice ┆ 104      ┆ 5           ┆ 400    │
│ 2           ┆ Bob   ┆ 101      ┆ 1           ┆ 250    │
└─────────────┴───────┴──────────┴─────────────┴────────┘

The output duplicates columns. You can rename or drop them afterward. Cross joins are helpful for generating all possible combinations.

Anti and Semi Joins

Anti join returns rows from the left DataFrame that have no match in the right. Semi join returns rows from the left that have at least one match, without duplicating rows.


# Anti join: customers with no orders
anti_join = customers.join(orders, on="customer_id", how="anti")
print("Anti Join (customers with no orders):")
print(anti_join)

# Semi join: customers with at least one order
semi_join = customers.join(orders, on="customer_id", how="semi")
print("\nSemi Join (customers with orders):")
print(semi_join)

Anti Join (customers with no orders):
shape: (2, 2)
┌─────────────┬─────────┐
│ customer_id ┆ name    │
│ ---         ┆ ---     │
│ i64         ┆ str     │
╞═════════════╪═════════╡
│ 3           ┆ Charlie │
│ 4           ┆ David   │
└─────────────┴─────────┘

Semi Join (customers with orders):
shape: (2, 2)
┌─────────────┬───────┐
│ customer_id ┆ name  │
│ ---         ┆ ---   │
│ i64         ┆ str   │
╞═════════════╪═══════╡
│ 1           ┆ Alice │
│ 2           ┆ Bob   │
└─────────────┴───────┘

Anti joins are great for finding missing records. Semi joins filter rows without adding columns from the right table. This is useful for cleaning data before analysis. For more on handling missing data, see Handling Null & Missing Values in Polars.

Joining on Multiple Keys

You can join on multiple columns by passing a list to the on parameter. This is common with composite keys.


# DataFrames with composite keys
df1 = pl.DataFrame({
    "year": [2020, 2020, 2021],
    "product": ["A", "B", "A"],
    "sales": [100, 150, 200]
})
df2 = pl.DataFrame({
    "year": [2020, 2020, 2021],
    "product": ["A", "B", "B"],
    "target": [120, 160, 180]
})

# Inner join on two keys
multi_join = df1.join(df2, on=["year", "product"])
print("Multi-key Join:")
print(multi_join)

Multi-key Join:
shape: (2, 4)
┌──────┬─────────┬───────┬────────┐
│ year ┆ product ┆ sales ┆ target │
│ ---  ┆ ---     ┆ ---   ┆ ---    │
│ i64  ┆ str     ┆ i64   ┆ i64    │
╞══════╪═════════╪═══════╪════════╡
│ 2020 ┆ A       ┆ 100   ┆ 120    │
│ 2020 ┆ B       ┆ 150   ┆ 160    │
└──────┴─────────┴───────┴────────┘

Only matching year-product combinations appear. This ensures precise merging when a single column is not unique.

Handling Overlapping Column Names

When both DataFrames have columns with the same name (other than the join key), Polars adds a suffix. You can customize it using the suffix parameter.


# DataFrames with overlapping columns
left = pl.DataFrame({
    "id": [1, 2],
    "value": [10, 20],
    "info": ["a", "b"]
})
right = pl.DataFrame({
    "id": [1, 2],
    "value": [100, 200],
    "info": ["x", "y"]
})

# Join with default suffix
joined = left.join(right, on="id", suffix="_right")
print("Join with suffix:")
print(joined)

Join with suffix:
shape: (2, 5)
┌─────┬───────┬──────┬───────────┬──────────┐
│ id  ┆ value ┆ info ┆ value_rig ┆ info_rig │
│ --- ┆ ---   ┆ ---  ┆ ---       ┆ ---      │
│ i64 ┆ i64   ┆ str  ┆ i64       ┆ str      │
╞═════╪═══════╪══════╪═══════════╪══════════╡
│ 1   ┆ 10    ┆ a    ┆ 100       ┆ x        │
│ 2   ┆ 20    ┆ b    ┆ 200       ┆ y        │
└─────┴───────┴──────┴───────────┴──────────┘

You can also rename columns before joining for cleaner output. This avoids confusion and keeps your DataFrame schema tidy. For more on managing columns, check Polars Data Type Casting & Schema Management.

Performance Tips

Polars is optimized for speed. However, you can improve performance further. Always join on columns that are sorted if possible. Use lazy evaluation by chaining operations with pl.LazyFrame.

Avoid cross joins on large datasets. They can generate billions of rows quickly. Use the how parameter explicitly to make your intent clear. This also helps with code readability.

When working with large data, consider using the streaming option. It processes data in chunks, reducing memory usage. This is especially helpful for joins that produce large intermediate results.

Conclusion

Joining and merging DataFrames is a fundamental skill in Polars. You now know how to perform inner, left, outer, cross, anti, and semi joins. You can join on single or multiple keys. You can handle overlapping column names with suffixes.

Polars makes these operations fast and intuitive. Practice with your own data to build confidence. For deeper learning, explore the Polars expression system. It powers these join operations behind the scenes.