Last modified: May 11, 2026 by Alexander Williams

Polars Multi-threading & Performance Tuning

Polars is built for speed. It uses all CPU cores by default. But default settings may not be optimal for your hardware or data. This guide shows you how to tune Polars for maximum performance.

How Polars Uses Multi-Threading

Polars uses Rayon, a Rust library for parallel computing. It splits work across all available cores. This makes Polars faster than Pandas for most operations.

By default, Polars uses all CPU threads. This is great for big datasets. But for small data, overhead from threading can slow things down.

Check Current Thread Count

You can see how many threads Polars uses:


import polars as pl

# Check current thread count
print(pl.thread_pool_size())

16

This shows 16 threads on an 8-core CPU with hyper-threading.

Controlling Thread Count

You can cap the thread pool size. This helps when running Polars alongside other processes. There is no setter function for this: the pool is created once, when Polars is imported, from the POLARS_MAX_THREADS environment variable. So set it before the import.


import os

# Must be set before Polars is imported
os.environ["POLARS_MAX_THREADS"] = "4"

import polars as pl

# Verify
print(pl.thread_pool_size())

4

Because the pool size is fixed at import time, set POLARS_MAX_THREADS at the very top of your script, or in the shell that launches it. Changing the variable later has no effect on the running process.

When to Reduce Threads

Reduce threads when:

  • Your dataset is small (the threading overhead outweighs the parallel speedup)
  • You run multiple Polars scripts at once
  • Other apps need CPU time
  • You do I/O-bound work like reading many small files

More threads do not always mean faster. Overhead from context switching hurts small tasks.
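You can see this for yourself with a rough micro-benchmark. Because the pool size is fixed per process, the sketch below launches a fresh interpreter for each thread count. The dataset size and iteration count are arbitrary, and the exact timings depend entirely on your machine.

```python
import subprocess
import sys
import textwrap

# Run the same small query in fresh interpreters with different
# POLARS_MAX_THREADS values (the pool size is fixed per process)
snippet = textwrap.dedent("""
    import os, sys, time
    os.environ["POLARS_MAX_THREADS"] = sys.argv[1]
    import polars as pl

    df = pl.DataFrame({"g": [1, 2] * 5_000, "v": range(10_000)})
    t0 = time.perf_counter()
    for _ in range(100):
        df.group_by("g").agg(pl.sum("v"))
    print(f"{sys.argv[1]} threads: {time.perf_counter() - t0:.3f}s")
""")

results = {}
for n in ["1", "4"]:
    proc = subprocess.run(
        [sys.executable, "-c", snippet, n],
        check=True, capture_output=True, text=True,
    )
    results[n] = proc.stdout.strip()
    print(results[n])
```

On data this small, the single-thread run is often the fastest, which is exactly the overhead described above.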

Performance Tuning with Streaming

For datasets larger than RAM, use streaming. Streaming processes data in chunks. It avoids memory overload.


import polars as pl

# Create a large lazy query
q = (
    pl.scan_csv("large_file.csv")
    .group_by("category")
    .agg(pl.sum("value"))
)

# Execute with streaming
result = q.collect(streaming=True)
print(result)

shape: (2, 2)
┌──────────┬───────┐
│ category ┆ value │
╞══════════╪═══════╡
│ A        ┆ 54321 │
│ B        ┆ 12345 │
└──────────┴───────┘

Streaming uses less memory, but it can be slower for small data, so use it only when data exceeds RAM. Note that recent Polars releases replace the streaming=True flag with collect(engine="streaming"); check which spelling your version supports.

Profiling Query Performance

Use profile() to see where time goes. This prints a detailed breakdown of each step.


import polars as pl

df = pl.DataFrame({
    "group": ["A", "B", "A", "B"] * 100,
    "value": range(400)
})

# Profile the query: profile() returns a (result, profile) tuple
q = df.lazy().group_by("group").agg(pl.mean("value"))
result, prof = q.profile()

print(prof)  # per-node timings, in microseconds

shape: (2, 3)
┌─────────────────┬───────┬───────┐
│ node            ┆ start ┆ end   │
╞═════════════════╪═══════╪═══════╡
│ optimization    ┆ 0     ┆ 25    │
│ group_by(group) ┆ 25    ┆ 410   │
└─────────────────┴───────┴───────┘

Look for slow nodes. GroupBy and joins are common bottlenecks.

Optimize with Lazy Evaluation

Lazy evaluation lets Polars optimize your query plan. It reorders operations for speed. Use lazy() and collect() instead of eager methods.

Learn more about Polars LazyFrame Query Optimization for deeper insights.


import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85, 92, 78]
})

# Eager: runs immediately
result_eager = df.filter(pl.col("score") > 80)

# Lazy: builds a plan, optimizes it, runs on collect()
result_lazy = df.lazy().filter(pl.col("score") > 80).collect()
print(result_lazy)

shape: (2, 2)
┌───────┬───────┐
│ name  ┆ score │
╞═══════╪═══════╡
│ Alice ┆ 85    │
│ Bob   ┆ 92    │
└───────┴───────┘

Use Expressions Over Custom Functions

Polars expressions are fast. They run in Rust. Custom Python functions are slow because they break parallelism.

For complex logic, see Polars Custom Functions with map_elements & map_batches.


import polars as pl

df = pl.DataFrame({"x": [1, 2, 3, 4]})

# Fast: Expression
result_fast = df.with_columns(
    (pl.col("x") * 2).alias("double_x")
)

# Slow: Custom function
def double(val):
    return val * 2

result_slow = df.with_columns(
    pl.col("x").map_elements(double, return_dtype=pl.Int64).alias("double_x_slow")
)

print(result_fast)
print(result_slow)

shape: (4, 2)
┌─────┬──────────┐
│ x   ┆ double_x │
╞═════╪══════════╡
│ 1   ┆ 2        │
│ 2   ┆ 4        │
│ 3   ┆ 6        │
│ 4   ┆ 8        │
└─────┴──────────┘
shape: (4, 3)
┌─────┬──────────┬───────────────┐
│ x   ┆ double_x ┆ double_x_slow │
╞═════╪══════════╪═══════════════╡
│ 1   ┆ 2        ┆ 2             │
│ 2   ┆ 4        ┆ 4             │
│ 3   ┆ 6        ┆ 6             │
│ 4   ┆ 8        ┆ 8             │
└─────┴──────────┴───────────────┘

Use expressions whenever possible. They are 10-100x faster.

Memory Management Tips

Polars manages memory efficiently. But you can help it:

  • Use scan_csv() instead of read_csv() for large files
  • Drop unused columns early
  • Use shrink_to_fit() to reduce memory after filtering

For large files, read Scan Large Files with Polars Without Memory Load.


import polars as pl

df = pl.DataFrame({
    "id": range(1000),
    "value": range(1000),
    "unused": ["x"] * 1000
})

# Drop unused column
df_clean = df.drop("unused")

# Shrink memory
df_shrunk = df_clean.shrink_to_fit()

print(df_shrunk.estimated_size("mb"))

0.015

Choosing the Right Data Types

Polars uses Arrow for data. Arrow supports many types. Using the smallest type saves memory and speeds up operations.


import polars as pl

df = pl.DataFrame({
    "big_int": pl.Series([1, 2, 3], dtype=pl.Int64),
    "small_int": pl.Series([1, 2, 3], dtype=pl.Int8),
})

print(df.schema)

{'big_int': Int64, 'small_int': Int8}

Use Int8 for small ranges. Use Float32 instead of Float64 when precision is not critical.

Conclusion

Polars is fast by default. But you can make it faster. Control thread count for your hardware. Use streaming for big data. Profile queries to find bottlenecks. Prefer expressions over custom functions. Choose the right data types.

These tuning steps will help you process data faster. Start with profiling. Then apply changes one by one. Measure the impact. You will see big improvements.