Last modified: May 11, 2026 By Alexander Williams
Polars Lazy vs Eager API: When to Use
Polars is a fast DataFrame library for Python. It offers two APIs: Eager and Lazy. Each serves a different purpose. Choosing the right one can make your code faster and cleaner.
In this guide, you will learn the key differences. You will see when to use each API. We will also show code examples with outputs. By the end, you will know which approach fits your workflow.
What Is the Eager API?
The Eager API executes operations immediately. When you run a method, Polars computes the result right away. This is similar to how pandas works.
Eager execution is simple to debug. You can inspect results line by line. It is great for interactive work or small datasets.
Example of Eager API
import polars as pl
# Create a DataFrame
df = pl.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"salary": [50000, 60000, 70000]
})
# Eager: filter and select immediately
result = df.filter(pl.col("age") > 28).select(["name", "salary"])
print(result)
shape: (2, 2)
┌─────────┬────────┐
│ name ┆ salary │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪════════╡
│ Bob ┆ 60000 │
│ Charlie ┆ 70000 │
└─────────┴────────┘
Each method call runs as soon as it is invoked: filter() returns a new DataFrame, and select() then runs on that result.
What Is the Lazy API?
The Lazy API delays execution. You build a query plan first. Then you call .collect() to run it. This allows Polars to optimize the entire pipeline.
Lazy execution can rewrite the plan before running it. For example, it can apply filters as early as possible and read only the columns the query actually uses. This reduces memory use and speeds up queries on large datasets.
Example of Lazy API
import polars as pl
# Create a LazyFrame (same data)
lf = pl.LazyFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"salary": [50000, 60000, 70000]
})
# Build query plan (no execution yet)
query = lf.filter(pl.col("age") > 28).select(["name", "salary"])
# Execute the plan
result = query.collect()
print(result)
shape: (2, 2)
┌─────────┬────────┐
│ name ┆ salary │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪════════╡
│ Bob ┆ 60000 │
│ Charlie ┆ 70000 │
└─────────┴────────┘
The result is the same. But the process is different. The query plan is optimized before execution.
Key Differences
Here are the main points to consider:
- Execution timing: Eager runs immediately. Lazy runs on .collect().
- Optimization: Lazy can reorder and prune operations. Eager cannot.
- Memory: Lazy can skip unneeded rows and columns when reading data. Eager loads everything first.
- Debugging: Eager is easier to debug step by step.
- Chaining: Both support method chaining, but only Lazy optimizes the whole chain as one query.
When to Use Eager API
Use the Eager API when:
- You are exploring data interactively.
- Your dataset is small (fits in memory).
- You need to see results after each step.
- You are debugging a pipeline.
- You prefer simplicity over performance.
For example, if you have a CSV file under 100 MB, eager works fine.
# Quick inspection with eager
import polars as pl

df = pl.read_csv("small_data.csv")
print(df.head())
When to Use Lazy API
Use the Lazy API when:
- Your dataset is large (hundreds of MB or GB).
- You have many chained operations.
- You want to minimize memory usage.
- You need to optimize performance.
- You work with streaming or out-of-core data.
For large datasets, lazy execution can be 2-10x faster, though the actual gain depends on the query and on how the data is stored.
# Lazy loading for large data
import polars as pl

lf = pl.scan_csv("large_data.csv")
query = (
    lf.filter(pl.col("value") > 100)
    .group_by("category")
    .agg(pl.col("value").mean())
)
result = query.collect()
print(result)
Performance Comparison
Let's compare both APIs on a larger dataset. We will create a DataFrame with 10 million rows.
import polars as pl
import time
# Create large DataFrame
n = 10_000_000
df = pl.DataFrame({
"id": range(n),
"value": [x % 100 for x in range(n)]
})
# Eager execution (perf_counter is a high-resolution timer meant for benchmarks)
start = time.perf_counter()
eager_result = df.filter(pl.col("value") > 50).select(["id"])
eager_time = time.perf_counter() - start
print(f"Eager time: {eager_time:.4f} seconds")
# Lazy execution
start = time.perf_counter()
lazy_result = df.lazy().filter(pl.col("value") > 50).select(["id"]).collect()
lazy_time = time.perf_counter() - start
print(f"Lazy time: {lazy_time:.4f} seconds")
Eager time: 0.3452 seconds
Lazy time: 0.2981 seconds
This query has only one filter and one select, so lazy is only slightly faster. With more steps, and especially when the data is read from files where pushdowns apply, the gap usually widens.
Combining Both APIs
You can mix eager and lazy in one project. Start with eager for exploration. Then switch to lazy for production pipelines.
Convert between them easily:
- df.lazy() turns an eager DataFrame into a LazyFrame.
- lf.collect() turns a LazyFrame into an eager DataFrame.
# Convert eager to lazy
lazy_df = df.lazy()
# Convert lazy to eager
eager_df = lazy_df.collect()
Advanced Lazy Features
The Lazy API supports powerful optimizations:
- Predicate pushdown: Filters are applied as early as possible.
- Projection pushdown: Only needed columns are read.
- Join ordering: Polars estimates which join branches to execute first and can run them in parallel.
This makes lazy ideal for complex pipelines. For example, the long expression chains covered in the Polars Chaining Expressions Guide benefit because lazy optimizes the whole chain before running it.
If you work with window functions, lazy is also beneficial. Check out Polars Window Functions & Rolling Computations for more details.
For reshaping, unpivot (called melt before Polars 1.0) works in both APIs, and lazy can reduce intermediate memory. Pivot and transpose are primarily eager DataFrame operations. See Reshape Data in Polars: Pivot, Melt & Transpose for examples.
Common Pitfalls
Here are mistakes to avoid:
- Forgetting .collect(): Lazy queries do nothing until collected.
- Using eager on huge data: This can exhaust your memory.
- Over-optimizing small data: Lazy overhead may not be worth it for tiny datasets.
- Chaining many .collect() calls: Collecting in the middle of a pipeline splits it into separate plans, which defeats lazy optimization.
Debugging Tips
Debugging lazy queries can be tricky. Use these methods:
- .explain() to see the query plan.
- .show_graph() to visualize the plan (requires graphviz).
- Use .collect() early to inspect intermediate results.
# See the query plan
plan = lf.filter(pl.col("age") > 28).select(["name"]).explain()
print(plan)
SELECT [col("name")] FROM
FILTER [(col("age")) > (28)] FROM
DataFrame
Conclusion
Choosing between Polars Lazy and Eager API depends on your task. Use Eager for quick exploration and small data. Use Lazy for large datasets and complex pipelines.
Both APIs are fast and clean. Lazy offers powerful optimizations. Eager offers simplicity. You can even combine them in one project.
Start with eager when learning. Then adopt lazy as your data grows. This will make your code efficient and maintainable.
For more Polars tips, explore the Polars Expression System Core Concept guide. It explains how expressions work in both APIs.