Last modified: May 10, 2026, by Alexander Williams

Handling Null & Missing Values in Polars

Missing values are common in real-world data. They can break calculations or lead to wrong insights. Polars provides powerful tools to handle nulls efficiently. This guide shows you how to detect, drop, fill, and manage missing values in Polars DataFrames.

What Are Null Values in Polars?

In Polars, a null represents a missing or undefined value. It is different from NaN (Not a Number), which is a special floating-point value. Polars treats nulls and NaNs separately. Understanding this distinction helps you clean data correctly.

Nulls can appear in any column type. For example, a string column may have a null where a name is missing. A numeric column may have a null for an absent measurement. In Python, you create a null by passing None; Polars displays it as null in its output.
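
As a minimal sketch, the following shows that None and float('nan') land in different categories:


import polars as pl

# A float Series can hold both a null (missing value) and a NaN
# (an invalid floating-point result) at the same time.
s = pl.Series("x", [1.0, None, float("nan")])

print(s.is_null())  # True only for the None entry
print(s.is_nan())   # True only for the NaN entry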

Creating a DataFrame with Nulls

Let’s create a sample DataFrame with missing values. This will be our starting point for all examples.


import polars as pl

# Create a DataFrame with null values
df = pl.DataFrame({
    "name": ["Alice", "Bob", None, "David", "Eve"],
    "age": [25, None, 30, 22, None],
    "score": [85.5, 90.0, None, 78.0, 92.5]
})

print(df)

Output:


shape: (5, 3)
┌───────┬──────┬───────┐
│ name  ┆ age  ┆ score │
│ ---   ┆ ---  ┆ ---   │
│ str   ┆ i64  ┆ f64   │
╞═══════╪══════╪═══════╡
│ Alice ┆ 25   ┆ 85.5  │
│ Bob   ┆ null ┆ 90.0  │
│ null  ┆ 30   ┆ null  │
│ David ┆ 22   ┆ 78.0  │
│ Eve   ┆ null ┆ 92.5  │
└───────┴──────┴───────┘

Detecting Null Values

First, you need to find where nulls exist. Use the is_null() method to check each cell. It returns a boolean mask. For the opposite, use is_not_null().


# Detect null values
null_mask = df.select(
    pl.all().is_null().name.suffix("_is_null")
)
print(null_mask)

Output:


shape: (5, 3)
┌──────────────┬─────────────┬───────────────┐
│ name_is_null ┆ age_is_null ┆ score_is_null │
│ ---          ┆ ---         ┆ ---           │
│ bool         ┆ bool        ┆ bool          │
╞══════════════╪═════════════╪═══════════════╡
│ false        ┆ false       ┆ false         │
│ false        ┆ true        ┆ false         │
│ true         ┆ false       ┆ true          │
│ false        ┆ false       ┆ false         │
│ false        ┆ true        ┆ false         │
└──────────────┴─────────────┴───────────────┘
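
The same boolean masks work inside filter(). As a quick sketch, this keeps only the rows where the age column is present:


# Keep only rows where 'age' is not null
df_age_present = df.filter(pl.col("age").is_not_null())
print(df_age_present)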

You can also count nulls per column with null_count(). This helps you understand the data quality quickly.


# Count nulls per column
null_counts = df.null_count()
print(null_counts)

Output:


shape: (1, 3)
┌──────┬─────┬───────┐
│ name ┆ age ┆ score │
│ ---  ┆ --- ┆ ---   │
│ u32  ┆ u32 ┆ u32   │
╞══════╪═════╪═══════╡
│ 1    ┆ 2   ┆ 1     │
└──────┴─────┴───────┘

Dropping Rows with Nulls

The simplest way to handle nulls is to drop them. Use the drop_nulls() method. By default, it removes any row that has at least one null. You can specify a subset of columns to check.


# Drop rows with any null
df_dropped_all = df.drop_nulls()
print(df_dropped_all)

# Drop rows with null only in 'age' column
df_dropped_age = df.drop_nulls(subset=["age"])
print(df_dropped_age)

Output:


# First output: rows with any null removed
shape: (2, 3)
┌───────┬─────┬───────┐
│ name  ┆ age ┆ score │
│ ---   ┆ --- ┆ ---   │
│ str   ┆ i64 ┆ f64   │
╞═══════╪═════╪═══════╡
│ Alice ┆ 25  ┆ 85.5  │
│ David ┆ 22  ┆ 78.0  │
└───────┴─────┴───────┘

# Second output: rows with null in age removed
shape: (3, 3)
┌───────┬─────┬───────┐
│ name  ┆ age ┆ score │
│ ---   ┆ --- ┆ ---   │
│ str   ┆ i64 ┆ f64   │
╞═══════╪═════╪═══════╡
│ Alice ┆ 25  ┆ 85.5  │
│ null  ┆ 30  ┆ null  │
│ David ┆ 22  ┆ 78.0  │
└───────┴─────┴───────┘

Important: Dropping rows shrinks your dataset. Use it only when values are missing at random and only a small fraction of rows are affected.
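
One quick sanity check before committing to this strategy is to compare row counts. A small sketch:


# How many rows would drop_nulls() remove?
total_rows = df.height
remaining_rows = df.drop_nulls().height
print(f"Dropping nulls removes {total_rows - remaining_rows} of {total_rows} rows")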

Filling Null Values

Instead of dropping, you can fill nulls with a specific value. Use the fill_null() method. You can fill with a constant, a column mean, or a forward/backward strategy.

Fill with a Constant


# Fill nulls with a constant value
df_filled_constant = df.with_columns(
    pl.col("age").fill_null(0),
    pl.col("score").fill_null(0.0)
)
print(df_filled_constant)

Output:


shape: (5, 3)
┌───────┬─────┬───────┐
│ name  ┆ age ┆ score │
│ ---   ┆ --- ┆ ---   │
│ str   ┆ i64 ┆ f64   │
╞═══════╪═════╪═══════╡
│ Alice ┆ 25  ┆ 85.5  │
│ Bob   ┆ 0   ┆ 90.0  │
│ null  ┆ 30  ┆ 0.0   │
│ David ┆ 22  ┆ 78.0  │
│ Eve   ┆ 0   ┆ 92.5  │
└───────┴─────┴───────┘

Fill with Column Mean

For numeric columns, filling with the mean is common. It keeps the column mean unchanged, although it does shrink the spread of the data slightly.


# Fill nulls with column mean
mean_age = df.select(pl.col("age").mean()).item()
mean_score = df.select(pl.col("score").mean()).item()

df_filled_mean = df.with_columns(
    pl.col("age").fill_null(mean_age),
    pl.col("score").fill_null(mean_score)
)
print(df_filled_mean)

Output:


shape: (5, 3)
┌───────┬───────────┬───────┐
│ name  ┆ age       ┆ score │
│ ---   ┆ ---       ┆ ---   │
│ str   ┆ f64       ┆ f64   │
╞═══════╪═══════════╪═══════╡
│ Alice ┆ 25.0      ┆ 85.5  │
│ Bob   ┆ 25.666667 ┆ 90.0  │
│ null  ┆ 30.0      ┆ 86.5  │
│ David ┆ 22.0      ┆ 78.0  │
│ Eve   ┆ 25.666667 ┆ 92.5  │
└───────┴───────────┴───────┘
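
You can get the same result without pulling the means into Python variables: fill_null() also accepts an expression, so the mean is computed inside the query itself. A sketch:


# Fill each column with its own mean in a single expression
df_filled_mean_expr = df.with_columns(
    pl.col("age").fill_null(pl.col("age").mean()),
    pl.col("score").fill_null(pl.col("score").mean())
)
print(df_filled_mean_expr)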

Forward Fill and Backward Fill

For time series data, use forward fill (fill_null(strategy="forward")) or backward fill (fill_null(strategy="backward")). This carries the last known value forward or the next value backward.


# Forward fill nulls
df_ffill = df.with_columns(
    pl.all().fill_null(strategy="forward")
)
print(df_ffill)

Output:


shape: (5, 3)
┌───────┬─────┬───────┐
│ name  ┆ age ┆ score │
│ ---   ┆ --- ┆ ---   │
│ str   ┆ i64 ┆ f64   │
╞═══════╪═════╪═══════╡
│ Alice ┆ 25  ┆ 85.5  │
│ Bob   ┆ 25  ┆ 90.0  │
│ Bob   ┆ 30  ┆ 90.0  │
│ David ┆ 22  ┆ 78.0  │
│ Eve   ┆ 22  ┆ 92.5  │
└───────┴─────┴───────┘
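
Backward fill works the same way; just switch the strategy. Note that a trailing null, like the age in the last row, has no later value to borrow from, so it stays null:


# Backward fill nulls
df_bfill = df.with_columns(
    pl.all().fill_null(strategy="backward")
)
print(df_bfill)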

Interpolation

Polars also supports linear interpolation for numeric columns. This is great for time series gaps. Use the interpolate() method on a column.


# Interpolate nulls in score column
df_interp = df.with_columns(
    pl.col("score").interpolate()
)
print(df_interp)

Output:


shape: (5, 3)
┌───────┬──────┬───────┐
│ name  ┆ age  ┆ score │
│ ---   ┆ ---  ┆ ---   │
│ str   ┆ i64  ┆ f64   │
╞═══════╪══════╪═══════╡
│ Alice ┆ 25   ┆ 85.5  │
│ Bob   ┆ null ┆ 90.0  │
│ null  ┆ 30   ┆ 84.0  │
│ David ┆ 22   ┆ 78.0  │
│ Eve   ┆ null ┆ 92.5  │
└───────┴──────┴───────┘

Notice the null in the score column was replaced with 84.0, which is the linear interpolation between 90.0 and 78.0.

Replacing Nulls in Specific Columns

You can target specific columns using pl.col(). This gives you fine control. For example, fill only the name column with "Unknown".


# Replace nulls in name column
df_filled_name = df.with_columns(
    pl.col("name").fill_null("Unknown")
)
print(df_filled_name)

Output:


shape: (5, 3)
┌─────────┬──────┬───────┐
│ name    ┆ age  ┆ score │
│ ---     ┆ ---  ┆ ---   │
│ str     ┆ i64  ┆ f64   │
╞═════════╪══════╪═══════╡
│ Alice   ┆ 25   ┆ 85.5  │
│ Bob     ┆ null ┆ 90.0  │
│ Unknown ┆ 30   ┆ null  │
│ David   ┆ 22   ┆ 78.0  │
│ Eve     ┆ null ┆ 92.5  │
└─────────┴──────┴───────┘

Working with NaNs

Polars distinguishes between null and NaN. NaN is a special floating-point value, so it only appears in float columns. Use is_nan() to detect NaNs and fill_nan() to replace them.


# DataFrame with NaN
df_nan = pl.DataFrame({
    "value": [1.0, float('nan'), 3.0, float('nan'), 5.0]
})

# Detect NaN
print(df_nan.select(pl.col("value").is_nan()))

# Fill NaN with 0
df_filled_nan = df_nan.with_columns(
    pl.col("value").fill_nan(0.0)
)
print(df_filled_nan)

Output:


# is_nan output
shape: (5, 1)
┌───────┐
│ value │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ true  │
│ false │
│ true  │
│ false │
└───────┘

# After fill_nan
shape: (5, 1)
┌───────┐
│ value │
│ ---   │
│ f64   │
╞═══════╡
│ 1.0   │
│ 0.0   │
│ 3.0   │
│ 0.0   │
│ 5.0   │
└───────┘
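
If you want NaNs to behave like any other missing value, a common pattern is to convert them to nulls first by passing None to fill_nan(). After that, drop_nulls(), fill_null(), and interpolate() apply to them as well. A minimal sketch:


# Convert NaNs to nulls so the regular null tools apply
df_nan_as_null = df_nan.with_columns(
    pl.col("value").fill_nan(None)
)
print(df_nan_as_null.null_count())  # counts the former NaNs as nulls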

Best Practices

Always inspect your data first. Use null_count() and describe() to understand missing patterns. Choose a strategy based on your data type and context. For important numeric fields, consider interpolation or mean filling. For categorical data, filling with a placeholder like "Unknown" often works.
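
As a quick sketch of that first inspection step:


# Quick data-quality check before choosing a strategy
print(df.null_count())  # nulls per column
print(df.describe())    # summary statistics, including a null_count row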

Remember that dropping too many rows can bias your analysis. Use drop_nulls() sparingly. When working with time series, forward fill is usually better than dropping.

For more on data preparation, check our guide on Polars Data Type Casting & Schema Management. Also see Polars DataFrame Sorting & Slicing Guide for ordering your cleaned data.

Conclusion

Handling null and missing values is a core part of data cleaning. Polars offers fast and flexible methods like drop_nulls(), fill_null(), and interpolate(). You can target specific columns, use strategies like forward fill, or replace NaNs separately. By mastering these techniques, you ensure your data is ready for analysis. Practice with different scenarios to build confidence. For a deeper dive into data manipulation, explore Polars Expression System Core Concept.