Last modified: Jan 23, 2025 By Alexander Williams

Python Statsmodels predict() Explained

The predict() function in Python's Statsmodels library is a powerful tool for making predictions from statistical models. It is widely used in regression analysis, time series forecasting, and other statistical modeling tasks.

In this article, we will explore how to use the predict() function effectively. We will also provide examples to help you understand its usage better.

What is Statsmodels predict()?

predict() is a method of the results object that a Statsmodels model returns from fit(). It applies the estimated parameters to the data you pass in and returns the predicted values. If you pass no data, it uses the data the model was fitted on.

Comparing these predictions with observed values is a common way to evaluate a model, and the same method produces predictions for new observations. It is available for linear regression, logistic regression, and time series models such as SARIMAX.

How to Use predict() in Statsmodels

To use predict(), you first need to fit a model using Statsmodels. Calling fit() returns a results object, and predict() is called on that results object, not on the unfitted model.

Here is a simple example using linear regression:


import statsmodels.api as sm
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])

# Add a constant to the predictor variable
X = sm.add_constant(X)

# Fit the model
model = sm.OLS(Y, X).fit()

# Make predictions
predictions = model.predict(X)

print(predictions)


[2.8 3.4 4.  4.6 5.2]

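In this example, we first fit a linear regression model using sm.OLS() and store the fitted results in the variable model. Then, we use the predict() method to generate predictions. Because we pass the same X that was used for fitting, these are in-sample fitted values, the same numbers as model.fittedvalues.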

Key Parameters of predict()

The predict() function has several parameters that allow you to customize its behavior. Here are some of the most important ones:

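  • exog: The explanatory (exogenous) variables to predict for. It must have the same columns, in the same order, as the data used to fit the model, including the constant if one was added. If omitted, the data used for fitting is used.
  • transform: A boolean flag that matters only for models created from a formula. When True (the default), the formula's transformations are applied to exog before predicting. When False, exog must already be the full design matrix.
  • which: Specifies which type of prediction to return, such as the mean or the linear predictor. It is available only for some model types and Statsmodels versions, so check the documentation for your model.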
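The effect of exog and transform is easiest to see with a formula-based model. The sketch below assumes the formula interface (statsmodels.formula.api) and uses made-up column names x and y; it predicts the same two new points once from raw data and once from a hand-built design matrix:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Sample data in a DataFrame; column names are illustrative
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})
formula_results = smf.ols("y ~ x", data=df).fit()

# transform=True (default): pass raw columns, the formula adds the intercept
new_data = pd.DataFrame({"x": [6, 7]})
pred_from_formula = formula_results.predict(new_data)

# transform=False: pass the full design matrix yourself (intercept, x)
design = np.array([[1.0, 6.0], [1.0, 7.0]])
pred_from_design = formula_results.predict(design, transform=False)

print(pred_from_formula)
print(pred_from_design)
```

Both calls should give the same predictions. The formula route is usually less error-prone because the intercept and any transformations are handled for you.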

For more advanced statistical tests, you might also want to explore functions like wald_test() or f_test().

Example: Time Series Forecasting with predict()

Let's look at another example, this time using a time series model. We'll use the SARIMAX model for this demonstration.


import statsmodels.api as sm
import pandas as pd

# Sample time series data
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Fit the SARIMAX model
model = sm.tsa.SARIMAX(data, order=(1, 1, 1))
results = model.fit()

# Make predictions
predictions = results.predict(start=5, end=10)

print(predictions)


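This prints a pandas Series of six float64 values indexed 5 through 10. The exact numbers depend on the estimated parameters and can vary slightly between Statsmodels versions.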

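In this example, we fit a SARIMAX model to a time series and call predict() on the results. The start and end parameters specify the range of predictions. Because the series has ten observations indexed 0 to 9, positions 5 through 9 are in-sample one-step-ahead predictions, and only position 10 is a true out-of-sample forecast.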

Conclusion

The predict() method in Statsmodels is a versatile tool for making predictions from statistical models. Whether you're working with linear regression, logistic regression, or time series models, predict() generates predictions from the fitted parameters. How accurate they are depends on how well the model fits your data.

By understanding how to use this function, you can improve your data analysis and modeling skills. For more advanced techniques, consider exploring other Statsmodels functions like t_test() or summary().