Last modified: Jan 23, 2025 By Alexander Williams

Python Statsmodels Summary() Explained

The summary() method in Python's Statsmodels library reports the results of a fitted statistical model in a single formatted overview. This guide explains how to call it and how to read its output.

What is Statsmodels Summary()?

The summary() method is called on a fitted results object and generates a comprehensive report of the model. It includes coefficients, standard errors, p-values, confidence intervals, and goodness-of-fit statistics. These are the numbers you need to interpret the model.

How to Use Summary()

To use summary(), you first need to fit a model. For example, let's fit a linear regression model using Statsmodels.


import statsmodels.api as sm
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])

# Add an intercept column; sm.OLS does not include one by default
X = sm.add_constant(X)

# Fit the model; fit() returns a results object
model = sm.OLS(Y, X).fit()

# Generate summary
print(model.summary())

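This code fits a simple linear regression of Y on X and prints the summary. The output includes key statistics such as R-squared, the coefficient estimates, and their p-values. With only five observations, statsmodels may also emit a warning that the Omnibus normality test is not valid for a sample this small.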

Interpreting the Summary Output

For an OLS model, the summary output is divided into three sections, each printed as its own table. Each section provides specific information about the model. Let's break it down.

Model Summary

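The first section provides an overview of the model: the dependent variable, the estimation method, the number of observations, and the degrees of freedom. It also reports R-squared, the proportion of variance in the dependent variable that the model explains, along with adjusted R-squared, the F-statistic, the log-likelihood, AIC, and BIC. A higher R-squared means a closer fit to the data used for estimation, but it never decreases when predictors are added, so use adjusted R-squared to compare models of different sizes.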

Coefficients

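The coefficients section shows the estimated coefficient for each predictor, including the constant. It also reports each coefficient's standard error, t-statistic, p-value, and 95% confidence interval (the [0.025 and 0.975] columns). A small p-value, or a confidence interval that excludes zero, indicates that the predictor is statistically significant.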

Diagnostics

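The diagnostics section reports the Omnibus and Jarque-Bera tests for normality of the residuals (along with their skew and kurtosis), the Durbin-Watson statistic for autocorrelation, and the condition number, which flags multicollinearity. The OLS summary does not include a heteroscedasticity test, so run one separately if you need it. These statistics help assess whether the model assumptions hold.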

Example Output

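Here is an example of what the summary output might look like. The values are illustrative and are not the result of running the code above (for that sample data, the fitted intercept is 2.2, the slope is 0.6, and R-squared is 0.600):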


                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.800
Model:                            OLS   Adj. R-squared:                  0.733
Method:                 Least Squares   F-statistic:                     12.00
Date:                Mon, 01 Jan 2023   Prob (F-statistic):             0.0385
Time:                        12:00:00   Log-Likelihood:                -5.0000
No. Observations:                   5   AIC:                             14.00
Df Residuals:                       3   BIC:                             13.00
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.0000      0.707      2.828      0.066      -0.366       4.366
x1             0.8000      0.231      3.464      0.038       0.064       1.536
==============================================================================
Omnibus:                        0.000   Durbin-Watson:                   2.000
Prob(Omnibus):                  1.000   Jarque-Bera (JB):                0.000
Skew:                           0.000   Prob(JB):                        1.000
Kurtosis:                       2.000   Cond. No.                         5.00
==============================================================================

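This output shows the model's R-squared, coefficients, and diagnostic statistics. In this illustrative output, the x1 coefficient of 0.8000 has a p-value of 0.038 and a 95% confidence interval of 0.064 to 1.536 that excludes zero, so it is significant at the 5% level. The constant, with a p-value of 0.066 and an interval that spans zero, is not.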

Tips for Using Summary()

Here are some tips to get the most out of the summary() function:

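Check R-squared: It is the share of variance the model explains. Prefer adjusted R-squared when comparing models with different numbers of predictors.
Look at p-values: A p-value below your chosen threshold (commonly 0.05) suggests that the coefficient differs from zero.
Review diagnostics: Check the residuals for non-normality and autocorrelation before trusting the standard errors and p-values.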

Conclusion

The summary() method in Statsmodels gathers fit statistics, coefficient estimates, and residual diagnostics in one report. By understanding its output, you can judge how well a model fits and whether its assumptions hold.

For more advanced techniques, check out our guides on Python Statsmodels ARIMA and Python Statsmodels GLM.