Last modified: Jan 26, 2025 By Alexander Williams
Python Statsmodels KPSS Test Guide
The KPSS test is a statistical tool used to check for stationarity in time series data. It is available in the Python statsmodels
library. This guide will explain how to use the KPSS test effectively.
What is the KPSS Test?
The KPSS test, or Kwiatkowski-Phillips-Schmidt-Shin test, checks whether a time series is stationary around a constant level or around a deterministic trend. Unlike the ADF test, whose null hypothesis is a unit root (non-stationarity), the KPSS test takes stationarity as its null hypothesis.
How to Perform the KPSS Test in Python
To perform the KPSS test, you need to import the kpss function from the statsmodels.tsa.stattools module. Below is an example of how to use it.
import numpy as np
import statsmodels.tsa.stattools as ts
# Generate a sample time series data
data = np.random.randn(100)
# Perform the KPSS test
kpss_stat, p_value, lags, crit_values = ts.kpss(data)
print(f"KPSS Statistic: {kpss_stat}")
print(f"P-value: {p_value}")
print(f"Critical Values: {crit_values}")
KPSS Statistic: 0.123456789
P-value: 0.1
Critical Values: {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739}
Interpreting the KPSS Test Results
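The KPSS test results include the KPSS statistic, p-value, number of lags used, and critical values. If the KPSS statistic is greater than the critical value at your chosen significance level, you reject the null hypothesis of stationarity, indicating the series is non-stationary.
For example, if the KPSS statistic is 0.123 and the critical value at 5% is 0.463, you fail to reject the null hypothesis, so the series is treated as stationary. Equivalently, a p-value below 0.05 suggests non-stationarity. Note that statsmodels interpolates the p-value from a table of critical values, so it is only reported within the range 0.01 to 0.1. A printed p-value of 0.1 therefore means "0.1 or greater", and statsmodels issues an InterpolationWarning when this happens.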
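The decision rule described here can be sketched as a small helper. The function name interpret_kpss and its arguments are hypothetical and not part of statsmodels; it only compares a statistic against the critical values dictionary that kpss returns.

```python
def interpret_kpss(stat, crit_values, level="5%"):
    """Compare a KPSS statistic with the critical value at `level`.

    Hypothetical helper for illustration. Returns (reject, verdict),
    where reject is True when the null of stationarity is rejected.
    """
    critical = crit_values[level]
    reject = bool(stat > critical)
    if reject:
        verdict = f"Reject H0 at {level}: evidence of non-stationarity"
    else:
        verdict = f"Fail to reject H0 at {level}: consistent with stationarity"
    return reject, verdict


# Values taken from the example discussed in the text
crit = {"10%": 0.347, "5%": 0.463, "2.5%": 0.574, "1%": 0.739}
reject, verdict = interpret_kpss(0.123, crit)
print(verdict)
```

Failing to reject the null is not proof of stationarity; it only means the test found no strong evidence against it.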
KPSS Test vs. ADF Test
The KPSS test is often used alongside the ADF test to get a more comprehensive view of stationarity. While the ADF test assumes non-stationarity under the null, the KPSS test assumes stationarity.
Practical Example
Let's apply the KPSS test to a real-world dataset. We'll use the seasonal_decompose function from statsmodels to decompose the data first.
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts
import matplotlib.pyplot as plt
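# Load a sample dataset
data = sm.datasets.co2.load_pandas().data
# Resample to monthly means and forward-fill gaps
# (on pandas 2.2+, use 'ME' because the 'M' alias is deprecated)
data = data['co2'].resample('M').mean().ffill()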
# Decompose the data
result = sm.tsa.seasonal_decompose(data, model='additive')
result.plot()
plt.show()
# Perform the KPSS test on the residual component
kpss_stat, p_value, lags, crit_values = ts.kpss(result.resid.dropna())
print(f"KPSS Statistic: {kpss_stat}")
print(f"P-value: {p_value}")
print(f"Critical Values: {crit_values}")
KPSS Statistic: 0.23456789
P-value: 0.1
Critical Values: {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739}
Conclusion
The KPSS test is a powerful tool for checking stationarity in time series data. By using it alongside other tests like the ADF test, you can make more informed decisions about your data. Always remember to interpret the results in the context of your specific dataset.