Last modified: Dec 22, 2025 By Alexander Williams

Time Series Analysis Python Guide & Forecasting

Time series data is everywhere. It tracks stock prices, weather, and sales. Python makes analyzing it easy. This guide will show you how.

You will learn key steps and tools. We use pandas and statsmodels. These are powerful Python libraries.

What is Time Series Data?

A time series is a data sequence. It is indexed by time. Each point links to a specific timestamp.

Examples include daily temperature or monthly revenue. The order of data points is crucial. It reveals trends and cycles.
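
As a minimal sketch of why order matters, the snippet below builds a tiny series from invented temperature readings (not real measurements), deliberately listed out of order, and sorts it by timestamp before computing day-over-day changes. The name temps and the values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical daily temperatures, deliberately listed out of order.
temps = pd.Series(
    [21.5, 19.0, 20.2, 22.1],
    index=pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-04"]),
    name="temperature_c",
)

# Sort by timestamp so each point follows the one before it in time.
temps = temps.sort_index()

# Day-over-day change is only meaningful once the order is chronological.
daily_change = temps.diff()
print(temps)
print(daily_change)
```

Without the sort, diff() would compare readings that are not neighbours in time.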

Analysis helps us understand past patterns. It also allows us to predict future values. This is called forecasting.

Setting Up Your Python Environment

First, install necessary libraries. Use pip, the Python package installer. Open your terminal or command prompt.


pip install pandas numpy matplotlib statsmodels

These packages form the core toolkit. Pandas handles data manipulation. Matplotlib creates plots. Statsmodels provides statistical models.

Import them in your Python script or notebook. This is a standard practice.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

Loading and Inspecting Time Series Data

Data often comes from CSV files or databases. The pandas read_csv function handles CSV files well. It loads data into a DataFrame.

Let's load a sample dataset. We will use airline passenger numbers.


# Load the time series data
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
df = pd.read_csv(url, parse_dates=['Month'], index_col='Month')
print(df.head())

            Passengers
Month                 
1949-01-01         112
1949-02-01         118
1949-03-01         132
1949-04-01         129
1949-05-01         121

The parse_dates argument converts the 'Month' column to datetime values. Setting that column as the index gives the DataFrame a DatetimeIndex, which date-based selection and resampling rely on.
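
To see the effect of these two arguments without downloading anything, here is a small sketch that reads a few rows inlined as text. The rows copy the layout and first values of the airline file; the variable names csv_text and sample are assumptions.

```python
import io
import pandas as pd

# A few rows in the same layout as the airline CSV, inlined so no download is needed.
csv_text = """Month,Passengers
1949-01,112
1949-02,118
1949-03,132
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"], index_col="Month")

# The index is now a DatetimeIndex, which enables date-based selection.
print(type(sample.index).__name__)
print(sample.loc["1949-02"])
```

Selecting with a partial date string such as "1949-02" works only because the index holds datetime values rather than plain text.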

Always inspect your data first. Use df.info() and df.describe(). For more on this step, see our Exploratory Data Analysis Python Guide & Techniques.

Data Cleaning and Preparation

Real-world data is messy. It may have missing values or incorrect formats. Cleaning is essential for accurate analysis.

Check for missing dates or values. Pandas provides tools for this.


# Check for missing values
print(df.isnull().sum())

# Handle missing data (forward fill example)
# df.ffill() replaces fillna(method='ffill'), which is deprecated in pandas 2.x
df_filled = df.ffill()
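
The check above counts missing values, but a timestamp that is absent altogether leaves no NaN behind. One possible way to find such gaps, sketched on a hypothetical four-row series with March 1949 removed, is to compare the index against the full expected calendar.

```python
import pandas as pd

# Hypothetical monthly series with March 1949 missing entirely.
idx = pd.to_datetime(["1949-01-01", "1949-02-01", "1949-04-01", "1949-05-01"])
s = pd.Series([112, 118, 129, 121], index=idx, name="Passengers")

# Build the full expected calendar and find timestamps that are absent.
expected = pd.date_range(s.index.min(), s.index.max(), freq="MS")
missing_dates = expected.difference(s.index)
print(missing_dates)

# Reindex to the full calendar; the gap shows up as NaN and can then be filled.
s_full = s.reindex(expected)
s_filled = s_full.ffill()
print(s_filled)
```

Forward fill is only one option here; whether it suits a given dataset depends on what the gap represents.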

You might need to resample data to a different frequency, for example daily data to monthly averages or monthly data to quarterly averages. Use the resample method.


# Resample to quarterly mean, if needed
# ('QE' is the quarter-end alias in pandas 2.2+; older versions use 'Q')
df_quarterly = df.resample('QE').mean()
print(df_quarterly.head())
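
The same method covers the daily-to-monthly case. The sketch below uses synthetic daily values (simple counters, chosen so the averages are easy to check by hand); the names daily and monthly_mean are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic daily values for January and February 2024 (values are arbitrary).
days = pd.date_range("2024-01-01", "2024-02-29", freq="D")
daily = pd.Series(np.arange(len(days), dtype=float), index=days, name="value")

# "MS" labels each bin with the month start; mean() averages the days in it.
monthly_mean = daily.resample("MS").mean()
print(monthly_mean)
```

January holds the values 0 through 30 and February 31 through 59, so the two monthly means are 15.0 and 45.0.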

For more on data manipulation, see our Master Data Analysis with Pandas Python Guide.

Visualizing Time Series Data

Visualization reveals patterns. Plot your data using matplotlib. A simple line chart is a great start.


plt.figure(figsize=(12,6))
plt.plot(df.index, df['Passengers'])
plt.title('Monthly Airline Passengers (1949-1960)')
plt.xlabel('Year')
plt.ylabel('Passengers (Thousands)')
plt.grid(True)
plt.show()

Look for trends. Is the data increasing over time? Also look for seasonality. Are there repeating patterns each year?

Visualization helps confirm these components. It guides your next analytical steps.
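
One simple way to make a trend easier to see is to overlay a rolling mean on the raw line. The sketch below is an illustration on synthetic data (a linear trend plus a sine-shaped yearly pattern, both invented), not a step taken from the airline example.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a repeating 12-month pattern.
months = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
seasonal = 10 * np.sin(2 * np.pi * t / 12)
series = pd.Series(100 + 2 * t + seasonal, index=months)

# A 12-month rolling mean averages out one full seasonal cycle,
# leaving a smoother line that tracks the trend.
rolling = series.rolling(window=12).mean()
print(rolling.dropna().head())
```

Passing rolling to plt.plot alongside the original series would show the smoothed line running through the seasonal swings.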

Decomposing Time Series

A time series has components. These are trend, seasonality, and residual. Decomposition separates them.

Use seasonal_decompose from statsmodels. The multiplicative model suits this dataset because the seasonal swings grow as the overall level rises.


# Decompose the time series
decomposition = seasonal_decompose(df['Passengers'], model='multiplicative')

# Plot the components
fig = decomposition.plot()
fig.set_size_inches(12, 8)
plt.show()

The trend shows the long-term direction. Seasonality shows regular fluctuations. The residual is the random noise left over.

Understanding these parts is crucial. It informs which forecasting model to use.

Building a Forecasting Model

Forecasting predicts future values. A popular model is ARIMA. It stands for AutoRegressive Integrated Moving Average.

First, check if the data is stationary. A stationary series has statistical properties, such as mean and variance, that do not change over time. Use a statistical test.


from statsmodels.tsa.stattools import adfuller

# Perform Augmented Dickey-Fuller test
result = adfuller(df['Passengers'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])

ADF Statistic: 0.8153688792060498
p-value: 0.991880243437641

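The null hypothesis of the ADF test is that the series has a unit root, meaning it is non-stationary. A p-value this high means we cannot reject that hypothesis, so we treat the data as non-stationary. Differencing the data, which replaces each value with its change from the previous one, often makes it stationary.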


# Differencing the data
df['Passengers_diff'] = df['Passengers'].diff()
df['Passengers_diff'].dropna().plot()
plt.title('Differenced Passenger Data')
plt.show()

Now, fit an ARIMA model. We use the ARIMA class from statsmodels.


from statsmodels.tsa.arima.model import ARIMA

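# Fit ARIMA model (order = (p,d,q)); d=1 applies the first difference internally,
# so the original series is passed in rather than Passengers_diff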
model = ARIMA(df['Passengers'], order=(5,1,0))
model_fit = model.fit()

# Print model summary
print(model_fit.summary())

The summary shows the estimated coefficients with their standard errors and p-values, along with fit statistics such as AIC and BIC. Use it to evaluate model quality.

Making Predictions and Evaluating

Use the fitted model to forecast. The forecast method generates values for the periods after the end of the data.


# Forecast the next 24 months
forecast_steps = 24
forecast = model_fit.forecast(steps=forecast_steps)

# Create index for future dates
last_date = df.index[-1]
future_dates = pd.date_range(start=last_date, periods=forecast_steps+1, freq='MS')[1:]

# Plot original data and forecast
plt.figure(figsize=(12,6))
plt.plot(df.index, df['Passengers'], label='Historical')
plt.plot(future_dates, forecast, label='Forecast', color='red')
plt.title('Airline Passengers Forecast')
plt.xlabel('Year')
plt.ylabel('Passengers (Thousands)')
plt.legend()
plt.grid(True)
plt.show()
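
The future index can also be built by starting one month after the last observation, which avoids creating an extra date and slicing it off. This is an alternative sketch, not the approach used above; it hard-codes the last date of the airline dataset, December 1960.

```python
import pandas as pd

# The airline dataset ends in December 1960.
last_date = pd.Timestamp("1960-12-01")
forecast_steps = 24

# Start one month after the last observation, so no slicing is needed.
future_dates = pd.date_range(
    start=last_date + pd.offsets.MonthBegin(1),
    periods=forecast_steps,
    freq="MS",
)
print(future_dates[0], future_dates[-1])
```

The resulting index runs from January 1961 to December 1962, matching the 24 forecast steps.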

Evaluate forecast accuracy. Use metrics like Mean Absolute Error (MAE). Compare predictions to actual data if available.

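Always validate your model. Split the data chronologically into training and testing sets, fit on the earlier part, and score the forecast on the later part. This reveals overfitting that in-sample fit statistics can hide.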
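
A hold-out evaluation might look like the sketch below. To keep it dependency-light it scores a seasonal-naive baseline (repeating the last 12 training values) instead of ARIMA, and it runs on a synthetic noise-free series; both are assumptions made for illustration. An ARIMA forecast fitted on the training part only would be scored the same way.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a fixed 12-month pattern, no noise.
months = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12), index=months)

# Chronological split: the last 12 months are held out, never shuffled.
train, test = series.iloc[:-12], series.iloc[-12:]

# Seasonal-naive baseline: repeat the last 12 training values.
baseline = pd.Series(train.iloc[-12:].values, index=test.index)

# Mean Absolute Error: average size of the forecast errors.
mae = (test - baseline).abs().mean()
print("MAE:", mae)
```

Because the series rises by 2 per month, repeating last year's values misses each test point by 24, so the MAE is 24. A baseline like this gives a reference figure that a fitted model should beat.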

Handling External Data Sources

Data often comes from Excel files. pandas reads them through an engine library: openpyxl for .xlsx files and xlrd for legacy .xls files. Our guide on Integrate Python xlrd with pandas for Data Analysis explains this.


# Example: Reading from an Excel file
# df_excel = pd.read_excel('data.xlsx', parse_dates=['Date'], index_col='Date')

The process is similar to CSV. Ensure dates are parsed correctly. Set the datetime column as the index.
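
When dates arrive as text in a regional format, automatic parsing can swap day and month. The sketch below shows one way to guard against that with an explicit format string; the table, its column names, and the day-first format are hypothetical.

```python
import pandas as pd

# Hypothetical table as it might arrive from a spreadsheet, with day-first text dates.
raw = pd.DataFrame(
    {"Date": ["01/02/2024", "02/02/2024", "03/02/2024"], "Sales": [10, 12, 9]}
)

# An explicit format avoids day/month ambiguity; by default, any value that
# does not match the format raises an error instead of being guessed.
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")
sales = raw.set_index("Date").sort_index()
print(sales.index)
```

Here "01/02/2024" is read as 1 February 2024, not 2 January.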

Conclusion

Time series analysis is a powerful skill. Python provides excellent tools. Start with data loading and cleaning.

Visualize to see patterns. Decompose to understand components. Build models like ARIMA for forecasting.

Remember to validate your predictions. Practice with different datasets. This builds your intuition and skill.

The journey from raw data to forecast is rewarding. It unlocks insights from temporal patterns. Start your analysis today.