Last modified: Jun 11, 2026

How to Install DeepSpeed in Python

DeepSpeed is a deep learning optimization library from Microsoft. It makes training large models faster and more memory-efficient. This guide shows you how to install DeepSpeed in Python step by step.

We cover system requirements, installation methods, and common fixes. Follow along to get DeepSpeed running on your machine.

What is DeepSpeed?

DeepSpeed helps you train massive models with limited GPU memory. It uses techniques like ZeRO optimization, pipeline parallelism, and mixed precision. It is built on top of PyTorch, so you install it into an existing PyTorch environment.

Before installing, check your system. DeepSpeed works best on Linux with NVIDIA GPUs and CUDA. Windows support is limited but possible with WSL2.

System Requirements

You need these tools before installing DeepSpeed:

  • Python 3.6 or newer
  • PyTorch 1.6 or later
  • CUDA toolkit (version 10.2 or higher)
  • NVIDIA GPU with compute capability 6.0+
  • C++ compiler (like GCC on Linux)

If you lack a GPU, DeepSpeed can still run in CPU mode, but training will be slow.
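The requirements above can be checked with a short script before you install anything. This is a minimal sketch using only the standard library. The helper name check_prerequisites is hypothetical, and finding a tool on PATH only shows that it is installed, not that its version is new enough.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6)):
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        "python": sys.version_info[:2] >= min_python,
        # Any one of these common compiler commands is enough.
        "c++ compiler": any(shutil.which(c) for c in ("g++", "clang++", "c++")),
        # nvcc ships with the CUDA toolkit.
        "nvcc (CUDA toolkit)": shutil.which("nvcc") is not None,
        # nvidia-smi ships with the NVIDIA driver.
        "nvidia-smi (GPU driver)": shutil.which("nvidia-smi") is not None,
    }


for name, found in check_prerequisites().items():
    print(f"{'OK     ' if found else 'MISSING'}  {name}")
```

A MISSING line for nvcc or nvidia-smi on a machine without a GPU is expected; it means you will be limited to CPU mode.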

Install PyTorch First

DeepSpeed depends on PyTorch. Install PyTorch with CUDA support first. Use the official command from pytorch.org.

For CUDA 11.8, run:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Verify the installation with a quick test:

import torch
print(torch.__version__)          # e.g. 2.1.0+cu118
print(torch.cuda.is_available())  # Should be True

Example output:

2.1.0+cu118
True

If torch.cuda.is_available() returns False, check your CUDA drivers.
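When the check fails, a few extra facts usually narrow down the cause. The sketch below collects them in one place. The helper name cuda_diagnostics is hypothetical; it relies only on torch.version.cuda and the torch.cuda query functions, and it returns early if PyTorch is not installed at all.

```python
def cuda_diagnostics():
    """Collect facts that help explain why CUDA is or is not available."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only PyTorch build was installed.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # (major, minor) tuple, e.g. (8, 0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


for key, value in cuda_diagnostics().items():
    print(f"{key}: {value}")
```

If built_with_cuda is None, reinstall PyTorch from a CUDA index URL. If it shows a CUDA version but cuda_available is False, the driver is the more likely culprit.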

Install DeepSpeed via pip

The easiest method is using pip. Open your terminal and run:

pip install deepspeed

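This installs DeepSpeed and its Python dependencies. By default, the C++/CUDA ops are not compiled during installation. They are built just-in-time (JIT) the first time they are used, so the install itself usually finishes quickly and the compilation delay appears on the first run instead.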

To install a specific version, use:

pip install deepspeed==0.12.0

Check the official PyPI page for latest versions.
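To see which version pip actually installed, you can query the package metadata without importing DeepSpeed. This is a small sketch using importlib.metadata from the standard library, which assumes Python 3.8 or newer. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("deepspeed", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

This is handy in setup scripts because it does not trigger DeepSpeed's import-time logging or any GPU initialization.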

Install DeepSpeed from Source

For the latest features, install from GitHub. This gives you access to unreleased updates.

First, clone the repository:

git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed

Then install with pip in editable mode:

pip install -e .

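An editable install points Python at your checkout, so updates you pull from GitHub take effect without reinstalling. As with the PyPI package, the C++/CUDA ops are JIT-compiled on first use by default.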

Verify the Installation

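Run a simple test to confirm DeepSpeed works. Create a Python file test_ds.py:

import torch
import deepspeed

# Importing the package and reading its version confirms the install
print(deepspeed.__version__)

# Most DeepSpeed features need a visible CUDA GPU
print("CUDA available:", torch.cuda.is_available())
print("DeepSpeed imported successfully")

Run it:

python test_ds.py

Example output (your version number will differ, and DeepSpeed may print additional log lines):

0.12.0
CUDA available: True
DeepSpeed imported successfully

For a fuller check, run the ds_report command that ships with DeepSpeed. It reports which ops are compatible with your system and which PyTorch and CUDA versions it detected.

If you see errors, move to the troubleshooting section.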

Common Installation Issues

Here are frequent problems and their fixes.

CUDA Not Found

DeepSpeed needs CUDA headers. If you get "CUDA_HOME not set", export the path:

export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

Add these lines to your .bashrc for permanence.
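If you are unsure which path to export, a short script can look in the usual places. The sketch below is a hypothetical helper, not DeepSpeed's or PyTorch's own lookup logic. It checks the CUDA_HOME and CUDA_PATH variables, then infers the directory from the location of nvcc, then falls back to /usr/local/cuda if that directory exists.

```python
import os
import shutil


def find_cuda_home(environ=None, default="/usr/local/cuda"):
    """Return a likely CUDA install directory, or None if none is found."""
    environ = os.environ if environ is None else environ

    # 1. An explicitly set environment variable wins.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if environ.get(var):
            return environ[var]

    # 2. Otherwise infer it from nvcc, which lives at <cuda_home>/bin/nvcc.
    nvcc = shutil.which("nvcc", path=environ.get("PATH"))
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))

    # 3. Fall back to the conventional location if it exists.
    return default if os.path.isdir(default) else None


print("CUDA home:", find_cuda_home() or "not found")
```

If this prints "not found", the CUDA toolkit is probably not installed, and exporting CUDA_HOME alone will not help.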

Compiler Errors

Missing C++ compiler causes build failures. On Ubuntu, install build-essential:

sudo apt update
sudo apt install build-essential

On CentOS, use yum groupinstall "Development Tools".

PyTorch Version Mismatch

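DeepSpeed requires PyTorch 1.6 or higher. Upgrade PyTorch if needed, and pass the same index URL you used for the original install so that pip does not switch you to a different CUDA build:

pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118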
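A version string such as 2.1.0+cu118 carries both the release number and the CUDA build tag, so it can be checked programmatically. The sketch below is illustrative only. The helper names parse_torch_version and meets_minimum are hypothetical, and the default minimum of (1, 6) is taken from the requirement stated in this guide.

```python
import re


def parse_torch_version(version):
    """Split '2.1.0+cu118' into ((2, 1, 0), 'cu118')."""
    release, _, local = version.partition("+")
    # Keep the first three numeric components of the release part.
    numbers = tuple(int(n) for n in re.findall(r"\d+", release)[:3])
    return numbers, (local or None)


def meets_minimum(version, minimum=(1, 6)):
    """Return True if the release part of the version is at least `minimum`."""
    numbers, _ = parse_torch_version(version)
    return numbers >= minimum


print(parse_torch_version("2.1.0+cu118"))  # ((2, 1, 0), 'cu118')
print(meets_minimum("1.5.1"))              # False
print(meets_minimum("2.1.0+cu118"))        # True
```

In practice you would pass torch.__version__ to these functions. A build tag of cpu, or no tag where you expected one, is a hint that a CPU-only build may have been installed.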

Using DeepSpeed with Docker

Docker offers a clean environment. Use the official DeepSpeed image:

docker pull deepspeed/deepspeed:latest
docker run --gpus all -it deepspeed/deepspeed:latest

This image has all dependencies pre-installed. It is ideal for quick testing.

Install on Windows (WSL2)

Windows users should use WSL2. Install Ubuntu from Microsoft Store, then follow Linux instructions.

Ensure WSL2 is set as default:

wsl --set-default-version 2

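Install the NVIDIA GPU driver on the Windows host, not inside WSL2; the host driver is exposed to the Linux environment automatically. Inside WSL2, install the CUDA toolkit build for WSL-Ubuntu from NVIDIA's website. Then run pip install as shown above.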

Optimize DeepSpeed Installation

To speed up compilation of DeepSpeed's ops, install the Ninja build system first:

pip install ninja
pip install deepspeed

Ninja builds in parallel, which shortens compilation whenever ops are built, whether at install time or just-in-time on first use.

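You can also control whether ops are compiled at install time. Use environment variables:

DS_BUILD_OPS=0 pip install deepspeed

This skips pre-building the custom CUDA kernels during installation, which is also the default behavior. The library still works: each op is compiled just-in-time the first time it is needed. Setting DS_BUILD_OPS=1 instead pre-builds all compatible ops, which makes installation slower but avoids the compile delay at run time.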

Example: Training with DeepSpeed

Here is a minimal training script using DeepSpeed. Save it as train.py:

import torch
import deepspeed

# Create a simple model
model = torch.nn.Linear(10, 10)

# DeepSpeed configuration. The optimizer is defined here, so the script
# does not construct one by hand.
ds_config = {
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 0.001}
    },
    "fp16": {"enabled": True}
}

# Initialize DeepSpeed engine; it builds the optimizer from the config
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config
)

# Dummy training loop
for step in range(10):
    # Inputs must be on the engine's device and match the fp16 weights
    inputs = torch.randn(8, 10, device=engine.device, dtype=torch.half)
    outputs = engine(inputs)
    loss = outputs.sum()
    engine.backward(loss)   # replaces loss.backward()
    engine.step()           # replaces optimizer.step() and zero_grad()
    print(f"Step {step}: loss = {loss.item():.4f}")

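Run it with:

deepspeed train.py

Illustrative output (DeepSpeed also prints its own log lines, and the exact losses vary because the inputs are random):

Step 0: loss = 4.1234
Step 1: loss = 3.9876
...
Step 9: loss = 2.3456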

This example shows the basic workflow. DeepSpeed handles mixed precision and gradient scaling automatically.
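One detail of the configuration is worth spelling out. The train_batch_size value is global: it equals the per-GPU micro-batch size times gradient_accumulation_steps times the number of GPUs. The sketch below works through that arithmetic. The helper name micro_batch_per_gpu is hypothetical and is not part of the DeepSpeed API.

```python
def micro_batch_per_gpu(train_batch_size, gradient_accumulation_steps, world_size):
    """Derive the per-GPU micro-batch size from the global batch size.

    Uses the relationship:
        train_batch_size = micro_batch * gradient_accumulation_steps * world_size
    """
    denominator = gradient_accumulation_steps * world_size
    if train_batch_size % denominator != 0:
        raise ValueError(
            f"train_batch_size={train_batch_size} is not divisible by "
            f"gradient_accumulation_steps * world_size = {denominator}"
        )
    return train_batch_size // denominator


# The config above (batch size 8, no accumulation) on different GPU counts:
print(micro_batch_per_gpu(8, 1, 1))  # 8
print(micro_batch_per_gpu(8, 1, 4))  # 2
```

With a global batch size of 8, a machine with 3 GPUs would fail this check, so pick a batch size that divides evenly across the GPUs you launch on.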

Conclusion

Installing DeepSpeed in Python is straightforward. Use pip in most cases, or build from source for the latest features. Always verify with a small test script.

Remember to match PyTorch and CUDA versions. Use WSL2 on Windows systems, or Docker when you want a clean environment. With DeepSpeed, you can train models that previously did not fit on your hardware.

Start small and scale up. DeepSpeed's documentation offers more advanced configurations for production workloads.