Last modified: Jun 11, 2026
Install DeepSpeed in Python Guide
DeepSpeed is a deep learning optimization library from Microsoft. It makes training large models faster and more memory-efficient. This guide shows you how to install DeepSpeed in Python step by step.
We cover system requirements, installation methods, and common fixes. Follow along to get DeepSpeed running on your machine.
What is DeepSpeed?
DeepSpeed helps you train very large models with limited GPU memory. It uses techniques such as ZeRO optimization, pipeline parallelism, and mixed precision. It is built on top of PyTorch, so you install the two side by side.
Before installing, check your system. DeepSpeed works best on Linux with NVIDIA GPUs and CUDA. Windows support is limited but possible with WSL2.
System Requirements
You need these tools before installing DeepSpeed:
- Python 3.6 or newer
- PyTorch 1.6 or later
- CUDA toolkit (version 10.2 or higher)
- NVIDIA GPU with compute capability 6.0+
- C++ compiler (like GCC on Linux)
If you lack a GPU, DeepSpeed can still run in CPU mode. But training will be slow.
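The checklist above can be turned into a quick preflight script. The following is a minimal sketch using only the standard library; the function name preflight_report and the list of compiler names it searches for are illustrative assumptions, not part of DeepSpeed. It does not check the GPU's compute capability.

```python
import shutil
import sys

# Minimum Python version taken from the requirements list above.
MIN_PYTHON = (3, 6)


def preflight_report():
    """Return a dict describing which prerequisites were found on this machine."""
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # First C++ compiler found on PATH, or None if none of these exist.
        "cxx_compiler": next(
            (c for c in ("g++", "c++", "clang++") if shutil.which(c)), None
        ),
        # nvcc ships with the CUDA toolkit; None means it is not on PATH.
        "nvcc": shutil.which("nvcc"),
    }


if __name__ == "__main__":
    for name, value in preflight_report().items():
        print(f"{name}: {value}")
```

A value of None for cxx_compiler or nvcc only means the tool is not on your PATH; it may still be installed elsewhere.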
Install PyTorch First
DeepSpeed depends on PyTorch. Install PyTorch with CUDA support first. Use the official command from pytorch.org.
For CUDA 11.8, run:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Verify the installation with a quick test:
import torch
print(torch.__version__) # Should be 2.0+
print(torch.cuda.is_available()) # Should be True
2.1.0+cu118
True
If torch.cuda.is_available() returns False, check your CUDA drivers.
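When the check fails, it helps to know whether the problem is the PyTorch build or the driver. The sketch below gathers the relevant facts in one place; the helper name describe_torch_cuda is an assumption made for this example. A built_for_cuda value of None indicates a CPU-only PyTorch build, in which case reinstalling PyTorch with the CUDA index URL is the likely fix rather than a driver change.

```python
def describe_torch_cuda():
    """Collect the facts that usually explain a False from is_available()."""
    try:
        import torch
    except ImportError:
        # PyTorch itself is missing, so nothing else can be checked.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only PyTorch build was installed.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "visible_gpus": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```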
Install DeepSpeed via pip
The easiest method is using pip. Open your terminal and run:
pip install deepspeed
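This installs DeepSpeed and its Python dependencies. The process may take a few minutes. By default, the custom C++/CUDA ops are not compiled at install time; they are built just in time (JIT) the first time they are used, so the first run of a script may pause while they compile.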
To install a specific version, use:
pip install deepspeed==0.12.0
Check the official PyPI page for the latest versions.
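After pinning a version, you may want to confirm which one actually ended up in your environment without importing the package. This is a small sketch using importlib.metadata from the standard library, which assumes Python 3.8 or newer; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("deepspeed:", installed_version("deepspeed"))
    print("torch:", installed_version("torch"))
```

A result of None means pip did not install the package into the interpreter you are running, which often points to a mismatch between virtual environments.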
Install DeepSpeed from Source
For the latest features, install from GitHub. This gives you access to unreleased updates.
First, clone the repository:
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
Then install with pip in editable mode:
pip install -e .
This installs DeepSpeed from your local checkout, so changes to the source take effect without reinstalling. It may take longer than installing the released package from PyPI.
Verify the Installation
Run a simple test to confirm DeepSpeed works. Create a Python file test_ds.py:
import torch
import deepspeed
# Confirm the package imports and report its version
print(deepspeed.__version__)
# Confirm PyTorch can see a GPU (False means no usable GPU was found)
print(torch.cuda.is_available())
print("DeepSpeed imported successfully")
Run it:
python test_ds.py
0.12.0
True
DeepSpeed imported successfully
For a fuller report on your environment and on which ops are compatible with your system, run the ds_report command that ships with DeepSpeed.
If you see errors, move to the troubleshooting section.
Common Installation Issues
Here are frequent problems and their fixes.
CUDA Not Found
DeepSpeed needs CUDA headers. If you get "CUDA_HOME not set", export the path:
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
Add these lines to your ~/.bashrc to make them permanent.
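If you are unsure which path to export, you can search for the toolkit first. The sketch below tries an explicit environment variable, then the location of nvcc, then the conventional /usr/local/cuda directory. The function name find_cuda_home and this particular lookup order are assumptions for illustration and may not match what DeepSpeed or PyTorch do internally.

```python
import os
import shutil


def find_cuda_home(environ=None, default="/usr/local/cuda"):
    """Guess the CUDA toolkit root, or return None if nothing plausible exists."""
    environ = os.environ if environ is None else environ

    # 1. Respect an explicit setting if one is already present.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if environ.get(var):
            return environ[var]

    # 2. Derive the root from nvcc, which normally lives at <root>/bin/nvcc.
    nvcc = shutil.which("nvcc", path=environ.get("PATH"))
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))

    # 3. Fall back to the conventional install location on Linux.
    return default if os.path.isdir(default) else None


if __name__ == "__main__":
    print("CUDA_HOME candidate:", find_cuda_home())
```

If the function returns None, the CUDA toolkit is probably not installed, and exporting CUDA_HOME alone will not fix the build.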
Compiler Errors
A missing C++ compiler causes build failures. On Ubuntu, install build-essential:
sudo apt update
sudo apt install build-essential
On CentOS, run sudo yum groupinstall "Development Tools".
PyTorch Version Mismatch
DeepSpeed requires PyTorch 1.6 or higher. Upgrade PyTorch if needed, adding the same --index-url you used earlier if you need a specific CUDA build:
pip install --upgrade torch torchvision torchaudio
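Version strings such as 2.1.0+cu118 cannot be compared as plain text, because "1.13" would sort before "1.6". The following sketch parses the numeric part and compares it against the minimum stated above. The helper names parse_version and meets_minimum are hypothetical, and the parsing is deliberately simple; it is not a full implementation of Python's version rules.

```python
import re


def parse_version(version):
    """Turn '2.1.0+cu118' into (2, 1, 0); the local tag after '+' is ignored."""
    public = version.split("+", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", public)[:3])


def meets_minimum(version, minimum=(1, 6)):
    """Return True if the version is at least the given (major, minor) minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    for candidate in ("2.1.0+cu118", "1.13.1", "1.5.1"):
        print(candidate, meets_minimum(candidate))
```

In a real check you would pass torch.__version__ as the argument.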
Using DeepSpeed with Docker
Docker offers a clean environment. Use the official DeepSpeed image:
docker pull deepspeed/deepspeed:latest
docker run --gpus all -it deepspeed/deepspeed:latest
This image has all dependencies pre-installed. It is ideal for quick testing.
Install on Windows (WSL2)
Windows users should use WSL2. Install Ubuntu from the Microsoft Store, then follow the Linux instructions.
Ensure WSL2 is set as default:
wsl --set-default-version 2
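Install the NVIDIA GPU driver on the Windows host, not inside WSL2; WSL2 exposes the GPU to Linux through that driver. Inside WSL2, install the WSL-Ubuntu build of the CUDA toolkit from NVIDIA's website. Then run pip install as shown above.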
Optimize DeepSpeed Installation
To speed up compilation of DeepSpeed's C++/CUDA ops, pre-install the Ninja build system:
pip install ninja
pip install deepspeed
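Ninja compiles the ops in parallel, which can save several minutes when many ops are built.
You can also control whether ops are compiled at install time. Use environment variables:
DS_BUILD_OPS=0 pip install deepspeed
This skips pre-building the custom C++/CUDA ops during installation, which is also the default behavior. The library still works: each op is JIT-compiled the first time it is used. Set DS_BUILD_OPS=1 instead to pre-build all compatible ops up front, at the cost of a much longer install.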
Example: Training with DeepSpeed
Here is a minimal training script using DeepSpeed. Save it as train.py:
import torch
import deepspeed
# Create a simple model
model = torch.nn.Linear(10, 10)
# DeepSpeed configuration
# The optimizer is defined here, so DeepSpeed builds it from model_parameters
ds_config = {
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 0.001}
    },
    "fp16": {"enabled": True}
}
# Initialize DeepSpeed engine
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config
)
# Dummy training loop
for step in range(10):
    # Inputs must be on the engine's device and match the fp16 weights
    inputs = torch.randn(8, 10, device=engine.device, dtype=torch.half)
    outputs = engine(inputs)
    loss = outputs.sum()
    engine.backward(loss)
    engine.step()
    print(f"Step {step}: loss = {loss.item():.4f}")
Run it with:
deepspeed train.py
The output looks similar to this (exact loss values vary from run to run):
Step 0: loss = 4.1234
Step 1: loss = 3.9876
...
Step 9: loss = 2.3456
This example shows the basic workflow. DeepSpeed handles mixed precision and loss scaling automatically.
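One detail of the configuration deserves a worked example. DeepSpeed expects train_batch_size to equal the micro batch size per GPU times gradient_accumulation_steps times the number of GPUs. The sketch below shows that arithmetic; the helper names effective_batch_size and micro_batch_for are made up for this illustration and are not DeepSpeed functions.

```python
def effective_batch_size(micro_batch_per_gpu, gradient_accumulation_steps, world_size):
    """Global batch size implied by the per-GPU settings."""
    return micro_batch_per_gpu * gradient_accumulation_steps * world_size


def micro_batch_for(train_batch_size, gradient_accumulation_steps, world_size):
    """Per-GPU micro batch size for a given global batch size.

    Raises ValueError when the global batch cannot be split evenly.
    """
    per_step = gradient_accumulation_steps * world_size
    if train_batch_size % per_step != 0:
        raise ValueError(
            f"train_batch_size={train_batch_size} is not divisible by "
            f"gradient_accumulation_steps * world_size = {per_step}"
        )
    return train_batch_size // per_step


if __name__ == "__main__":
    # The config above: train_batch_size=8, gradient_accumulation_steps=1.
    print(micro_batch_for(8, 1, 1))  # one GPU
    print(micro_batch_for(8, 1, 4))  # four GPUs
```

With the configuration above, one GPU processes 8 samples per step and four GPUs process 2 samples each. A GPU count that does not divide 8 evenly, such as 3, would need a different train_batch_size.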
Conclusion
Installing DeepSpeed in Python is straightforward. For most users, pip is enough; build from source if you need the latest features. Always verify with a small test script.
Remember to match PyTorch and CUDA versions. Use Docker or WSL2 on Windows systems. With DeepSpeed, you can train models that previously would not fit on your hardware.
Start small and scale up. DeepSpeed's documentation offers more advanced configurations for production workloads.