Last modified: Jun 06, 2026

Install Ollama Python Guide

Ollama lets you run large language models locally. It is free and private. You can use Ollama with Python to build smart apps. This guide shows you how to install Ollama Python step by step.

You do not need a powerful computer. Ollama works on most systems. We will cover installation, setup, and a simple test. Let's start.

What is Ollama?

Ollama is a tool to run AI models on your machine. It supports models like Llama 2, Mistral, and Gemma. You can chat with them or use them in code.

Python is the main language for AI work. Ollama Python client makes it easy to call models from your scripts. You install it once and use it anywhere.

Prerequisites

Before you install Ollama Python, check these things:

Python 3.8 or newer installed on your system.
pip package manager (usually comes with Python).
At least 8GB of RAM for small models.
Internet connection to download models.

You can check your Python version with this command:

python --version

If you see version 3.8 or higher, you are ready. If not, download Python from the official site.

Step 1: Install Ollama Server

The Python client needs the Ollama server. First, install Ollama on your system.

Go to ollama.com and download the installer for your OS. Windows, macOS, and Linux are supported.

After installation, open a terminal and run:

ollama serve

This starts the server on localhost:11434. Keep this terminal open.

Step 2: Install Ollama Python Client

Now install the Python package. Use pip in a new terminal:

pip install ollama

This downloads the client library. It is lightweight and fast.

Verify the installation:

pip show ollama

You should see version details. If not, check your internet and try again.

Step 3: Pull a Model

Ollama needs a model to run. Download one with the pull command. For example, get the small Llama 3.2 model:

ollama pull llama3.2:1b

This model is only 1.3GB. It works on most computers. You can also use mistral or gemma:2b for smaller size.

Wait for the download to finish. The terminal shows progress.

Step 4: Test with Python Code

Open a Python file or interactive shell. Write this simple test:

# test_ollama.py
import ollama

# Send a prompt to the model
response = ollama.chat(model='llama3.2:1b', messages=[
    {'role': 'user', 'content': 'What is Python?'}
])

# Print the reply
print(response['message']['content'])

Run the script:

python test_ollama.py

You should see a short explanation about Python. If it works, Ollama Python is ready.

Step 5: Use the Generate Function

For simple text generation, use the generate method. It is faster than chat for one-shot tasks.

# generate_example.py
import ollama

# Generate a completion
result = ollama.generate(model='llama3.2:1b', prompt='Explain gravity in one sentence.')

# Print the result
print(result['response'])

Output example:

Gravity is a force that attracts objects with mass toward each other.

The generate function returns only the response text. Use it for simple queries.

Step 6: Stream Responses

Large models take time to respond. Use streaming to see output as it comes. This makes apps feel faster.

# stream_example.py
import ollama

# Stream the response
stream = ollama.chat(model='llama3.2:1b', messages=[
    {'role': 'user', 'content': 'Write a short poem.'}
], stream=True)

# Print each chunk
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

Run it and watch the poem appear word by word. This is great for chatbots.

Step 7: Handle Errors

Sometimes things go wrong. Common errors and fixes:

Connection refused: Make sure Ollama server is running.
Model not found: Pull the model first with ollama pull.
Out of memory: Use a smaller model like llama3.2:1b.

Always check the server terminal for logs. Most issues are easy to fix.

Step 8: Build a Simple Chat App

Now combine everything into a mini chatbot. This script keeps a conversation history.

# chatbot.py
import ollama

# Start with system message
messages = [{'role': 'system', 'content': 'You are a helpful assistant.'}]

print("Chat with AI (type 'quit' to stop)")
while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break
    messages.append({'role': 'user', 'content': user_input})
    response = ollama.chat(model='llama3.2:1b', messages=messages)
    reply = response['message']['content']
    print(f"AI: {reply}")
    messages.append({'role': 'assistant', 'content': reply})

Run it and have a real conversation. The bot remembers context.

Conclusion

Installing Ollama Python is simple. You installed the server, client, and a model. You tested with code and built a chatbot. Now you can build AI apps locally without paying for APIs.

Start with small models. Experiment with different prompts. Ollama Python gives you full control over AI on your machine.