Last modified: Dec 02, 2025 By Alexander Williams
Stream Large Responses with Python FastAPI
FastAPI is a modern Python web framework. It is known for its speed and ease of use.
But handling large data can be tricky. Sending big files or datasets all at once can crash your app.
It uses up too much memory. The solution is to stream the response.
Streaming sends data in small chunks. This keeps your application stable and fast.
This guide will show you how to do it. We will cover the basics and provide code examples.
Why Stream Large Responses?
Imagine sending a 5GB video file. Loading it all into memory at once is inefficient and can exhaust your server's RAM.
Memory usage spikes, and other users experience slow responses while the server struggles.
Streaming solves this problem. It sends the file piece by piece to the client.
The server only holds a small part in memory at any time. This is crucial for scalability.
It also allows the client to start processing data immediately. They don't have to wait for the whole download.
Common use cases include large CSV exports, video streaming, and log file downloads.
For more on making your app efficient, see our FastAPI Performance Optimization Guide.
Core Concept: StreamingResponse
FastAPI provides the StreamingResponse class. It is your main tool for streaming.
You import it from fastapi.responses. It takes an async generator or a normal generator.
This generator yields chunks of data (like bytes). FastAPI sends each chunk as it is produced.
You can also set media types and headers. This tells the client what kind of data is coming.
Let's look at a basic example. We will stream a fake large dataset.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import asyncio

app = FastAPI()

# An async generator function
async def fake_data_streamer():
    """Yields lines of data as bytes."""
    for i in range(100000):  # Simulating many rows
        # Simulate a row of data
        data_chunk = f"This is data row number {i}\n"
        yield data_chunk.encode()  # Convert string to bytes
        # Optional small delay to simulate real work
        await asyncio.sleep(0.001)

@app.get("/stream-data")
async def stream_large_data():
    """Endpoint to stream a large dataset."""
    return StreamingResponse(
        content=fake_data_streamer(),
        media_type="text/plain"
    )
The fake_data_streamer function is an async generator: it uses yield instead of return.
Each loop iteration builds a line of text, encodes it to bytes, and yields it.
The StreamingResponse uses this generator as its content. It streams each chunk.
The client receives the data line by line. The server memory stays low.
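To watch the chunks arrive on the client side, you can consume the endpoint with httpx's streaming API. This is a minimal sketch, assuming the app is running locally on port 8000:

import httpx

# Stream the body instead of buffering it all in memory
with httpx.stream("GET", "http://localhost:8000/stream-data") as response:
    for line in response.iter_lines():
        print(line)  # Each line is printed as the server yields it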
Streaming Large Files from Disk
A common task is sending large files. You can stream them directly from the filesystem.
Python's file operations can be used in a generator. This avoids reading the whole file into RAM.
Here is an example endpoint. It streams a video or any large binary file.
import mimetypes
import os
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse

app = FastAPI()

def file_chunk_generator(file_path: str, chunk_size: int = 8192):
    """Generator to read a file in chunks."""
    with open(file_path, "rb") as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            yield chunk

@app.get("/stream-video/{video_name}")
async def stream_video(video_name: str):
    """Stream a large video file."""
    file_path = f"./videos/{video_name}"
    if not os.path.exists(file_path):
        raise HTTPException(status_code=404, detail="File not found")
    # Guess the media type from the extension, defaulting to MP4
    media_type = mimetypes.guess_type(file_path)[0] or "video/mp4"
    return StreamingResponse(
        content=file_chunk_generator(file_path),
        media_type=media_type,
        headers={"Content-Disposition": f"inline; filename={video_name}"}
    )
The file_chunk_generator function opens the file in binary mode.
It reads a chunk of size 8192 bytes (8KB). It yields that chunk and reads the next.
This continues until the file is empty. The file handle is properly closed by the with block.
The endpoint raises a 404 error if the file does not exist. Otherwise it returns the StreamingResponse.
The client can play the video as it downloads. This is how streaming services work.
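For plain whole-file downloads, note that FastAPI also ships a FileResponse class in fastapi.responses that reads the file in chunks for you and sets headers such as Content-Length. The manual generator above is the pattern to reach for when you need a custom chunk size or on-the-fly processing. A minimal sketch, reusing the same videos directory:

from fastapi.responses import FileResponse

@app.get("/download-video/{video_name}")
async def download_video(video_name: str):
    # FileResponse streams the file from disk in chunks
    return FileResponse(f"./videos/{video_name}", media_type="video/mp4")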
Handling Database Query Streams
You might need to stream results from a database. This is useful for large query results.
Many database drivers support streaming cursors. They fetch rows one by one or in batches.
Here is an example endpoint. It simulates rows from a large query; a sketch with a real async driver follows after it.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import asyncio
import json

app = FastAPI()

async def stream_database_results():
    """Simulates streaming rows from a database cursor."""
    # This would be your actual async database fetch logic
    for row_id in range(50000):
        # Simulate a database row as a dictionary
        row_data = {"id": row_id, "value": f"item_{row_id}"}
        # Convert to JSON and add a newline for NDJSON format
        yield json.dumps(row_data) + "\n"
        await asyncio.sleep(0.0001)  # Simulate small I/O delay

@app.get("/export-data")
async def export_large_dataset():
    """Stream a large dataset as JSON Lines (NDJSON)."""
    return StreamingResponse(
        content=stream_database_results(),
        media_type="application/x-ndjson"  # Newline Delimited JSON
    )
This endpoint streams JSON Lines format. Each row is a valid JSON object on one line.
The client can parse each line as it arrives. This is great for real-time data pipelines.
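With a real driver the shape is the same. For example, with asyncpg you could replace the simulated loop with a server-side cursor. This is a sketch that assumes a PostgreSQL items table and a connection string in the DATABASE_URL environment variable; asyncpg requires cursors to run inside a transaction:

import json
import os

import asyncpg

async def stream_items_from_postgres():
    """Yields NDJSON lines from a PostgreSQL server-side cursor."""
    conn = await asyncpg.connect(os.environ["DATABASE_URL"])
    try:
        async with conn.transaction():
            # The cursor fetches rows in batches instead of all at once
            async for record in conn.cursor("SELECT id, value FROM items"):
                yield json.dumps(dict(record)) + "\n"
    finally:
        await conn.close()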
For managing database schemas in such apps, consider our guide on FastAPI Database Migrations with Alembic.
Important Considerations and Best Practices
Streaming is powerful but requires careful thought. Here are key points to remember.
Error Handling: If the generator raises an exception mid-stream, the stream simply stops; the status code has already been sent. Wrap your yields in try/except and fail gracefully (see the sketch after this list).
Client Compatibility: Not all clients handle streaming perfectly. Test with your target clients.
Headers: Set correct Content-Type and Content-Disposition headers. This helps clients understand the data.
Timeouts: Long-running streams can time out. Configure your ASGI server (Uvicorn, Hypercorn) and any reverse proxy in front of it for longer timeouts.
Backpressure: The server should not produce data faster than the network can send it. Awaiting between yields (even a small asyncio.sleep) gives the event loop room to flush.
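For the error-handling point, one possible pattern is a wrapper generator. This is a sketch that reuses the fake_data_streamer from earlier; since the 200 status line has already gone out, the best it can do is signal the failure in-band and stop:

async def safe_streamer():
    """Wraps the data source so a mid-stream failure ends cleanly."""
    try:
        async for chunk in fake_data_streamer():
            yield chunk
    except Exception as exc:
        # Headers are already sent, so report the error in-band and stop
        yield f"\nSTREAM ERROR: {exc}\n".encode()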
Streaming also works well with FastAPI Background Tasks for preparing data before streaming.
Testing Your Streaming Endpoint
You can test streaming with simple command-line tools. curl is a great option.
Run your FastAPI app (for example, with uvicorn main:app --reload). Then use curl to call the streaming endpoint.
curl -N http://localhost:8000/stream-data
The -N flag disables curl's output buffering, so you will see data appear in your terminal line by line. It won't wait for the entire response.
For more structured testing, use the TestClient. Open the request with stream() so you can iterate over the body as it arrives.
from fastapi.testclient import TestClient
from .main import app  # Import your FastAPI app

client = TestClient(app)

def test_stream_endpoint():
    # stream() consumes the body incrementally, so the test can
    # stop after a few chunks instead of waiting for all 100,000 rows
    with client.stream("GET", "/stream-data") as response:
        assert response.status_code == 200
        chunk_count = 0
        for line in response.iter_lines():
            chunk_count += 1
            # You can assert things about each line (iter_lines yields str)
            assert "data row" in line
            if chunk_count >= 5:
                break  # Test only the first few chunks
    print(f"Processed {chunk_count} chunks.")
Learn more about testing in our guide Test FastAPI Endpoints with pytest and TestClient.
Conclusion
Streaming large responses is essential for robust FastAPI applications. It prevents memory overload.
Use the StreamingResponse class with a generator function. This is the core pattern.
You can stream files, database results, or any large data. It improves user experience and server stability.
Remember to handle errors and set correct headers. Always test your streaming endpoints thoroughly.
By mastering streaming, you build applications that scale efficiently. They can handle large data with ease.
Start implementing streaming in your APIs today. Your server and your users will thank you.