Last modified: Nov 12, 2024 By Alexander Williams

Python Guide: Download Files from URLs Using Requests Library

Downloading files from URLs is a common task in web scraping and data collection. Python's requests library makes this process straightforward and efficient.

Basic File Download

The simplest way to download a file is using the get() method from requests. Here's a basic example:


import requests

url = "https://example.com/file.pdf"
response = requests.get(url)

with open("downloaded_file.pdf", "wb") as file:
    file.write(response.content)

Streaming Large Files

For large files, it's better to use streaming to avoid memory issues. The stream parameter helps manage memory efficiently:


import requests

url = "https://example.com/large_file.zip"
response = requests.get(url, stream=True)

with open("large_file.zip", "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            file.write(chunk)

Progress Tracking

When downloading large files, it's useful to track progress. Here's how to implement a progress bar using tqdm:


import requests
from tqdm import tqdm

url = "https://example.com/large_file.zip"
response = requests.get(url, stream=True)
total_size = int(response.headers.get('content-length', 0))

with open("large_file.zip", "wb") as file, tqdm(
    desc="Downloading",
    total=total_size,
    unit='iB',
    unit_scale=True
) as progress_bar:
    for data in response.iter_content(chunk_size=1024):
        size = file.write(data)
        progress_bar.update(size)

Error Handling

Always implement proper error handling when downloading files. Check out our guide on Python Requests Error Handling for more details.


import requests
from requests.exceptions import RequestException

try:
    response = requests.get(url)
    response.raise_for_status()
    with open("file.pdf", "wb") as file:
        file.write(response.content)
except RequestException as e:
    print(f"An error occurred: {e}")

Handling Different Content Types

When working with different file types, you might need to handle JSON responses differently. Learn more in our Guide to Handling JSON Responses.

Session Management

For multiple downloads, using sessions can improve performance. Check our Python Requests Session Guide for details.


with requests.Session() as session:
    session.headers.update({'User-Agent': 'Mozilla/5.0'})
    response = session.get(url)
    with open("file.pdf", "wb") as file:
        file.write(response.content)

Timeout Configuration

Set appropriate timeouts to handle slow downloads. Learn more in our Python Requests Timeout Guide.

Conclusion

Downloading files with Python requests is versatile and powerful. Remember to implement proper error handling, use streaming for large files, and consider session management for multiple downloads.