Last modified: Nov 12, 2024 By Alexander Williams
Python Guide: Download Files from URLs Using Requests Library
Downloading files from URLs is a common task in web scraping and data collection. Python's requests
library makes this process straightforward and efficient.
Basic File Download
The simplest way to download a file is to call requests.get() and write the response body to disk. Here's a basic example:
import requests

url = "https://example.com/file.pdf"
response = requests.get(url)
# Write the raw response bytes to disk in binary mode
with open("downloaded_file.pdf", "wb") as file:
    file.write(response.content)
Streaming Large Files
For large files, stream the download instead of loading the whole body into memory. With stream=True, requests defers fetching the body, and iter_content() yields it in chunks:
import requests

url = "https://example.com/large_file.zip"
response = requests.get(url, stream=True)
with open("large_file.zip", "wb") as file:
    # Read the body in 8 KiB chunks so only one chunk is held in memory at a time
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            file.write(chunk)
Progress Tracking
When downloading large files, it's useful to track progress. Here's how to implement a progress bar using tqdm:
import requests
from tqdm import tqdm

url = "https://example.com/large_file.zip"
response = requests.get(url, stream=True)
# Content-Length may be absent; fall back to 0 in that case
total_size = int(response.headers.get('content-length', 0))
with open("large_file.zip", "wb") as file, tqdm(
    desc="Downloading",
    total=total_size,
    unit='iB',
    unit_scale=True,
    unit_divisor=1024  # scale by 1024 to match the binary 'iB' unit
) as progress_bar:
    for data in response.iter_content(chunk_size=1024):
        # file.write() returns the number of bytes written
        size = file.write(data)
        progress_bar.update(size)
Error Handling
Always implement proper error handling when downloading files. Check out our guide on Python Requests Error Handling for more details.
import requests
from requests.exceptions import RequestException

url = "https://example.com/file.pdf"
try:
    response = requests.get(url)
    # Raise an HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    with open("file.pdf", "wb") as file:
        file.write(response.content)
except RequestException as e:
    print(f"An error occurred: {e}")
Handling Different Content Types
Not every URL returns a file to save. Some endpoints return JSON, which you will usually want to parse rather than write to disk. Learn more in our Guide to Handling JSON Responses.
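One way to tell a JSON response apart from a binary download is to inspect the Content-Type header before deciding what to do with the body. The sketch below is only an illustration: the helper name save_or_parse and the rule that only application/json counts as JSON are assumptions made for this example, not something requests prescribes.

```python
import requests


def save_or_parse(response: requests.Response, path: str):
    """Return parsed JSON for JSON responses; otherwise save the body to `path`.

    Hypothetical helper: only "application/json" is treated as JSON here.
    """
    content_type = response.headers.get("Content-Type", "")
    # Compare the media type only, ignoring parameters such as "; charset=utf-8"
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return response.json()
    with open(path, "wb") as file:
        file.write(response.content)
    return None
```

A call such as save_or_parse(requests.get(url), "file.pdf") returns the decoded data for a JSON endpoint and None after writing any other content type to disk.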
Session Management
For multiple downloads, a session can improve performance because it reuses the underlying connection for requests to the same host. Check our Python Requests Session Guide for details.
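import requests

url = "https://example.com/file.pdf"
with requests.Session() as session:
    # Headers set on the session are sent with every request it makes
    session.headers.update({'User-Agent': 'Mozilla/5.0'})
    response = session.get(url)
    with open("file.pdf", "wb") as file:
        file.write(response.content)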
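The session example above fetches a single file, while the benefit of a session shows up when several files come from the same host. A minimal sketch of that case follows. The function name download_all, the 30-second timeout, and the rule for deriving a local file name from the URL are all assumptions chosen for illustration.

```python
import os
from urllib.parse import urlsplit

import requests


def download_all(session: requests.Session, urls, dest_dir):
    """Stream each URL into dest_dir using one shared session; return the saved paths."""
    saved = []
    for url in urls:
        # Hypothetical naming rule: last path segment, or a placeholder if it is empty
        name = os.path.basename(urlsplit(url).path) or "download.bin"
        path = os.path.join(dest_dir, name)
        # The response is used as a context manager so the connection is released
        with session.get(url, stream=True, timeout=30) as response:
            response.raise_for_status()
            with open(path, "wb") as file:
                for chunk in response.iter_content(chunk_size=8192):
                    file.write(chunk)
        saved.append(path)
    return saved
```

It would typically be called inside a with requests.Session() as session: block, for example download_all(session, ["https://example.com/file.pdf"], "."). Two URLs that end in the same file name would overwrite each other under this naming rule.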
Timeout Configuration
Set a timeout on every download. By default, requests waits indefinitely, so a stalled server can hang your script. Learn more in our Python Requests Timeout Guide.
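As a rough illustration, the timeout argument accepts either a single number or a (connect, read) tuple. The sketch below assumes a 5-second connect timeout and a 30-second read timeout. These values and the helper name fetch_with_timeout are placeholders rather than recommendations.

```python
import requests
from requests.exceptions import RequestException, Timeout


def fetch_with_timeout(url, path, timeout=(5, 30)):
    """Save url to path; return True on success, False on a timeout or other request error."""
    try:
        # timeout=(connect, read): seconds allowed to establish the connection,
        # then seconds to wait for the server to send data
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except Timeout:
        print(f"Timed out while downloading {url}")
        return False
    except RequestException as e:
        print(f"An error occurred: {e}")
        return False
    with open(path, "wb") as file:
        file.write(response.content)
    return True
```

Note that the read timeout limits how long requests waits for data from the server, not the total download time, so a slow but steady transfer can still run longer than 30 seconds.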
Conclusion
Downloading files with Python requests is versatile and powerful. Remember to implement proper error handling, use streaming for large files, and consider session management for multiple downloads.