Last modified: Nov 22, 2024 By Alexander Williams

Python Selenium: Download Files from URLs - Step by Step Guide

Downloading files with Python Selenium takes some browser configuration: Chrome has to be told where to save files and not to prompt for each one. This guide walks through that setup, then covers triggering a download, waiting for it to finish, and verifying the result.

Setting Up Chrome Options for Downloads

Before downloading files, we need to configure Chrome options to specify the download directory and disable the download prompt. Here's how to set it up:


from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Configure Chrome options
chrome_options = Options()
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": "C:\\Downloads",  # Change to your preferred directory
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
})

driver = webdriver.Chrome(options=chrome_options)
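
The hard-coded "C:\\Downloads" path only works on Windows. Here's a minimal sketch of building the same preferences from any directory; build_download_prefs is a hypothetical helper name, not part of Selenium, and it assumes we are allowed to create the directory if it is missing:

```python
from pathlib import Path


def build_download_prefs(download_dir):
    """Return a Chrome "prefs" dict that saves downloads to download_dir."""
    target = Path(download_dir).expanduser()
    # Create the directory up front so the browser has somewhere to write.
    target.mkdir(parents=True, exist_ok=True)
    return {
        # An absolute path is the safe choice for this preference.
        "download.default_directory": str(target.resolve()),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True,
    }
```

The returned dictionary can be passed straight to chrome_options.add_experimental_option("prefs", ...) in place of the literal one above.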

Basic File Download Implementation

Here's a simple example of downloading a file by clicking a download link. We open the page, locate the download element with a CSS selector, and click it:


from selenium.webdriver.common.by import By
import time

def download_file(url):
    try:
        driver.get(url)
        # Find and click the download button/link
        # ("a.download-link" is a placeholder - use a selector that matches your page)
        download_button = driver.find_element(By.CSS_SELECTOR, "a.download-link")
        download_button.click()

        # Give the download time to finish
        time.sleep(5)  # Fixed wait - this does not confirm that the file arrived

        print("Download triggered.")

    except Exception as e:
        print(f"Error downloading file: {e}")

# Example usage
download_file("https://example.com/download-page")

Implementing Wait for Download Completion

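While a download is in progress, Chrome writes it to a temporary file ending in .crdownload and renames it when the download finishes. We can poll the download directory until no such file remains: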


import os
import time

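def wait_for_download(download_path, timeout=30):
    """Return True once no partial (.crdownload) files remain, False on timeout.

    Call this after the download has started: a directory with no
    partial files, including an empty one, counts as "done".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Chrome keeps the .crdownload suffix until the download finishes
        files = os.listdir(download_path)
        if not any(file.endswith(".crdownload") for file in files):
            return True
        time.sleep(1)
    return False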
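
Checking only for .crdownload files cannot tell "finished" apart from "not started yet". One way around this, sketched below, is to record the directory contents before clicking and then wait for a new, finished file to appear. The names snapshot and wait_for_new_file are hypothetical helpers, and the sketch assumes nothing else writes to the directory during the wait:

```python
import os
import time

PARTIAL_SUFFIX = ".crdownload"  # suffix Chrome uses for in-progress downloads


def snapshot(download_path):
    """Return the set of file names currently in download_path."""
    return set(os.listdir(download_path))


def wait_for_new_file(download_path, before, timeout=30, poll_interval=0.5):
    """Wait for a finished file that was not in `before`.

    Returns the new file's full path, or None if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = snapshot(download_path) - before
        finished = [f for f in new_files if not f.endswith(PARTIAL_SUFFIX)]
        in_progress = [f for f in new_files if f.endswith(PARTIAL_SUFFIX)]
        # Done only when something new exists and nothing new is still partial.
        if finished and not in_progress:
            return os.path.join(download_path, sorted(finished)[0])
        time.sleep(poll_interval)
    return None
```

Take the snapshot first, click the download link, then call wait_for_new_file with that snapshot. If several files arrive at once, this sketch returns the alphabetically first one.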

Handling Different File Types

Some file types need extra preferences. By default Chrome opens PDFs in its built-in viewer, for example, so we tell it to download them instead. These preferences must be set before the driver is created:


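# Configure Chrome options for specific file types.
# Do this before calling webdriver.Chrome(); changing the options
# afterwards does not affect a browser that is already running.
chrome_options = Options()
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": "C:\\Downloads",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True,
    "plugins.always_open_pdf_externally": True  # Download PDFs instead of opening them in the viewer
})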
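
In Selenium's Python bindings, a second call to add_experimental_option("prefs", ...) replaces the earlier dictionary rather than merging into it, which is why the listing above repeats every key. A small sketch of combining preference sets before a single call; merge_prefs, BASE_PREFS and PDF_PREFS are hypothetical names used for illustration:

```python
BASE_PREFS = {
    "download.default_directory": "C:\\Downloads",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True,
}

# Extra preference for PDFs, kept separate so it can be added only when needed.
PDF_PREFS = {
    "plugins.always_open_pdf_externally": True,
}


def merge_prefs(*pref_dicts):
    """Combine preference dicts into a new dict; later dicts win on conflicts."""
    merged = {}
    for prefs in pref_dicts:
        merged.update(prefs)
    return merged
```

The result of merge_prefs(BASE_PREFS, PDF_PREFS) can then be passed once to chrome_options.add_experimental_option("prefs", ...).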

Error Handling and Verification

A finished download is not necessarily a correct one, so it's worth checking that the file exists and, when the expected size is known, that the size matches:


import os

def verify_download(file_path, expected_size=None):
    if not os.path.exists(file_path):
        return False
    
    if expected_size is not None:
        actual_size = os.path.getsize(file_path)
        return actual_size == expected_size
    
    return True

# Example usage with verification
download_path = "C:\\Downloads\\example.pdf"
if verify_download(download_path):
    print("Download verified successfully!")
else:
    print("Download verification failed!")
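
A size check can miss corrupted content. When the site publishes a checksum for the file, comparing hashes is a stronger test. Here's a minimal sketch that assumes a SHA-256 hex digest is available; sha256_of and verify_checksum are hypothetical helper names:

```python
import hashlib


def sha256_of(file_path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large downloads.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(file_path, expected_sha256):
    """Return True if the file's SHA-256 digest matches the expected hex string."""
    return sha256_of(file_path) == expected_sha256.strip().lower()
```

The expected digest has to come from a source you trust, such as the download page itself.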

Conclusion

Implementing file downloads with Python Selenium requires careful consideration of browser configurations, wait mechanisms, and error handling. Remember to always verify downloaded files and implement proper exception handling.

For more advanced implementations, consider combining this with proper logging practices to track download status and potential issues.
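
As a starting point for that, here's a minimal sketch using Python's standard logging module. The logger name "downloads" and the helper log_download_result are assumptions made for illustration:

```python
import logging
import os

logger = logging.getLogger("downloads")


def log_download_result(file_path):
    """Log whether file_path exists and return True if it does."""
    if os.path.exists(file_path):
        size = os.path.getsize(file_path)
        logger.info("Downloaded %s (%d bytes)", file_path, size)
        return True
    logger.error("Download missing: %s", file_path)
    return False
```

Call logging.basicConfig(level=logging.INFO) once at startup so the info messages are actually printed.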