Last modified: Jan 10, 2026 by Alexander Williams

BeautifulSoup with Proxies and User Agents

Web scraping is a powerful tool. But websites often block automated scripts. They see your script as a bot. This is where proxies and user agents help.

They make your requests look like they come from real users. This guide shows you how to integrate both with BeautifulSoup.

Why Use Proxies and User Agents?

Direct scraping can get your IP banned. Websites track your IP address. Too many requests from one IP trigger alarms.

Proxies hide your real IP. They route your request through another server. This masks your origin.

User agents identify your browser. The default user agent sent by Python's requests library is python-requests/x.y.z. It screams "bot" to servers.

Changing it mimics a real browser. Together, they are essential for large-scale scraping.

Setting Up Your Environment

First, install the necessary libraries. You need requests and beautifulsoup4. Use pip for installation.


pip install requests beautifulsoup4

Now, import them in your Python script. You are ready to start.


import requests
from bs4 import BeautifulSoup

Understanding and Setting User Agents

A user agent is a string. It tells the server about your browser and OS. Python's default is simple.

Websites can easily detect it. Use a common browser's user agent instead. Find one online or copy it from your own browser's developer tools.

Pass it in the request headers. Use the headers parameter in requests.get().


# Define a common user agent string
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'

# Create headers dictionary
headers = {
    'User-Agent': user_agent
}

# Make a request with the custom header
response = requests.get('https://httpbin.org/user-agent', headers=headers)
print(response.text)

{
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}

See? The server now sees a common browser. This reduces blocking risk.

Rotating Multiple User Agents

Using one agent is good. Rotating many is better. Create a list of user agent strings.

Randomly select one for each request. This makes your script even harder to detect.


import random

# List of user agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
]

# Pick a random agent
chosen_agent = random.choice(user_agents)
headers = {'User-Agent': chosen_agent}

response = requests.get('https://httpbin.org/user-agent', headers=headers)
print(f"Used Agent: {chosen_agent}")

Integrating Proxies with Requests

Proxies are servers that forward your requests. They have their own IP addresses. You send your request to the proxy.

The proxy sends it to the target website. The site sees the proxy's IP, not yours.

Use the proxies parameter in requests.get(). It accepts a dictionary.


# Define proxy (format: protocol://ip:port)
proxy = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

# Make request with proxy
try:
    response = requests.get('https://httpbin.org/ip', proxies=proxy, timeout=10)
    print(response.text)
except requests.exceptions.ProxyError as e:
    print(f"Proxy Error: {e}")

{
  "origin": "10.10.1.10"
}

The output shows the proxy's IP. Your real IP is hidden.

Finding and Using Free Proxies

Free proxy lists are available online. But be cautious. They can be slow, unreliable, or insecure.

For serious projects, consider paid services. They offer better speed and reliability.

You can scrape a free proxy list to get IPs. Then test them before use (see the sketch after the example). Here's a basic example.


# Example: Scrape a proxy list site (hypothetical)
# Always check the site's terms of service first.
url = 'https://www.example-proxy-list.com/'
headers = {'User-Agent': 'Mozilla/5.0'}

resp = requests.get(url, headers=headers, timeout=10)
soup = BeautifulSoup(resp.content, 'html.parser')

# Assuming proxies are in a table with class 'proxy'
# This is a template. Selectors will vary by site.
proxy_elements = soup.select('table.proxy tbody tr')
proxies_list = []
for row in proxy_elements[:5]:  # Get first 5
    ip = row.find('td', class_='ip')
    port = row.find('td', class_='port')
    if ip and port:
        proxies_list.append(f"http://{ip.text.strip()}:{port.text.strip()}")

print("Found Proxies:", proxies_list)

Combining Proxies and User Agents with BeautifulSoup

Now, let's put it all together. Use rotating user agents and a proxy. Then parse the result with BeautifulSoup.

This is the core of robust scraping. It makes your requests much harder to distinguish from a real browser's.


import requests
from bs4 import BeautifulSoup
import random

# Configuration
user_agents = ['Agent_String_1', 'Agent_String_2']  # Add real strings
proxy = {'http': 'http://your-proxy-ip:port', 'https': 'http://your-proxy-ip:port'}

target_url = 'https://books.toscrape.com/'

# Setup request
headers = {'User-Agent': random.choice(user_agents)}

try:
    # Make the request
    response = requests.get(target_url, headers=headers, proxies=proxy, timeout=15)
    response.raise_for_status()  # Check for HTTP errors

    # Parse with BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract data - example: book titles
    books = soup.select('article.product_pod h3 a')
    for book in books[:3]:
        print(book['title'])

except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

This script fetches a page through the proxy with a rotated user agent. It then extracts data with BeautifulSoup.

Best Practices and Error Handling

Always respect robots.txt. Check the website's scraping policy. Do not overload servers.
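
Python's standard library can read robots.txt for you. A small sketch using urllib.robotparser with the demo site used above:


from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url('https://books.toscrape.com/robots.txt')
rp.read()

# Check a specific path before scraping it
if rp.can_fetch('*', 'https://books.toscrape.com/catalogue/page-1.html'):
    print("Allowed to scrape this page")
else:
    print("Disallowed by robots.txt")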

Add delays between requests. Use time.sleep(). This is polite and reduces detection.

Handle errors gracefully. Proxies fail often. Use try-except blocks.
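
Here is a small pattern that combines both ideas: sleep between requests and catch failures so one bad response does not crash the run. A sketch, reusing the headers dict from above and assuming a short list of page URLs.


import time

pages = ['https://books.toscrape.com/catalogue/page-1.html',
         'https://books.toscrape.com/catalogue/page-2.html']

for page in pages:
    try:
        resp = requests.get(page, headers=headers, timeout=15)
        resp.raise_for_status()
        print(page, "->", resp.status_code)
    except requests.exceptions.RequestException as e:
        print(f"Skipping {page}: {e}")
    time.sleep(2)  # Polite delay between requests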

For complex sites, consider combining tools. Our guide on Combine BeautifulSoup & Selenium for Web Scraping can help.

Also, ensure your data is saved correctly. Learn to Save Scraped Data to CSV with BeautifulSoup.

Common Issues and Solutions

Proxy connection errors are common. The proxy might be dead. Keep a list of proxies and retry with another.
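
A simple fallback loop handles this: try each proxy in turn and stop at the first one that works. A sketch, assuming working_proxies from the earlier test (or any list of proxy URLs).


response = None
for p in working_proxies:
    try:
        response = requests.get('https://httpbin.org/ip',
                                proxies={'http': p, 'https': p},
                                timeout=10)
        response.raise_for_status()
        print("Succeeded with", p)
        break
    except requests.exceptions.RequestException:
        print("Proxy failed, trying next:", p)

if response is None:
    print("All proxies failed")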

SSL errors can occur with HTTPS proxies. You might need to adjust SSL verification settings cautiously.
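
You can catch SSL failures separately and decide how to handle them. Disabling verification exposes you to man-in-the-middle attacks, so treat it as a last resort. A hedged sketch, reusing the proxy dict from earlier:


try:
    response = requests.get('https://httpbin.org/ip', proxies=proxy, timeout=10)
except requests.exceptions.SSLError as e:
    print(f"SSL Error: {e}")
    # Last resort only, and never for sensitive traffic:
    # response = requests.get('https://httpbin.org/ip', proxies=proxy,
    #                         timeout=10, verify=False)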

If you encounter encoding problems in your scraped text, refer to our BeautifulSoup Unicode Encoding Issues Guide.

Always test your proxy and user agent setup on a simple site first. Use a site like httpbin.org/ip.
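
A quick sanity check can look like this: one call confirms the outgoing IP, another confirms the user agent the server sees. It reuses the proxy dict and user_agents list from earlier.


check_headers = {'User-Agent': random.choice(user_agents)}

ip_seen = requests.get('https://httpbin.org/ip',
                       proxies=proxy, headers=check_headers, timeout=10)
ua_seen = requests.get('https://httpbin.org/headers',
                       proxies=proxy, headers=check_headers, timeout=10)

print("IP seen by server:", ip_seen.json()['origin'])
print("UA seen by server:", ua_seen.json()['headers']['User-Agent'])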

Conclusion

Using proxies and user agents is crucial for successful web scraping. They help you avoid IP bans and mimic human traffic.

Start by setting a realistic user agent. Then integrate a reliable proxy. Rotate both for large-scale projects.

Combine these techniques with BeautifulSoup's parsing power. You can scrape data efficiently and responsibly.

Remember to scrape ethically. Respect website terms and rate limits. Happy scraping!