Last modified: Jan 28, 2026, by Alexander Williams
Python Open URL: Read Web Pages in Code
Python makes web interaction simple. You can open URLs to read web pages. This is called making an HTTP request. It is a core skill for many projects.
You might need to scrape data from a website. Or check if a site is online. Python has tools for both tasks. The two main options are the built-in urllib package and the third-party requests library.
This guide will show you both methods. We will start with the basic built-in module. Then we will use the more powerful third-party library.
Using Python's Built-in urllib
The urllib package is part of Python's standard library. You do not need to install anything. It has tools for working with URLs.
The urllib.request module is key. It contains the urlopen() function. This function opens a URL and returns a response object.
Here is a simple example. It opens a test website and reads its content.
# Import the urlopen function from urllib.request
from urllib.request import urlopen
# Define the URL you want to open
url = "https://httpbin.org/html"
# Use urlopen() to send a GET request and get a response
response = urlopen(url)
# Read the HTML content from the response
html_content = response.read()
# Print the first 500 characters of the content
print(html_content[:500])
b'<!DOCTYPE html>\n<html>\n <head>\n </head>\n <body>\n <h1>Herman Melville - Moby-Dick</h1>\n <p>\n Availing himself of the mild...'
The output is a bytes object. You often need to decode it to a string. Use the .decode() method. Specify the encoding, usually 'utf-8'.
Always close the response object. Use a with statement. It handles closing automatically. This is a best practice.
from urllib.request import urlopen
url = "https://httpbin.org/html"
# Use a 'with' statement for automatic resource management
with urlopen(url) as response:
    # Decode the bytes to a UTF-8 string
    html_page = response.read().decode('utf-8')
print(f"A slice of the page: {html_page[100:150]}")
Handling Errors with urllib
Web requests can fail. The site might be down. Or the URL could be wrong. You must handle these errors.
urllib.error contains useful exceptions. URLError handles network problems. HTTPError handles bad HTTP status codes like 404.
Use try-except blocks. This makes your code robust. It prevents crashes from unexpected errors.
from urllib.request import urlopen
from urllib.error import URLError, HTTPError
url = "https://httpbin.org/status/404" # A URL that returns a 404 error
try:
    with urlopen(url) as response:
        content = response.read()
        print("Success!")
except HTTPError as e:
    print(f"HTTP Error occurred: {e.code} - {e.reason}")
except URLError as e:
    print(f"URL Error occurred: {e.reason}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
HTTP Error occurred: 404 - NOT FOUND
The Powerful Requests Library
The requests library is the more popular choice. It is simpler and more intuitive than urllib. You must install it first.
Run pip install requests in your terminal. It is not part of the standard library. But it is the industry standard for HTTP in Python.
The main function is requests.get(). It sends a GET request. The response object is very easy to use.
# Import the requests library
import requests
url = "https://api.github.com"
# Send a GET request
response = requests.get(url)
# Inspect the status code (200 means success)
print(f"Status Code: {response.status_code}")
print(f"Content-Type: {response.headers['content-type']}")
print(f"Page content length: {len(response.text)} characters")
Status Code: 200
Content-Type: application/json; charset=utf-8
Page content length: 1839 characters
The response.text is automatically decoded. No need to call .decode(). The response.json() method is great for APIs. It parses JSON responses directly.
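Here is a minimal sketch of response.json(), reusing the https://api.github.com endpoint from the example above. The exact keys in the result depend on the API, so the printed keys are only illustrative.
import requests
url = "https://api.github.com"
response = requests.get(url)
# Parse the JSON body directly into Python objects (a dict for this endpoint)
data = response.json()
# Show the type and a few of the keys the API returns
print(type(data))
print(list(data.keys())[:3])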
Error handling is also simpler. Check the response.status_code. Or use response.raise_for_status(). It raises an exception for bad status codes.
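As a quick sketch, the example below uses raise_for_status() with the same https://httpbin.org/status/404 URL from the urllib section. It raises requests.exceptions.HTTPError for 4xx and 5xx responses.
import requests
url = "https://httpbin.org/status/404"  # A URL that returns a 404 error
response = requests.get(url)
try:
    # Raise an HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    print("Success!")
except requests.exceptions.HTTPError as e:
    print(f"HTTP Error occurred: {e}")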
Passing Parameters and Headers
Real-world requests often need extra data. You might need to send query parameters. Or set custom headers like a User-Agent.
Both libraries support this. With requests, you pass a dictionary to the params argument. For headers, use the headers argument.
This is common when interacting with web APIs or mimicking a real browser.
import requests
url = "https://httpbin.org/get"
# Define parameters and headers as dictionaries
params = {'key1': 'value1', 'key2': 'value2'}
headers = {'User-Agent': 'MyPythonScript/1.0'}
response = requests.get(url, params=params, headers=headers)
print(f"Final URL with parameters: {response.url}")
print(f"Response JSON: {response.json()}")
Conclusion: Which Method Should You Use?
Choosing between urllib and requests is easy. Use urllib for simple, dependency-free scripts. It is already installed with Python.
Use requests for almost everything else. Its API is cleaner. It has better documentation. It handles many complex tasks automatically.
The core concept is the same. You send a request to a URL. You receive a response. You then process the response data.
Remember to handle errors. Respect website terms of service. Do not overload servers with rapid requests. Now you can open URLs in Python. You can fetch data from the web for your projects.