Last modified: Jan 28, 2026, by Alexander Williams
Python Get HTML from URL: Quick Guide
Getting HTML from a URL is a core skill. It is the first step in web scraping and data collection. Python makes this task very easy.
You can use simple libraries to fetch web page content. This guide will show you how. We will cover the two most popular methods.
Why Fetch HTML with Python?
Python is great for automating web tasks. You might need data from a website. This data could be news, prices, or weather.
Fetching the HTML is step one. Once you have the HTML, you can parse it. You can extract the exact information you need.
This process is often called web scraping. It is powerful for research and analysis. Always check a website's robots.txt file and terms of service before scraping.
Method 1: Using the Requests Library
The requests library is the most popular choice. It is simple and user-friendly. You need to install it first.
Open your terminal or command prompt. Run the following command to install it.
pip install requests
After installation, you can start using it. The main function is requests.get(). It sends a GET request to a URL.
Here is a basic example. We will fetch the HTML from a sample website.
import requests
# Define the URL you want to fetch
url = "https://httpbin.org/html"
# Send a GET request to the URL
response = requests.get(url)
# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Print the HTML content of the page
    print(response.text)
else:
    print(f"Failed to retrieve page. Status code: {response.status_code}")
The response.text attribute contains the HTML. It is a string. You can save it to a file or parse it immediately.
Always check the status code. A code of 200 means success. Other codes like 404 mean the page was not found.
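For instance, here is a minimal sketch that checks the status code and then writes the HTML to a local file; the filename page.html is just an illustration.
import requests

url = "https://httpbin.org/html"
response = requests.get(url)

if response.status_code == 200:
    # Save the HTML string to a local file for later parsing
    with open("page.html", "w", encoding="utf-8") as f:
        f.write(response.text)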
Handling Errors with Requests
Networks can be unreliable. A website might be down. Your code should handle these errors gracefully.
Use a try-except block. This will catch connection errors and timeouts.
import requests
from requests.exceptions import RequestException
url = "https://example.com"
try:
    response = requests.get(url, timeout=5)  # Timeout after 5 seconds
    response.raise_for_status()  # Raises an error for bad status codes (4xx or 5xx)
    html_content = response.text
    print("HTML fetched successfully!")
except RequestException as e:
    print(f"An error occurred: {e}")
This is a more robust approach. The timeout parameter prevents your program from hanging. The raise_for_status() method is a quick way to check for HTTP errors.
Method 2: Using urllib from the Standard Library
Python comes with a built-in module called urllib. You do not need to install anything. It is perfect for simple tasks.
The main function here is urllib.request.urlopen(). It opens a URL and returns a response object.
from urllib.request import urlopen
from urllib.error import URLError, HTTPError
url = "https://httpbin.org/html"
try:
    # Open the URL
    response = urlopen(url)
    # Read the HTML content and decode it to a string
    html_content = response.read().decode('utf-8')
    print(html_content)
except HTTPError as e:
    print(f"HTTP Error: {e.code} - {e.reason}")
except URLError as e:
    print(f"URL Error: {e.reason}")
Note the .decode('utf-8') part. The read() method returns bytes. You must decode it to get a proper string.
This method is more verbose than using requests. It is good to know for environments where you cannot install external libraries.
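One common need with urllib is setting request headers. Here is a minimal sketch that uses urllib.request.Request to attach a custom User-Agent; the bot name shown is just a placeholder.
from urllib.request import Request, urlopen

url = "https://httpbin.org/html"

# Wrap the URL in a Request object so we can attach headers
request = Request(url, headers={'User-Agent': 'MyBot/1.0'})

with urlopen(request) as response:
    html_content = response.read().decode('utf-8')

print(html_content[:200])  # Print the first 200 characters as a quick check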
What to Do After Getting the HTML?
Raw HTML is just text. To find specific data, you need a parser. Beautiful Soup is the most famous library for this.
Here is a quick example. We fetch a page and extract all the paragraph text.
import requests
from bs4 import BeautifulSoup
url = "https://httpbin.org/html"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Find all <p> tags and print their text
for paragraph in soup.find_all('p'):
    print(paragraph.get_text())
This is the power of combining fetching and parsing. You can target any HTML element. For example, you could learn to build Python HTML Tables That Look Like Excel Spreadsheets from scraped data.
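For instance, here is a small sketch that pulls a single element, the page's first <h1> heading, assuming the page has one.
import requests
from bs4 import BeautifulSoup

url = "https://httpbin.org/html"
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Grab the first <h1> element, if the page has one
heading = soup.find('h1')
if heading is not None:
    print(heading.get_text())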
Important Considerations and Best Practices
Fetching HTML is simple. Doing it responsibly is key. Follow these best practices.
First, respect robots.txt. This file tells bots which pages they can access. The urllib.robotparser module can help you read it.
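Here is a minimal sketch of that check using the standard library; the URL and bot name are placeholders.
from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt file
parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# Ask whether our bot may fetch a specific page
if parser.can_fetch("MyDataCollectorBot/1.0", "https://example.com/some/page"):
    print("Allowed to fetch this page")
else:
    print("robots.txt disallows this page")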
Second, identify yourself. Use a descriptive User-Agent header. This tells the server who is making the request.
headers = {
    'User-Agent': 'MyDataCollectorBot/1.0 (contact@example.com)'
}
response = requests.get(url, headers=headers)
Third, be polite. Do not send too many requests too quickly. Add delays between requests. This prevents overloading the server.
Use time.sleep() to pause your program. This is especially important when scraping many pages from one site.
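A polite loop over several pages might look like this rough sketch; the URLs and the one-second delay are just examples.
import time
import requests

urls = [
    "https://httpbin.org/html",
    "https://httpbin.org/robots.txt",
]

for url in urls:
    response = requests.get(url, timeout=5)
    print(f"{url}: {response.status_code}")
    # Pause between requests so we do not overload the server
    time.sleep(1)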
Conclusion
Getting HTML from a URL in Python is straightforward. The requests library is the best tool for most people. The built-in urllib works when you cannot install packages.
Remember the core steps. Make the request, check for success, and handle errors. Then you can parse the HTML to find your data.
Always scrape ethically. Check permissions, identify your bot, and slow down your requests. This keeps the web open for everyone.
Now you have the foundation. You can start collecting data from the web. The next step is learning to parse it effectively, perhaps to create structured outputs like Python HTML Tables That Look Like Excel Spreadsheets.