Last modified: Jan 19, 2026, by Alexander Williams
Build a Web Scraper with BeautifulSoup and Requests
Web scraping is a powerful skill. It lets you collect data from websites. This guide will teach you how to build a scraper from scratch. We will use Python's Requests and BeautifulSoup libraries. No prior scraping experience is needed.
You will learn the entire process. We start with setup and move to data extraction. By the end, you will have a working scraper. You can use it for your own projects.
Prerequisites and Setup
You need Python installed on your computer. Python 3.6 or higher is recommended. You also need to install two key libraries.
Open your terminal or command prompt. Run the following pip install command. This will get both of the necessary tools.
pip install requests beautifulsoup4
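If you prefer to keep project dependencies isolated, you can create a virtual environment first. A minimal setup might look like this (the environment name scraper-env is just an example):

python -m venv scraper-env
source scraper-env/bin/activate   # on Windows: scraper-env\Scripts\activate
pip install requests beautifulsoup4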
The requests library fetches web pages. The beautifulsoup4 library parses HTML. It makes data extraction simple.
You might also want a code editor. VS Code or PyCharm are great choices. They help you write and debug code easily.
Understanding the Web Scraping Process
Web scraping has three main steps. First, you fetch the HTML content of a page. Second, you parse that HTML to find elements. Third, you extract and save the data you need.
Requests handles the first step. It sends an HTTP GET request to a URL. The server responds with the page's HTML code.
BeautifulSoup handles the second and third steps. It takes the raw HTML. Then it creates a parse tree. This tree lets you navigate the HTML structure.
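To make that division of labor concrete, here is a minimal sketch of all three steps together; the rest of this guide expands each one. The URL is just a placeholder page:

import requests
from bs4 import BeautifulSoup

# Step 1: fetch the HTML
response = requests.get('https://example.com')
# Step 2: parse it into a navigable tree
soup = BeautifulSoup(response.text, 'html.parser')
# Step 3: extract a piece of data, such as the page title
print(soup.title.text)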
For a deeper foundation, read our Web Scraping Guide with BeautifulSoup for Beginners.
Step 1: Fetching a Web Page with Requests
Let's start by getting a web page. We will use a simple example page. Create a new Python file. Name it scraper.py.
First, import the requests module. Then use the requests.get() function. Pass the target URL as an argument.
import requests
# URL of the page to scrape
url = 'https://books.toscrape.com/catalogue/page-1.html'
# Send a GET request to the URL
response = requests.get(url)
# Check if the request was successful (status code 200)
if response.status_code == 200:
    print('Page fetched successfully!')
    # Store the HTML content
    html_content = response.text
else:
    print(f'Failed to retrieve page. Status code: {response.status_code}')
Page fetched successfully!
The response.text attribute contains the HTML. This is the raw data we will parse. Always check the status code. Code 200 means success.
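If you prefer exceptions over manual checks, requests also provides raise_for_status(). A sketch of that pattern (the ten-second timeout is an illustrative value):

try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raises an HTTPError for 4xx/5xx responses
    html_content = response.text
except requests.exceptions.RequestException as err:
    print(f'Request failed: {err}')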
Step 2: Parsing HTML with BeautifulSoup
Now we have the HTML. Next, we parse it. Import BeautifulSoup from the bs4 module. Create a BeautifulSoup object.
You must specify a parser. We will use 'html.parser'. It's built into Python, so nothing extra to install.
from bs4 import BeautifulSoup
# Create a BeautifulSoup object to parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')
# The soup object now represents the document as a nested data structure
print(type(soup))
The soup object is your entry point. You can now find any HTML element. Use methods like find() and find_all().
Step 3: Finding and Extracting Data
This is the core of scraping. You locate elements by tag name, class, or id. Then you extract their text or attributes.
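Continuing from the soup object we built in Step 2, the three lookup styles look like this. The id value here is illustrative; find() simply returns None if no such element exists:

# find() returns the first match; find_all() returns every match
first_heading = soup.find('h3')                            # by tag name
prices = soup.find_all('p', class_='price_color')          # by CSS class
sidebar = soup.find(id='promotions')                       # by id (illustrative value)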
Let's scrape book titles from our example page. Inspect the page structure first. Each title lives in an <h3> tag.
# Find all <h3> tags on the page (each contains a book title link)
book_titles = soup.find_all('h3')
# Loop through the found elements and extract the text
for title in book_titles:
    # The full title is stored in the 'title' attribute of the <a> tag inside each <h3>
    book_title = title.a['title']
    print(book_title)
A Light in the Attic
Tipping the Velvet
Soumission
...
The find_all() method returns a list. It contains all matching elements. We accessed the 'title' attribute of the <a> tag.
You can also search by CSS class. Use the class_ parameter. Let's get book prices. They have the class 'price_color'.
# Find all elements with the CSS class 'price_color'
book_prices = soup.find_all('p', class_='price_color')
for price in book_prices:
    print(price.text)
£51.77
£53.74
£50.10
...
Combining finds is powerful. You can navigate from one element to another. This is essential for complex pages.
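For example, you could start from a title and walk up to its surrounding container to grab the matching price. A sketch, based on the structure of our example page:

# Start from the first title, then move up to its surrounding container
first_title = soup.find('h3')
container = first_title.find_parent('article', class_='product_pod')
# Search within that container for the related price
price = container.find('p', class_='price_color')
print(first_title.a['title'], price.text)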
For advanced data like news articles, see our BeautifulSoup News Scraping Tutorial.
Step 4: Structuring and Saving the Data
Printing data to the console is not enough. You usually want to save it. Common formats are CSV and JSON. Let's structure our book data into a list of dictionaries.
We will collect title, price, and availability. Then we save it to a CSV file.
import csv
books_data = []
# We'll find each book article container and extract details from it
book_articles = soup.find_all('article', class_='product_pod')
for article in book_articles:
    title = article.h3.a['title']
    price = article.find('p', class_='price_color').text
    # Availability is in a <p> tag with class 'instock availability'
    availability = article.find('p', class_='instock availability').text.strip()
    book_info = {
        'title': title,
        'price': price,
        'availability': availability
    }
    books_data.append(book_info)

# Save to a CSV file
with open('books.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=['title', 'price', 'availability'])
    writer.writeheader()
    writer.writerows(books_data)

print('Data saved to books.csv')
This code loops through each book container. It extracts specific data points. Then it writes them to a CSV file. You now have a permanent data record.
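Since JSON is the other common format mentioned above, saving the same list as JSON takes only the standard library:

import json

# Save the same list of dictionaries as JSON
with open('books.json', 'w', encoding='utf-8') as file:
    json.dump(books_data, file, indent=2, ensure_ascii=False)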
Important Considerations and Best Practices
Web scraping is powerful but comes with responsibilities. Always respect the website's robots.txt file. This file states which parts of the site crawlers are allowed to access.
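Python's standard library can check those rules for you. A small sketch using urllib.robotparser against our example site (if a site has no robots.txt, can_fetch generally permits everything):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://books.toscrape.com/robots.txt')
rp.read()
# Returns True if the given user agent may fetch the URL
print(rp.can_fetch('MyScraperBot', 'https://books.toscrape.com/catalogue/page-1.html'))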
Do not overload servers with rapid requests. Add delays between requests. Use the time.sleep() function.
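For example, a polite multi-page loop might look like this. The five-page range and two-second delay are just illustrative starting points:

import time
import requests

# Scrape the first five catalogue pages, pausing between requests
for page in range(1, 6):
    url = f'https://books.toscrape.com/catalogue/page-{page}.html'
    response = requests.get(url)
    # ... parse response.text with BeautifulSoup here ...
    time.sleep(2)  # pause so we don't hammer the server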
Some sites block scrapers. They check user-agent headers. You can set a custom header in your request.
headers = {
    'User-Agent': 'MyScraperBot/1.0 ([email protected])'
}
response = requests.get(url, headers=headers)
For a full guide on staying under the radar, read Avoid Getting Blocked While Scraping BeautifulSoup.
Always check if elements exist before accessing them. Use try-except blocks or conditional checks. This prevents your script from crashing.
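Since find() returns None when nothing matches, a guard like this keeps a missing element from raising an AttributeError. Continuing from the soup object above:

price_tag = soup.find('p', class_='price_color')
if price_tag is not None:
    price = price_tag.text
else:
    price = 'N/A'  # fall back to a default instead of crashing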
Conclusion
You have built a web scraper from scratch. You learned to fetch pages with Requests. You parsed HTML with BeautifulSoup. You extracted and saved structured data.
This is just the beginning. You can scrape multiple pages. You can handle login forms. You can scrape dynamic JavaScript content.
Remember to scrape ethically. Respect website terms and server load. Use your new skill for good projects.
Happy scraping! The internet's data is now within your reach.