Last modified: Jan 19, 2026, by Alexander Williams

Web Scraping Guide with BeautifulSoup for Beginners

Web scraping is a powerful skill. It lets you collect data from websites.

Python and BeautifulSoup make this easy. This guide will teach you the basics.

You will learn to extract information from HTML pages step by step.

What is Web Scraping?

Web scraping is automated data collection from the web. It's like copying and pasting but faster.

It is used for price comparison, research, and data analysis. Always check a website's robots.txt file and terms of service.

Scraping responsibly is crucial. Do not overload servers.
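Python's standard library can check a site's robots.txt rules for you. Here is a minimal sketch using urllib.robotparser; the URL and path below are illustrative, not a recommendation of what to scrape:

```python
from urllib import robotparser

# Point the parser at the site's robots.txt (URL here is illustrative)
rp = robotparser.RobotFileParser()
rp.set_url('https://books.toscrape.com/robots.txt')
rp.read()  # downloads and parses the file

# can_fetch() reports whether a given user agent may request a path
print(rp.can_fetch('*', '/catalogue/page-1.html'))
```

If can_fetch() returns False, the site has asked crawlers not to request that path, and your script should skip it.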

Prerequisites

You need Python installed on your computer. Basic Python knowledge helps.

We will use two main libraries: requests and beautifulsoup4.

Open your terminal or command prompt to get started.

Step 1: Install Required Libraries

First, install the necessary packages. Use the pip package manager.


pip install requests beautifulsoup4
    

This command downloads and installs both libraries. You only need to do this once.

Step 2: Import Libraries

Create a new Python file. Start by importing the modules.


import requests
from bs4 import BeautifulSoup
    

The requests library fetches web pages. BeautifulSoup parses the HTML content.

Step 3: Fetch a Web Page

Use requests.get() to download a page. We will use a simple example page.


URL = 'http://example.com'
response = requests.get(URL)

# Check if the request was successful
if response.status_code == 200:
    print('Page fetched successfully!')
else:
    print('Failed to retrieve page')
    

A status_code of 200 means success. Always handle possible errors.

Step 4: Parse HTML with BeautifulSoup

Create a BeautifulSoup object. This object lets you navigate the HTML structure.


soup = BeautifulSoup(response.content, 'html.parser')
    

We pass the page content and the parser type. 'html.parser' is built into Python.

Step 5: Explore the Page Structure

Use your browser's Developer Tools. Right-click on a webpage and select "Inspect".

This shows the HTML code. Identify the tags containing your target data.

Look for unique class names or IDs. This makes extraction precise.
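Alongside the browser's Inspector, BeautifulSoup's prettify() method prints the parsed HTML with indentation, which makes it easier to spot the tags, classes, and IDs worth targeting. A small sketch on a made-up snippet (on a real page you would call it on the soup object from Step 4):

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for a fetched page
html = '<div id="main"><p class="intro">Hello, world!</p></div>'

soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())  # indented view of the tag structure
```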

Step 6: Extract Data by Tag Name

Find elements using their tag name. Use the find() or find_all() methods.


# Find the first <h1> tag
title_tag = soup.find('h1')
print(title_tag.text)

# Find all paragraph <p> tags
all_paragraphs = soup.find_all('p')
for p in all_paragraphs:
    print(p.text)


Example Domain
This domain is for use in illustrative examples...

find() returns the first match. find_all() returns a list of all matches.

Step 7: Extract Data by Class or ID

Tags often have class or ID attributes. These are more specific selectors.


# Find element with a specific class
div_with_class = soup.find('div', class_='example-class')

# Find element with a specific ID
main_content = soup.find(id='main')
    

Note: class_ has an underscore because 'class' is a Python keyword.

Step 8: Extract Attributes and Links

You can get attributes like 'href' from links. Treat the tag like a dictionary.


# Find the first link <a> tag
link = soup.find('a')
print('Link Text:', link.text)
print('Link URL:', link['href'])
    

Link Text: More information...
Link URL: https://www.iana.org/domains/example
    

This is useful for collecting all links on a page.
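For example, here is one way to gather every link's text and URL into a list. The HTML snippet below is a stand-in for a fetched page; in a real script you would reuse the soup object from Step 4:

```python
from bs4 import BeautifulSoup

# Stand-in for a fetched page; a real script would reuse the soup from Step 4
html = """
<a href="https://www.iana.org/domains/example">More information...</a>
<a>An anchor with no href</a>
"""

soup = BeautifulSoup(html, 'html.parser')

# href=True skips anchors that have no href attribute
links = [(a.text.strip(), a['href']) for a in soup.find_all('a', href=True)]
print(links)
# → [('More information...', 'https://www.iana.org/domains/example')]
```

Filtering with href=True avoids a KeyError on anchors that lack the attribute.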

Step 9: Putting It All Together: A Complete Script

Let's build a script that scrapes book titles from a mock page.


import requests
from bs4 import BeautifulSoup

# Target URL (a mock book listing site)
url = 'https://books.toscrape.com/catalogue/page-1.html'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Find all article tags with class 'product_pod'
books = soup.find_all('article', class_='product_pod')

for book in books:
    # Find the h3 tag inside the article, then the 'a' tag inside it
    title_tag = book.h3.a
    title = title_tag['title']  # The title is stored in the 'title' attribute
    print(title)
    

This script finds all book articles. It then extracts the title from each one.

Step 10: Handle Common Issues

Websites change. Your script might break if the HTML structure updates.

Use try-except blocks to handle missing elements gracefully.


try:
    price = soup.find('p', class_='price_color').text
except AttributeError:
    price = 'Price not found'
    

This prevents your program from crashing. For more complex debugging, see our guide on Debug and Test BeautifulSoup Scripts Efficiently.

Next Steps and Best Practices

You now know the basics. Real-world projects need more techniques.

To scrape data across many pages, learn to Scrape Multiple Pages with BeautifulSoup.

Modern sites load data dynamically. For this, check our guide on Scrape AJAX Content with BeautifulSoup.

Always scrape ethically. Add delays between requests. Respect robots.txt rules.
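A common courtesy pattern is to pause between requests with time.sleep(). Here is a sketch assuming the mock book site from Step 9; the page range and the one-second delay are arbitrary choices for illustration:

```python
import time

import requests

# Pages 1-3 of the mock site from Step 9; the range is arbitrary
urls = [f'https://books.toscrape.com/catalogue/page-{n}.html' for n in range(1, 4)]

for url in urls:
    response = requests.get(url)
    print(url, response.status_code)
    time.sleep(1)  # pause one second so we do not overload the server
```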

Conclusion

BeautifulSoup is a fantastic tool for beginners. It turns messy HTML into structured data.

You learned to install, fetch, parse, and extract data. Start with simple projects.

Practice on sites that allow scraping. Always be respectful of server resources.

Happy scraping!