Last modified: Jan 12, 2026 By Alexander Williams
Scrape Multiple Pages with BeautifulSoup
Web scraping often requires data from more than one page, so you need a way to move through a sequence of pages. This is called pagination handling.
BeautifulSoup is a great Python library for parsing HTML. But it needs help to move between pages. This guide will show you how to do it.
We will combine requests for fetching pages and BeautifulSoup for parsing. You will learn to build a scraper that loops through page links.
Understanding Pagination Patterns
First, identify how the website moves between pages. Look for "Next" buttons or page number links. Sometimes the URL changes with a query parameter.
Common patterns include `?page=2` or `/page/2/`. Your scraper must detect and follow these patterns. We will write code to find the next page link.
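As a quick sketch, here is how those two URL styles can be generated in Python; the domain and page range below are placeholders, not a real site.

# Hypothetical URL templates for the two common pagination styles
query_style = "https://example.com/books?page={}"   # ?page=2
path_style = "https://example.com/books/page/{}/"   # /page/2/

# Build the first three page URLs for each style
for page in range(1, 4):
    print(query_style.format(page))
    print(path_style.format(page))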
Check the site's robots.txt file. Always respect the website's rules. Do not overload their servers with too many rapid requests.
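If you want to check robots.txt from code rather than by eye, Python's built-in urllib.robotparser can do it; the domain below is just a placeholder.

from urllib.robotparser import RobotFileParser

# Placeholder robots.txt URL; point this at the site you plan to scrape
robots = RobotFileParser("https://example-books.com/robots.txt")
robots.read()

# Ask whether any crawler ("*") may fetch a paginated URL
if robots.can_fetch("*", "https://example-books.com/books?page=2"):
    print("Allowed by robots.txt")
else:
    print("Disallowed by robots.txt")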
Setting Up Your Scraper
Start by installing the necessary libraries. Use pip for installation. You need BeautifulSoup and requests.
pip install beautifulsoup4 requests
Now, import the libraries in your Python script. We will also import time for adding delays.
import requests
from bs4 import BeautifulSoup
import time
# Base URL of the site you want to scrape
base_url = "https://example-books.com/books?page="
Building the Page Loop
The core logic is a loop. It will request each page URL one by one. We must handle potential errors like missing pages.
Use a while loop or a for loop with a range. The choice depends on the pagination style. We will use a for loop for a known number of pages.
Inside the loop, fetch the page content. Then parse it with BeautifulSoup. Extract the data you need from each page.
def scrape_multiple_pages(start_page, end_page):
    all_books = []

    for page_num in range(start_page, end_page + 1):
        # Construct the URL for the current page
        url = f"{base_url}{page_num}"
        print(f"Scraping: {url}")

        # Send a GET request to the page
        response = requests.get(url)

        # Check if the request was successful
        if response.status_code != 200:
            print(f"Failed to retrieve page {page_num}")
            break

        # Parse the HTML content
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find all book containers on the page (adjust selector)
        book_cards = soup.find_all('div', class_='book-item')

        for card in book_cards:
            # Extract data from each card
            title_elem = card.find('h2', class_='title')
            author_elem = card.find('span', class_='author')
            price_elem = card.find('div', class_='price')

            title = title_elem.text.strip() if title_elem else "N/A"
            author = author_elem.text.strip() if author_elem else "N/A"
            price = price_elem.text.strip() if price_elem else "N/A"

            all_books.append({
                'title': title,
                'author': author,
                'price': price,
                'page': page_num
            })

        # Be polite: wait a bit before the next request
        time.sleep(1)

    return all_books

# Scrape pages 1 to 5
books_data = scrape_multiple_pages(1, 5)
print(f"Scraped {len(books_data)} books in total.")
Handling Dynamic Next Page Links
Some sites do not use simple numbered URLs. They have a "Next" button. Your scraper must find and follow that link.
Inspect the HTML of the page. Look for the anchor tag with "Next" text or a specific class. Extract its `href` attribute.
This approach is more robust for sites with changing URL structures. The loop continues until there is no "Next" link.
from urllib.parse import urljoin

def scrape_with_next_button(start_url):
    all_data = []
    current_url = start_url

    while current_url:
        print(f"Scraping: {current_url}")
        response = requests.get(current_url)
        if response.status_code != 200:
            break

        soup = BeautifulSoup(response.content, 'html.parser')

        # ... Your data extraction logic here ...
        # Example: Extract product names
        items = soup.find_all('li', class_='product')
        for item in items:
            name = item.find('h3').text if item.find('h3') else "N/A"
            all_data.append({'name': name})

        # Find the "Next" page link (string= is the current name for the old text= argument)
        next_link = soup.find('a', string='Next')
        if next_link and 'href' in next_link.attrs:
            # urljoin resolves relative hrefs against the current page URL
            current_url = urljoin(current_url, next_link['href'])
        else:
            current_url = None  # No "Next" link: exit the loop

        time.sleep(1)

    return all_data
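Here is a minimal usage sketch for this function, assuming a hypothetical starting URL:

# Hypothetical first page of a paginated product listing
products = scrape_with_next_button("https://example.com/products")
print(f"Scraped {len(products)} products in total.")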
Important Best Practices
Always add delays between requests with `time.sleep()`. This reduces the load on the server and lowers the chance of being blocked.
Check for rate limits in the site's terms. Use headers like User-Agent to mimic a real browser. For advanced techniques, read our guide on BeautifulSoup with Proxies and User Agents.
Handle errors gracefully. Networks fail and pages change. Use try-except blocks around your requests.
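As a rough sketch of these points together, you can send a User-Agent header, set a timeout, and wrap the request in try-except; the header string and URL below are only examples.

headers = {
    # Example browser-like User-Agent string; adjust for your use case
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}
url = "https://example-books.com/books?page=1"  # placeholder URL

try:
    # timeout stops the request from hanging if the server never responds
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
except requests.exceptions.RequestException as exc:
    print(f"Request failed: {exc}")
    response = None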
Your scraper might face encoding problems. Our BeautifulSoup Unicode Encoding Issues Guide can help.
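One quick, partial fix is to let requests guess the encoding from the response body instead of trusting the declared charset; this continues from the response object in the snippet above.

if response is not None:
    # Let requests detect the encoding from the body, then parse the decoded text
    response.encoding = response.apparent_encoding
    soup = BeautifulSoup(response.text, 'html.parser')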
Storing Your Scraped Data
Collecting data is only the first step. You need to save it. Common formats are CSV or JSON.
Python's `csv` module is perfect for this. You can write each item as a row. Learn more in our article Save Scraped Data to CSV with BeautifulSoup.
Here is a quick example of saving the books data to a CSV file.
import csv

def save_to_csv(data, filename='books.csv'):
    if not data:
        return

    keys = data[0].keys()
    with open(filename, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=keys)
        writer.writeheader()
        writer.writerows(data)
    print(f"Data saved to {filename}")

# Assuming books_data is from our earlier function
save_to_csv(books_data)
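Since JSON was mentioned as the other common format, here is an equally small sketch using Python's built-in json module; the filename is arbitrary.

import json

def save_to_json(data, filename='books.json'):
    with open(filename, 'w', encoding='utf-8') as file:
        json.dump(data, file, ensure_ascii=False, indent=2)
    print(f"Data saved to {filename}")

save_to_json(books_data)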
Conclusion
Scraping multiple pages is a key skill. It lets you gather large datasets. BeautifulSoup and requests make it possible.
Remember to identify the pagination pattern first. Build a loop that requests each page. Always extract and store data carefully.
Follow best practices like adding delays. Handle errors and respect robots.txt. This ensures your scraper is effective and ethical.
For more complex sites with JavaScript, consider tools like Selenium. Check out our guide on Combine BeautifulSoup & Selenium for Web Scraping.
Now you can scale your web scraping projects. Happy scraping!