Last modified: Jan 19, 2026 by Alexander Williams
Advanced BeautifulSoup Pagination & Infinite Scroll
Web scraping often requires data from many pages, not just one. This tutorial covers advanced methods for handling pagination and infinite scroll, two patterns that are common on modern websites.
We assume you already know basic BeautifulSoup. Let's begin.
Understanding Pagination Patterns
Pagination splits content across multiple pages. Links like "Next" or page numbers are used.
Your scraper must find and follow these links. It must collect data from each page it visits.
First, identify the pagination structure on your target site. Look for a common URL pattern.
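As a quick check, you can fetch the first page and print whatever pagination links it exposes. The sketch below uses a placeholder URL and assumes the links live inside an element with a `pagination` class; both are assumptions you should adjust for your target site.

import requests
from bs4 import BeautifulSoup

# Hypothetical starting page; replace with your target site
url = "https://example.com/items"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Assumes the pagination links sit inside an element with class "pagination"
pagination = soup.find(class_='pagination')
if pagination:
    for link in pagination.find_all('a'):
        print(link.get_text(strip=True), link.get('href'))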
Scraping Static Pagination
Static pagination uses direct links to numbered pages. The URL often changes predictably.
For example, a site might use `?page=2` in its URL. You can loop through these page numbers.
Here is a Python script to scrape such a site. We use requests and BeautifulSoup.
import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/items?page="
all_items = []

for page_num in range(1, 6):  # Scrape pages 1 to 5
    url = base_url + str(page_num)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find all item elements (adjust selector as needed)
    items = soup.find_all('div', class_='item')
    for item in items:
        title = item.find('h2').text.strip()
        all_items.append(title)

    print(f"Scraped page {page_num}")

print(f"Total items collected: {len(all_items)}")
Output:

Scraped page 1
Scraped page 2
Scraped page 3
Scraped page 4
Scraped page 5
Total items collected: 50
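In practice you often do not know the page count in advance. One option, sketched below with the same placeholder URL pattern and selector as above, is to keep requesting pages until one comes back empty:

import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/items?page="  # same placeholder pattern as above
all_items = []
page_num = 1

while True:
    response = requests.get(base_url + str(page_num))
    soup = BeautifulSoup(response.content, 'html.parser')
    items = soup.find_all('div', class_='item')
    if not items:
        break  # an empty page usually means we are past the last one

    all_items.extend(item.find('h2').text.strip() for item in items)
    page_num += 1

print(f"Scraped {page_num - 1} pages, {len(all_items)} items in total.")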
Following "Next" Button Links
Some sites use a "Next" link instead of predictable page numbers, so the URL of the following page cannot be built in advance.
Your script must find the link to the next page on each page it scrapes. It stops when no "Next" link exists.
This approach is more robust, because it adapts to the site's specific HTML structure.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://example.com/items"
all_data = []

while url:
    print(f"Fetching: {url}")
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Scrape data from current page
    items = soup.find_all('article')
    for item in items:
        all_data.append(item.text.strip())

    # Find the link to the next page
    next_link = soup.find('a', string='Next')
    if next_link and next_link.get('href'):
        # Resolve relative URLs against the current page
        url = urljoin(url, next_link['href'])
    else:
        url = None  # Exit loop

print(f"Scraping complete. Collected {len(all_data)} items.")
Handling Infinite Scroll with BeautifulSoup
Infinite scroll loads content dynamically as you scroll. It uses JavaScript and AJAX.
BeautifulSoup alone cannot handle this. It only parses static HTML.
You need to find the data source. Often it's a JSON API endpoint.
Inspect the network traffic in your browser's developer tools. Look for XHR or Fetch requests that fire as you scroll.
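Before reaching for a heavier tool, it is worth confirming that the content really is loaded by JavaScript. A minimal check, using a placeholder URL and item selector, is to compare what a plain request returns with what you see in the browser:

import requests
from bs4 import BeautifulSoup

# Hypothetical infinite-scroll page; replace with your target URL
url = "https://example.com/scroll-page"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# If this count is much lower than what the browser shows after scrolling,
# the remaining items are loaded dynamically via JavaScript/AJAX
items = soup.find_all('div', class_='scroll-item')
print(f"Items present in the raw HTML: {len(items)}")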
Finding and Parsing the Data API
Many sites load data via a hidden JSON API. The URL might contain parameters like `offset`.
You can simulate these requests with requests. Then parse the JSON response directly.
This method is very efficient. It avoids downloading unnecessary HTML.
import requests

api_url = "https://example.com/api/items"
params = {'offset': 0, 'limit': 20}
all_items = []

while True:
    response = requests.get(api_url, params=params)
    data = response.json()
    items = data.get('results', [])
    if not items:
        break  # No more data

    for item in items:
        all_items.append(item['title'])

    print(f"Fetched batch with offset {params['offset']}")
    params['offset'] += params['limit']  # Prepare for next batch

print(f"Total items from API: {len(all_items)}")
For more on AJAX, see our AJAX scraping guide.
Combining BeautifulSoup with Selenium
Sometimes the API is hard to find. You can use Selenium to control a real browser.
Selenium scrolls the page and loads all content. Then you pass the HTML to BeautifulSoup.
This method is slower but powerful. It works on almost any site.
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get("https://example.com/scroll-page")

# Scroll to bottom multiple times to load content
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Wait for new content to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Now parse the fully loaded page
soup = BeautifulSoup(driver.page_source, 'html.parser')
items = soup.find_all('div', class_='scroll-item')
print(f"Found {len(items)} items after scroll.")

driver.quit()
Best Practices and Error Handling
Always respect the website's robots.txt file. Add delays between requests.
Use try-except blocks to handle network errors. Log your scraping progress.
Set a User-Agent header to mimic a real browser. This helps avoid blocks.
For large projects, follow our large-scale best practices.
import requests
import time
from random import uniform

headers = {'User-Agent': 'Mozilla/5.0'}
base_url = "https://example.com/page?num="

for page in range(1, 10):
    url = base_url + str(page)
    try:
        resp = requests.get(url, headers=headers, timeout=10)
        resp.raise_for_status()  # Check for HTTP errors
    except requests.exceptions.RequestException as e:
        print(f"Error on page {page}: {e}")
        break

    # Process page with BeautifulSoup here...
    time.sleep(uniform(1, 3))  # Random delay between 1 and 3 seconds
Conclusion
You now know advanced BeautifulSoup techniques. You can scrape paginated and infinite scroll sites.
For static pagination, loop through URLs or find "Next" links. For infinite scroll, find the JSON API or use Selenium.
Always scrape ethically and legally. Check a site's terms of service before scraping.
Start with a simple beginner's guide if needed. Then tackle these advanced patterns.
Happy scraping!