Combine BeautifulSoup & Selenium for Web Scraping
Web scraping is a vital skill for data collection. Static sites are easy to scrape. Dynamic sites are harder. They rely on JavaScript to load their content.
BeautifulSoup alone falls short here. It cannot run JavaScript. This is where Selenium shines. It automates a real web browser.
This guide shows you how to merge them. Use Selenium to get the page. Use BeautifulSoup to parse it. Get the best of both worlds.
Why Use Both Tools Together?
BeautifulSoup is a parsing library. It excels at navigating HTML and XML. It is fast and easy to use for static content.
Selenium is a browser automation tool. It controls Chrome, Firefox, or Edge. It can click buttons and wait for AJAX calls.
Combining them is powerful. Selenium handles the dynamic part. BeautifulSoup handles the data extraction. It is efficient and clean.
You avoid Selenium's slower find methods for bulk extraction. Each Selenium lookup is a round-trip to the browser. BeautifulSoup parses the rendered HTML once, in memory, with elegant syntax. This improves both readability and speed.
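As a quick illustration, here is a minimal sketch of the two styles side by side. It assumes a hypothetical page whose headings carry an article-title class; the URL is a placeholder.
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get("https://example-dynamic-site.com")  # placeholder URL
# Selenium locators: every find call is a round-trip to the browser
selenium_titles = [el.text for el in driver.find_elements(By.CLASS_NAME, "article-title")]
# BeautifulSoup: grab the rendered HTML once, then extract in memory
soup = BeautifulSoup(driver.page_source, "html.parser")
bs4_titles = [tag.get_text(strip=True) for tag in soup.find_all(class_="article-title")]
driver.quit()
Both lists should contain the same titles. The BeautifulSoup version only touches the browser once, which matters when you extract many elements.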
Setting Up Your Environment
First, install the required Python packages. You need beautifulsoup4 and selenium. Use pip for installation.
pip install beautifulsoup4 selenium
You also need a WebDriver so Selenium can control the browser. For Chrome, download the ChromeDriver that matches your browser version. Recent Selenium releases (4.6 and later) can also download a matching driver automatically, so the manual download is often optional.
Place the WebDriver in your system PATH. Or specify its location in your code. Selenium cannot start the browser without a working driver.
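If the driver is not on your PATH, you can point Selenium at it directly. A minimal sketch using the Selenium 4 Service object; the path below is a placeholder.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Placeholder path; replace with the real location of your chromedriver
service = Service("/path/to/chromedriver")
driver = webdriver.Chrome(service=service)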
If you need help installing BeautifulSoup, see our step-by-step guide.
Basic Integration Workflow
The core process is simple. Use Selenium to load the page. Get the page source after JavaScript runs. Pass this HTML to BeautifulSoup.
Here is a basic code example. It scrapes a dynamic page. The page loads content via JavaScript after the initial load.
from selenium import webdriver
from bs4 import BeautifulSoup
import time
# 1. Setup Selenium WebDriver
driver = webdriver.Chrome() # Ensure chromedriver is in PATH
# 2. Navigate to a dynamic website
driver.get("https://example-dynamic-site.com")
# 3. Wait for JavaScript to load content
time.sleep(3) # Simple wait; better to use explicit waits
# 4. Get the fully rendered page source
page_source = driver.page_source
# 5. Quit the browser to free resources
driver.quit()
# 6. Parse with BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')
# Now you can use BeautifulSoup methods
# For example, find all article titles
titles = soup.find_all('h2', class_='article-title')
for title in titles:
    print(title.text.strip())
The code starts a Chrome browser. It goes to a URL. It waits for content to load. Then it grabs the final HTML.
After closing the browser, it creates a BeautifulSoup object. You can now use find_all or select.
This method is perfect for single-page apps. It also works for sites with complex user interactions.
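For instance, CSS selectors work on the soup object exactly as they would for a static page. A small sketch that continues the example above, assuming each title links out through a hypothetical article-link class:
# Continue from the soup object created above
for link in soup.select("a.article-link"):
    print(link.get_text(strip=True), link.get("href"))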
Handling Complex Interactions
Some sites need more than just loading. You may need to click a "Load More" button. Or log in to access data.
Selenium can do these actions. You find the button element. Then you click it. Wait for new content. Then parse.
Here is an example for pagination. It clicks a button to load more items. Then it parses the updated page.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get("https://example-pagination-site.com")
try:
    # Wait for the "Load More" button to be clickable
    load_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "button.load-more"))
    )
    # Click the button
    load_button.click()
    # Wait for new content to appear
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "new-item"))
    )
    print("New content loaded successfully.")
except Exception as e:
    print(f"Error during interaction: {e}")
# Get the updated page source
page_html = driver.page_source
driver.quit()
# Parse with BeautifulSoup
soup = BeautifulSoup(page_html, 'lxml') # Using lxml parser for speed
items = soup.find_all('div', class_='item')
print(f"Found {len(items)} items after interaction.")
This script uses explicit waits. This is better than time.sleep. It waits for specific conditions.
After clicking, it waits for new items. Then it gets the HTML. BeautifulSoup parses it with the lxml parser.
For more on pagination, read our BeautifulSoup pagination guide.
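The same pattern covers the login case mentioned earlier. Selenium fills the form, waits for the post-login page, and then hands the HTML to BeautifulSoup. This is only a sketch; the URL, field names, and the dashboard class are placeholders for whatever the real site uses.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get("https://example-login-site.com/login")  # placeholder URL
# Fill in the form (field names are placeholders)
driver.find_element(By.NAME, "username").send_keys("my_user")
driver.find_element(By.NAME, "password").send_keys("my_password")
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
# Wait for an element that only appears after a successful login
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "dashboard"))
)
# From here, navigate to the protected pages and parse as before
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()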
Choosing the Right Parser
BeautifulSoup supports different parsers. The default is html.parser. It ships with Python, so it needs no extra install, but it is the slowest of the common options.
lxml is noticeably faster and copes well with messy HTML. You need to install it separately.
pip install lxml
Then use it in your BeautifulSoup constructor. For example: BeautifulSoup(html, 'lxml').
The choice depends on your needs. Speed? Use lxml. No extra installs? Use html.parser. For a detailed comparison, see BeautifulSoup vs lxml.
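If you want the speed of lxml when it is available, without making it a hard requirement, here is a small fallback sketch. It assumes page_source from an earlier Selenium step.
from bs4 import BeautifulSoup
try:
    import lxml  # only checking that it is installed
    parser = 'lxml'
except ImportError:
    parser = 'html.parser'  # built-in fallback, no extra install
soup = BeautifulSoup(page_source, parser)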
If you encounter broken HTML, our guide on handling broken HTML with BeautifulSoup can help.
Best Practices and Tips
Always close the Selenium driver. Use driver.quit(). This closes the browser and releases resources.
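A common pattern is to wrap the scraping work in try/finally so the browser closes even if something fails. A small sketch:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
try:
    driver.get("https://example-dynamic-site.com")  # placeholder URL
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()  # always runs, even if the scrape raises an error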
Use explicit waits over hard-coded sleeps. It makes your script more reliable and faster.
Extract the page source only after all dynamic content loads. Check for specific elements that appear last.
Use BeautifulSoup's powerful methods like find, find_all, and CSS selectors. They are easier than Selenium's locators for extraction.
Handle errors gracefully. Websites change. Your selectors might break. Use try-except blocks.
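The same caution applies on the parsing side. find returns None when a selector no longer matches, so check before reading attributes. A short sketch with a hypothetical price class, assuming a soup object from an earlier step:
price_tag = soup.find('span', class_='price')
if price_tag is not None:
    price = price_tag.get_text(strip=True)
else:
    price = None  # selector no longer matches; log it or skip the record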
Respect robots.txt and website terms. Do not overload servers. Add delays between requests.
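A simple way to space out requests is a randomized pause between page loads. A small sketch, assuming an open driver and a placeholder list of URLs:
import random
import time
urls = ["https://example-dynamic-site.com/page/1", "https://example-dynamic-site.com/page/2"]
for url in urls:
    driver.get(url)
    # ... parse driver.page_source with BeautifulSoup here ...
    time.sleep(random.uniform(2, 5))  # pause 2-5 seconds between requests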
Conclusion
Combining BeautifulSoup and Selenium is a powerful technique. It solves the dynamic content scraping problem.
Selenium acts as the browser. It renders JavaScript and handles interactions. BeautifulSoup acts as the parser. It extracts data cleanly and efficiently.
Start with the basic workflow. Then add complex interactions as needed. Remember to choose the right parser.
This approach gives you maximum flexibility. You can scrape almost any modern website. Happy scraping!