Last modified: Jan 10, 2026, by Alexander Williams

Parse Nested HTML with BeautifulSoup Guide

Web pages are built with nested HTML tags. This creates a tree-like structure. Extracting data from deep inside this tree can be tricky. BeautifulSoup makes it simple.

This guide teaches you to navigate nested HTML. You will learn key methods and see practical examples. Soon you will scrape complex data with confidence.

Understanding HTML Nesting

HTML elements are often inside other elements. A <div> may contain a <table>. That table may hold many <tr> and <td> tags.

This is called nesting. It forms a parent-child relationship. BeautifulSoup lets you move through these relationships easily.

First, ensure you have BeautifulSoup installed. If not, follow our guide on Install BeautifulSoup in Python Step by Step.
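Once it is installed, a tiny made-up snippet shows the idea in action. The <td> sits inside a <tr>, which sits inside a <table>, which sits inside a <div>, and attribute access lets you walk that chain one tag at a time:


from bs4 import BeautifulSoup

# A hypothetical nested snippet: div > table > tr > td
mini_html = """
<div class="report">
    <table>
        <tr>
            <td>Cell value</td>
        </tr>
    </table>
</div>
"""

mini_soup = BeautifulSoup(mini_html, 'html.parser')

# Dot access walks down the tree, one tag name at a time
print(mini_soup.div.table.tr.td.text)  # Cell value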

Basic Setup and Parsing

Start by importing the necessary libraries. You need BeautifulSoup from bs4. You also need a parser like lxml or html.parser.


from bs4 import BeautifulSoup

# Sample nested HTML
html_doc = """
<html>
<body>
    <div class="container">
        <h1>Main Title</h1>
        <ul id="list">
            <li>Item 1</li>
            <li>Item 2</li>
            <li>Item 3</li>
        </ul>
    </div>
</body>
</html>
"""

# Create the BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')
print(type(soup))
    

<class 'bs4.BeautifulSoup'>
    

If you see a NameError, check our fix for Fix Python NameError: Name 'BeautifulSoup' Not Defined.

Navigating Down: Children and Descendants

To go deeper into the tree, use .contents or .children. Both give you an element's direct children: .contents returns them as a list, while .children is an iterator. The .descendants property goes through every level below.


container = soup.find('div', class_='container')

# Get direct children
for child in container.children:
    print(child.name if child.name else repr(child))

print("---")

# Get all descendants
for descendant in container.descendants:
    if descendant.name:
        print(descendant.name)
    

'\n        '
h1
'\n        '
ul
'\n    '
---
h1
ul
li
li
li
    

Children are only one level down. Descendants include everything nested below.
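To make the difference concrete, filter each property down to tag nodes only. This short sketch continues from the snippet above:


# Keep only tag nodes, skipping the whitespace strings
child_tags = [child.name for child in container.children if child.name]
descendant_tags = [d.name for d in container.descendants if d.name]

print(child_tags)       # ['h1', 'ul']
print(descendant_tags)  # ['h1', 'ul', 'li', 'li', 'li']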

Navigating Up: Parent and Parents

Sometimes you need to move up the tree. Use .parent to get the direct parent. Use .parents to get all ancestors.


first_li = soup.find('li')
print("Direct parent:", first_li.parent.name)

print("All parent tags:")
for parent in first_li.parents:
    if parent.name:
        print(parent.name)
    

Direct parent: ul
All parent tags:
ul
div
body
html
[document]
    

For more control, learn about the BeautifulSoup find_parent() Method.
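As a quick preview, find_parent() accepts the same filters as find(), so you can jump straight to a specific ancestor instead of looping. A small sketch using the sample document from above:


# Jump directly to the ancestor div with class "container"
first_li = soup.find('li')
container_div = first_li.find_parent('div', class_='container')
print(container_div['class'])  # ['container']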

Navigating Sideways: Siblings

Elements at the same nesting level are siblings. Use .next_sibling and .previous_sibling to step to the adjacent node. Use .next_siblings and .previous_siblings to iterate over all of them.


second_li = soup.find_all('li')[1]  # Get "Item 2"
print("Item text:", second_li.string)

prev_sib = second_li.previous_sibling
print("Previous sibling:", repr(prev_sib))

next_sib = second_li.next_sibling
print("Next sibling:", repr(next_sib))
    

Item text: Item 2
Previous sibling: '\n            '
Next sibling: '\n            '
    

Note that whitespace text nodes (newlines and indentation) also count as siblings. Often you need to filter for tags only, as shown below.
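One convenient approach is the find_next_sibling() and find_previous_sibling() methods. They skip over text nodes and return the nearest sibling tag:


# Skip the whitespace nodes and get the neighboring <li> tags directly
second_li = soup.find_all('li')[1]
print(second_li.find_previous_sibling('li'))  # <li>Item 1</li>
print(second_li.find_next_sibling('li'))      # <li>Item 3</li>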

Using find() and find_all() in Nested Structures

The find() and find_all() methods are your main tools. They search within the current element's subtree. This is perfect for nested data.


# Find the ul inside the container
ul_tag = container.find('ul')
print("Found UL id:", ul_tag['id'])

# Find all li tags inside that ul
li_tags = ul_tag.find_all('li')
for li in li_tags:
    print(li.text)
    

Found UL id: list
Item 1
Item 2
Item 3
    

This pattern is also the basis of How to Extract Data from Tables Using BeautifulSoup.
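The same drill-down idea applies to tables: find the table, then its rows, then the cells inside each row. Here is a minimal sketch with made-up HTML:


from bs4 import BeautifulSoup

table_html = """
<table id="prices">
    <tr><td>Apple</td><td>1.20</td></tr>
    <tr><td>Banana</td><td>0.80</td></tr>
</table>
"""

table_soup = BeautifulSoup(table_html, 'html.parser')

# Rows first, then the cells inside each row
for row in table_soup.find('table').find_all('tr'):
    cells = [td.text for td in row.find_all('td')]
    print(cells)
# ['Apple', '1.20']
# ['Banana', '0.80']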

Real-World Example: Nested Product List

Let's parse a complex nested structure. Imagine an e-commerce page with products in categories.


complex_html = """
<div class="shop">
    <section class="category">
        <h2>Books</h2>
        <div class="products">
            <article class="product">
                <h3 class="title">Python Guide</h3>
                <span class="price">$29.99</span>
            </article>
            <article class="product">
                <h3 class="title">Web Scraping Book</h3>
                <span class="price">$24.99</span>
            </article>
        </div>
    </section>
</div>
"""

soup2 = BeautifulSoup(complex_html, 'html.parser')
categories = soup2.find_all('section', class_='category')

for cat in categories:
    cat_name = cat.h2.text
    print(f"Category: {cat_name}")
    products = cat.find_all('article', class_='product')
    for prod in products:
        title = prod.find('h3', class_='title').text
        price = prod.find('span', class_='price').text
        print(f"  - {title}: {price}")
    

Category: Books
  - Python Guide: $29.99
  - Web Scraping Book: $24.99
    

We started at the top-level <section>. Then we drilled down into each product article. This is a common pattern.
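If you prefer CSS selectors, the select() and select_one() methods express the same nesting in a single expression. This sketch is simply an equivalent way to write the loop above, continuing with soup2:


# A CSS selector describes the whole nesting path at once
for prod in soup2.select('section.category article.product'):
    title = prod.select_one('h3.title').text
    price = prod.select_one('span.price').text
    print(f"  - {title}: {price}")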

Handling Dynamic Nested Content

Some nested content is loaded with JavaScript after the page arrives. BeautifulSoup only parses the HTML it is given, so that content never appears in the tree. For those cases, you need a different approach.

Consider using tools like Requests-HTML or Selenium. They can render JavaScript first. Then you can parse the final HTML with BeautifulSoup.
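As a rough sketch, assuming Selenium and a Chrome driver are installed and using example.com as a placeholder URL, the browser renders the page and BeautifulSoup parses the result:


from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

# page_source holds the HTML after JavaScript has run
rendered_html = driver.page_source
driver.quit()

soup = BeautifulSoup(rendered_html, 'html.parser')
print(soup.title)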

Learn more in our guide on Scrape Dynamic Content with BeautifulSoup & Requests-HTML.

Common Pitfalls and Tips

Whitespace as Siblings: Remember that text and newlines are nodes. Use conditions like if element.name to filter them.

Missing Elements: Always check whether a find() call returned None before accessing attributes; see the sketch below.

Overly Generic Selectors: Be specific. Use CSS classes or IDs. Combine tag names with attributes for robust searches.
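Here is a small sketch of that None check, using a hypothetical class name that may not exist in the page:


# find() returns None when nothing matches, so guard before using the result
maybe_discount = soup2.find('span', class_='discount')  # hypothetical class
if maybe_discount is not None:
    print(maybe_discount.text)
else:
    print("No discount element found")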

Conclusion

Parsing nested HTML is a core web scraping skill. BeautifulSoup provides a simple API for this task. You can navigate down, up, and sideways.

Use .find() and .find_all() to search within elements. Understand parent-child-sibling relationships. Practice on real websites.

Start with simple nests and move to complex ones. Soon you will extract any data you need from the web's tangled HTML trees.