Last modified: Mar 04, 2023 By Alexander Williams

Understand How to Work with Tables in BeautifulSoup

This article covers the most common tasks involved in working with HTML tables in BeautifulSoup. Specifically, we will go over how to:

  • Find the table within HTML
  • Find the table headers
  • Retrieve the table rows and cells
  • Find the table by class
  • Find the table by ID
  • Find the table in a table
  • Find all tables

By the end of this article, you will understand how to work with tables in BeautifulSoup.

Find table within HTML

To find a table within HTML using BeautifulSoup, follow the code below:

from bs4 import BeautifulSoup

html_doc = '''
<table class="table-1">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
</table>

<table class="table-2">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Facebook</td>
    <td>Mark</td>
    <td>USA</td>
  </tr>
  <tr>
    <td>Centro comercial Newyork</td>
    <td>Newyork Chang</td>
    <td>USA</td>
  </tr>
</table>
'''

# Parse HTML
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the first table
table = soup.find('table')

print(table)

As you can see in the HTML code provided, there are two tables with different class attributes. soup.find('table') returns only the first table in the document, so the code above prints:

<table class="table-1">
<tr>
<th>Company</th>
<th>Contact</th>
<th>Country</th>
</tr>
<tr>
<td>Alfreds Futterkiste</td>
<td>Maria Anders</td>
<td>Germany</td>
</tr>
<tr>
<td>Centro comercial Moctezuma</td>
<td>Francisco Chang</td>
<td>Mexico</td>
</tr>
</table>
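One detail worth keeping in mind is that find() returns None when nothing matches, so calling a method on the result would raise an AttributeError. The following is a minimal sketch of a guard against that case. The helper name first_table_rows and the two sample strings are made up for illustration and are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

def first_table_rows(html):
    """Return the <tr> tags of the first table, or an empty list if there is no table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    if table is None:  # find() returns None when no <table> exists
        return []
    return table.find_all('tr')

# Hypothetical sample inputs
with_table = '<table><tr><td>A</td></tr><tr><td>B</td></tr></table>'
without_table = '<p>No table here</p>'

print(len(first_table_rows(with_table)))     # 2
print(len(first_table_rows(without_table)))  # 0
```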

If you need to find all the rows in a table, you can use the find_all() function and specify the tr tag as the parameter.

# Find the first table
table = soup.find('table')

# find all rows in the table
rows = table.find_all('tr')

print(rows)

Output:

[<tr>
<th>Company</th>
<th>Contact</th>
<th>Country</th>
</tr>, <tr>
<td>Alfreds Futterkiste</td>
<td>Maria Anders</td>
<td>Germany</td>
</tr>, <tr>
<td>Centro comercial Moctezuma</td>
<td>Francisco Chang</td>
<td>Mexico</td>
</tr>]

Additionally, if you want to extract the headers of a table, you can use the find_all() function and specify the th tag as the parameter.

# find headers in the table
headers = table.find_all('th')

print(headers)

Output:

[<th>Company</th>, <th>Contact</th>, <th>Country</th>]

To get the text inside each header cell, you can use the .text property as shown in the following example:

for header in headers:
    print(header.text)

Output:

Company
Contact
Country
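If the header cells contain extra whitespace, get_text(strip=True) can be used instead of .text to trim it. Here is a short sketch that collects the header names into a list. The HTML is a reduced version of the table above, with spaces added around the first header on purpose.

```python
from bs4 import BeautifulSoup

html_doc = '''
<table>
  <tr>
    <th> Company </th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
</table>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
table = soup.find('table')

# get_text(strip=True) trims leading and trailing whitespace from the cell text
header_names = [th.get_text(strip=True) for th in table.find_all('th')]

print(header_names)  # ['Company', 'Contact', 'Country']
```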

To find and retrieve the text of each <td> cell in a table, you can follow these steps:

  1. Iterate over the rows of the table.
  2. Find all <td> tags within each row.
  3. Iterate over the <td> tags.
  4. Retrieve the text inside each <td> tag.

Here is an example:

# Find the first table
table = soup.find('table')

# find all rows in the table
rows = table.find_all('tr')

for row in rows:
    cells = row.find_all('td') # Find all <td> tags
    for cell in cells:
        print(cell.text) # Get <td> text

Output:

Alfreds Futterkiste
Maria Anders
Germany
Centro comercial Moctezuma
Francisco Chang
Mexico
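Note that the header row produces no output in the loop above, because it contains <th> cells rather than <td> cells. Building on that, the sketch below pairs each cell with its header to get one dictionary per row, then regroups the values by column. The variable names records and columns are our own choice for this example.

```python
from bs4 import BeautifulSoup

html_doc = '''
<table class="table-1">
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
  <tr><td>Centro comercial Moctezuma</td><td>Francisco Chang</td><td>Mexico</td></tr>
</table>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
table = soup.find('table')

headers = [th.get_text(strip=True) for th in table.find_all('th')]

records = []
for row in table.find_all('tr'):
    cells = row.find_all('td')
    if not cells:  # the header row has <th> cells only, so skip it
        continue
    values = [cell.get_text(strip=True) for cell in cells]
    records.append(dict(zip(headers, values)))

# Regroup the row values so the table can be read column by column
columns = {name: [record[name] for record in records] for name in headers}

print(records[0]['Contact'])  # Maria Anders
print(columns['Country'])     # ['Germany', 'Mexico']
```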

Find all tables

In Beautiful Soup, we can use either the find_all() or select() function to locate all tables within HTML. Here's an example of how to use each function:

1. find_all():

html_doc = '''
<table class="table-1">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
</table>

<table class="table-2">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Facebook</td>
    <td>Mark</td>
    <td>USA</td>
  </tr>
  <tr>
    <td>Centro comercial Newyork</td>
    <td>Newyork Chang</td>
    <td>USA</td>
  </tr>
</table>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

tables = soup.find_all('table') # Find all tables using find_all()

2. select():

html_doc = '''
<table class="table-1">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
</table>

<table class="table-2">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Facebook</td>
    <td>Mark</td>
    <td>USA</td>
  </tr>
  <tr>
    <td>Centro comercial Newyork</td>
    <td>Newyork Chang</td>
    <td>USA</td>
  </tr>
</table>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

tables = soup.select('table') # Find all tables using select()

Both the find_all() and select() functions return a list of all tables in the HTML code. You can then iterate over this list to access each table individually.
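As a rough illustration of iterating over the result, the sketch below records the class and the number of rows of each table. The HTML is a shortened version of the example above, and the summary list is just a name picked for this example.

```python
from bs4 import BeautifulSoup

html_doc = '''
<table class="table-1">
  <tr><th>Company</th></tr>
  <tr><td>Alfreds Futterkiste</td></tr>
</table>
<table class="table-2">
  <tr><th>Company</th></tr>
  <tr><td>Facebook</td></tr>
  <tr><td>Centro comercial Newyork</td></tr>
</table>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

summary = []
for table in soup.find_all('table'):
    # class is a multi-valued attribute, so Beautiful Soup returns it as a list
    classes = table.get('class', [])
    row_count = len(table.find_all('tr'))
    summary.append((classes, row_count))

print(summary)  # [(['table-1'], 2), (['table-2'], 3)]
```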

Find table by class

To find a table by class, we can use the find() function and specify the class in the class_ parameter.

html_doc = '''
<table class="table-1">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
</table>

<table class="table-2">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Facebook</td>
    <td>Mark</td>
    <td>USA</td>
  </tr>
  <tr>
    <td>Centro comercial Newyork</td>
    <td>Newyork Chang</td>
    <td>USA</td>
  </tr>
</table>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

# Find Table by Class
table = soup.find("table", class_="table-1")

The example above returns the table whose class is table-1.

Another function that can be used to find a table by class is select_one(), which takes a CSS selector, as you can see in the following example:

table = soup.select_one("table.table-1")
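Finding a table by ID works much the same way: pass the id keyword argument to find(), or use a # selector with select_one(). The sketch below assumes markup that carries id attributes. The values customers and partners are invented for this example, since the tables above have no IDs.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the id values below are invented for this example
html_doc = '''
<table id="customers"><tr><td>Alfreds Futterkiste</td></tr></table>
<table id="partners"><tr><td>Facebook</td></tr></table>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

# Find Table by ID with find()
by_find = soup.find('table', id='partners')

# Find Table by ID with a CSS selector
by_select = soup.select_one('table#partners')

print(by_find.td.text)  # Facebook
print(by_select['id'])  # partners
```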

Extract tables inside other tables

To extract tables nested inside other tables using Beautiful Soup, you can use the same methods you would use to extract any other element. Here is an example:

html_doc = '''
<table class="outer-table">
  <tr>
    <td>Outer Table Cell 1</td>
    <td>Outer Table Cell 2</td>
  </tr>
  <tr>
    <td colspan="2">
      <table class="inner-table">
        <tr>
          <td>Inner Table Cell 1</td>
          <td>Inner Table Cell 2</td>
        </tr>
        <tr>
          <td>Inner Table Cell 3</td>
          <td>Inner Table Cell 4</td>
        </tr>
      </table>
    </td>
  </tr>
</table>

'''

# Parse HTML
soup = BeautifulSoup(html_doc, 'html.parser')

# find the outer table
outer_table = soup.find('table', {'class': 'outer-table'})

# find the inner table within the outer table
inner_table = outer_table.find('table', {'class': 'inner-table'})

# iterate through the rows of the inner table and extract data from each cell
rows = inner_table.find_all('tr')
for row in rows:
    cells = row.find_all('td')
    for cell in cells:
        print(cell.text)

Output:

Inner Table Cell 1
Inner Table Cell 2
Inner Table Cell 3
Inner Table Cell 4

The code above works as follows:

  1. Find the outer table that contains the nested table.
  2. Find the inner table within the outer table.
  3. Iterate through the rows of the inner table.
  4. Extract data from each cell as required.
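One caveat with nested tables: find_all() searches all descendants by default, so calling it on the outer table also returns the rows of the inner table. Here is a small sketch of limiting the search to direct children with recursive=False. It assumes the html.parser parser used throughout this article. Some other parsers, such as html5lib, add a <tbody> element, in which case the rows are no longer direct children of the table.

```python
from bs4 import BeautifulSoup

html_doc = '''
<table class="outer-table">
  <tr>
    <td>Outer Table Cell 1</td>
  </tr>
  <tr>
    <td>
      <table class="inner-table">
        <tr><td>Inner Table Cell 1</td></tr>
        <tr><td>Inner Table Cell 2</td></tr>
      </table>
    </td>
  </tr>
</table>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
outer_table = soup.find('table', class_='outer-table')

all_rows = outer_table.find_all('tr')                     # includes the inner table's rows
outer_rows = outer_table.find_all('tr', recursive=False)  # direct children only

print(len(all_rows))    # 4
print(len(outer_rows))  # 2
```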

Conclusion

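In this article, we have covered the main ways of working with tables in BeautifulSoup: finding a table, reading its headers, rows, and cells, finding all tables, finding a table by class, and extracting nested tables. As you can see, functions such as find(), find_all(), select(), and select_one() are enough to extract tables and other structured data from HTML documents.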

I hope this article helps you.