Last modified: Aug 20, 2023 By Alexander Williams
How to Remove Specific or All HTML Tags in Beautifulsoup
In this guide, we'll learn how to remove specific or all HTML tags using Beautiful Soup.
Remove Specific HTML tags
You can use the .extract() method to remove specific HTML tags using Beautiful Soup. This method removes a tag, together with everything inside it, from the parsed tree and returns the removed tag.
Here's an example:
from bs4 import BeautifulSoup
html = """
<div class="article">
<h1>Title</h1>
<p>This is the <strong>content</strong> of the article.</p>
<a href="#">Read more</a>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
# Remove <strong> and <a> tags from the HTML
strong_tag = soup.find('strong')
if strong_tag:
    strong_tag.extract()
a_tag = soup.find('a')
if a_tag:
    a_tag.extract()
# Print the modified HTML
print(soup)
Output:
<div class="article">
<h1>Title</h1>
<p>This is the  of the article.</p>
</div>
In this case, the .find() method locates the first <strong> tag and the first <a> tag, and the .extract() method is then called on each of them to remove it, along with its contents, from the soup object.
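Because .find() returns only the first match, a document with several matching tags needs a loop. The following is a minimal sketch of one way to do that, using .find_all() with a list of tag names and .decompose(), which destroys the tag instead of returning it. The helper name remove_tags and the sample HTML are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup


def remove_tags(html, tag_names):
    """Hypothetical helper: delete every tag in tag_names, including its contents."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() accepts a list of names and returns every match, not just the first
    for tag in soup.find_all(tag_names):
        # decompose() removes the tag and its contents from the tree
        tag.decompose()
    return str(soup)


# Sample input made up for this sketch
html = "<div><p>Keep <strong>drop 1</strong> and <strong>drop 2</strong>.</p><a href='#'>link</a></div>"
result = remove_tags(html, ["strong", "a"])
print(result)
```

If you still need the removed tag afterwards, .extract() is the better fit, since it hands the tag back to you.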
If you wish to remove a tag but keep its content, take a look at the example below:
from bs4 import BeautifulSoup
html = """
<p>This is a <strong>bold</strong> statement.</p>
<p>This is an <em>italic</em> sentence.</p>
"""
soup = BeautifulSoup(html, 'html.parser')
# Specify the tag you want to remove
tag_to_remove = 'strong'
# Find all instances of the specified tag
tags = soup.find_all(tag_to_remove)
# Replace each tag with its content
for tag in tags:
    tag.unwrap()
# Print the modified HTML
print(soup)
Output:
<p>This is a bold statement.</p>
<p>This is an <em>italic</em> sentence.</p>
The .unwrap() method replaces each tag with its contents, so the text stays in place while the surrounding markup is removed.
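The same idea extends to several tag names at once. Here is a small sketch, assuming you want to strip a fixed set of inline formatting tags; the helper name unwrap_tags and the sample sentence are made up for this example.

```python
from bs4 import BeautifulSoup


def unwrap_tags(html, tag_names):
    """Hypothetical helper: strip the given tags but keep the text inside them."""
    soup = BeautifulSoup(html, "html.parser")
    # Passing a list to find_all() matches any of the listed tag names
    for tag in soup.find_all(tag_names):
        # unwrap() replaces the tag with its own children
        tag.unwrap()
    return str(soup)


# Sample input made up for this sketch
cleaned = unwrap_tags(
    "<p>This is a <strong>bold</strong> and <em>italic</em> line.</p>",
    ["strong", "em"],
)
print(cleaned)  # <p>This is a bold and italic line.</p>
```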
Remove All HTML tags
The .get_text() method is the easiest way to remove all HTML tags. It returns only the text of the document, with every tag stripped out.
Here's a simple example:
from bs4 import BeautifulSoup
html = "<p>This is <strong>bold</strong> and <em>italic</em> text.</p>"
soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()
print(text)
Output:
This is bold and italic text.
As you can see, all tags have been removed from the document.
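One thing to watch for: when text sits in separate block-level tags, a plain .get_text() call joins the pieces with nothing in between. The sketch below, using made-up sample HTML, shows how the separator and strip arguments of .get_text() can help in that situation.

```python
from bs4 import BeautifulSoup

# Sample input made up for this sketch
html = "<div><h1>Title</h1><p>First paragraph.</p><p>Second paragraph.</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Default call: the text pieces are concatenated directly
plain = soup.get_text()
print(plain)  # TitleFirst paragraph.Second paragraph.

# separator is placed between text pieces; strip=True trims whitespace around each piece
spaced = soup.get_text(separator=" ", strip=True)
print(spaced)  # Title First paragraph. Second paragraph.
```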