Last modified: Jan 10, 2023 By Alexander Williams
Beautifulsoup: Get script Tag and Content
This tutorial shows how to get <script> tags and their content with BeautifulSoup.
Get all script tags
To get all script tags, we need to use the find_all() function.
Let's see an example.
from bs4 import BeautifulSoup # Import BeautifulSoup module
# 👇 HTML Source
html = '''
<head>
<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hellow BeautifulSoup') </script>
</head>
'''
soup = BeautifulSoup(html, 'html.parser') # 👉️ Parsing
scripts = soup.find_all("script") # 👉️ Find all script tags
print(scripts) # 👉️ Print Result
Output:
[<script src="/static/js/prism.js"></script>, <script src="/static/js/bootstrap.bundle.min.js"></script>, <script src="/static/js/main.js"></script>, <script> console.log('Hellow BeautifulSoup') </script>]
As you can see, we got the script tags as a list. Now let's print them one by one.
for script in scripts: # 👉️ Loop Over scripts
    print(script)
Output:
<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hellow BeautifulSoup') </script>
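When only the first script tag is needed, or when CSS selectors feel more natural, find() and select() are possible alternatives to find_all(). The snippet below is a minimal sketch; the shortened HTML and the variable names first_script and selected are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative HTML (not the full example from above)
html = '''
<head>
<script src="/static/js/main.js"></script>
<script> console.log('Hello') </script>
</head>
'''

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag, or None when nothing matches
first_script = soup.find("script")

# select() takes a CSS selector and returns a list of matching tags
selected = soup.select("script")

print(first_script)
print(len(selected))
```

Because find() returns None when no tag matches, it is worth checking the result before reading attributes from it.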
Get script tags that load a script file
To get only the script tags that load a script file (those with a src attribute), we need to:
- use the find_all() function
- pass the src=True argument
Example:
# 👇 HTML Source
html = '''
<head>
<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hellow BeautifulSoup') </script>
</head>
'''
soup = BeautifulSoup(html, 'html.parser') # 👉️ Parsing
scripts = soup.find_all("script", src=True) # 👉️ Find all script tags that come with the src attribute
print(scripts) # 👉️ Print Result
Output:
[<script src="/static/js/prism.js"></script>, <script src="/static/js/bootstrap.bundle.min.js"></script>, <script src="/static/js/main.js"></script>]
To get the src attribute of the scripts, follow the code below.
# Get src attribute
for script in scripts: # 👉️ Loop Over scripts
    print(script['src'])
Output:
/static/js/prism.js
/static/js/bootstrap.bundle.min.js
/static/js/main.js
As you can see, we've used ['src'] to get the src URL of the script tags.
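The src values here are relative paths. If absolute URLs are needed, one option is to combine them with the page address using urljoin() from the standard library. This is only a sketch: BASE_URL is a hypothetical page address, and .get('src') is used instead of ['src'] so that inline scripts without a src attribute return None rather than raising a KeyError.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

html = '''
<head>
<script src="/static/js/prism.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hello BeautifulSoup') </script>
</head>
'''

# Hypothetical address of the page the HTML came from (an assumption)
BASE_URL = "https://example.com/blog/"

soup = BeautifulSoup(html, "html.parser")

script_urls = []
for script in soup.find_all("script"):
    src = script.get("src")  # None when the tag has no src attribute
    if src:
        script_urls.append(urljoin(BASE_URL, src))

print(script_urls)
```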
Get the content of a script tag
To get the content of a script tag, we need to use the .string property. Let's see an example:
# 👇 HTML Source
html = '''
<head>
<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hellow BeautifulSoup') </script>
</head>
'''
soup = BeautifulSoup(html, 'html.parser') # 👉️ Parsing
scripts = soup.find_all("script", string=True) # 👉️ Find all script tags that have content
print(scripts) # 👉️ Print Result
Output:
[<script> console.log('Hellow BeautifulSoup') </script>]
We've set string=True to find all script tags that have content. Now we'll print the content of the script tag.
# Get content of script
for script in scripts: # 👉️ Loop Over scripts
    print(script.string)
Output:
console.log('Hellow BeautifulSoup')
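Without the string=True filter, .string is None for script tags that have no content, such as those that only load a file. The sketch below shows one way to skip those tags and trim the surrounding whitespace; the list name inline_code and the shortened HTML are illustrative.

```python
from bs4 import BeautifulSoup

html = '''
<head>
<script src="/static/js/main.js"></script>
<script> console.log('Hello BeautifulSoup') </script>
</head>
'''

soup = BeautifulSoup(html, "html.parser")

inline_code = []
for script in soup.find_all("script"):
    # .string is None when the tag has no content
    if script.string is None:
        continue
    # strip() removes the spaces around the JavaScript code
    inline_code.append(script.string.strip())

print(inline_code)
```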
Conclusion
In this tutorial, we've learned how to get all script tags with BeautifulSoup. We've also learned how to get the src attribute and the content of a script tag.
Happy Coding </>