Beautifulsoup: Get script Tag and Content

In this tutorial, we'll learn how to get <script> tags and their content with BeautifulSoup.

Get All Script Tags

To get all script tags, we need to use the find_all() method.

Let's see an example.

from bs4 import BeautifulSoup # Import the BeautifulSoup class from the bs4 package

# 👇 HTML Source
html = '''
<head>

<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>

<script> console.log('Hellow BeautifulSoup') </script>
</head>

'''

soup = BeautifulSoup(html, 'html.parser') # 👉️ Parsing

scripts = soup.find_all("script") # 👉️ Find all script tags

print(scripts) # 👉️ Print Result

Output:

[<script src="/static/js/prism.js"></script>, <script src="/static/js/bootstrap.bundle.min.js"></script>, <script src="/static/js/main.js"></script>, <script> console.log('Hellow BeautifulSoup') </script>]

As you can see, find_all() returns the script tags as a list. Now let's print them one by one.

for script in scripts: # 👉️ Loop Over scripts
    print(script)

Output:

<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hellow BeautifulSoup') </script>
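If you only need the first script tag, or just a few of them, find() and the limit parameter of find_all() can save you from collecting every match. Here is a minimal sketch that reuses the same HTML; the variable names first_script and first_two are just for illustration.

```python
from bs4 import BeautifulSoup

html = '''
<head>
<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hello BeautifulSoup') </script>
</head>
'''

soup = BeautifulSoup(html, 'html.parser')

# find() returns only the first matching tag (or None if nothing matches)
first_script = soup.find("script")
print(first_script)  # <script src="/static/js/prism.js"></script>

# limit=2 stops the search after two matches
first_two = soup.find_all("script", limit=2)
print(len(first_two))  # 2

# len() works on the list returned by find_all()
print(len(soup.find_all("script")))  # 4
```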

Get Script Tags That Load an External File

To get only the script tags that load an external script file (that is, tags with an src attribute), we need to:

  • use the find_all() method
  • pass the src=True parameter

Example:

# 👇 HTML Source
html = '''
<head>

<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>

<script> console.log('Hellow BeautifulSoup') </script>
</head>

'''

soup = BeautifulSoup(html, 'html.parser') # 👉️ Parsing

scripts = soup.find_all("script", src=True) # 👉️ Find all script tags that come with the src attribute

print(scripts) # 👉️ Print Result

Output:

[<script src="/static/js/prism.js"></script>, <script src="/static/js/bootstrap.bundle.min.js"></script>, <script src="/static/js/main.js"></script>]
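The same filter can also be written as a CSS attribute selector with select(). This is a sketch assuming a bs4 version with CSS selector support (bundled via soupsieve in recent releases); "script[src]" matches every <script> tag that has an src attribute, whatever its value.

```python
from bs4 import BeautifulSoup

html = '''
<head>
<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hello BeautifulSoup') </script>
</head>
'''

soup = BeautifulSoup(html, 'html.parser')

# CSS attribute selector: <script> tags that have an src attribute
scripts = soup.select("script[src]")

print([script['src'] for script in scripts])
# ['/static/js/prism.js', '/static/js/bootstrap.bundle.min.js', '/static/js/main.js']
```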

To get the value of each tag's src attribute, loop over the results:

# Get src attribute
for script in scripts: # 👉️ Loop Over scripts
    print(script['src'])

Output:

/static/js/prism.js
/static/js/bootstrap.bundle.min.js
/static/js/main.js

As you can see, we've used script['src'] to read the src attribute of each script tag.
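The src values here are relative paths, and script['src'] raises a KeyError on tags without an src attribute. A minimal sketch that handles both cases: it uses script.get("src"), which returns None instead of raising, and urllib.parse.urljoin() to build absolute URLs. The page address in base_url is hypothetical and only shows how the paths resolve.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

html = '''
<head>
<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hello BeautifulSoup') </script>
</head>
'''

# Hypothetical page URL, used only to resolve the relative src paths
base_url = "https://example.com/blog/post.html"

soup = BeautifulSoup(html, 'html.parser')

script_urls = []
for script in soup.find_all("script"):
    src = script.get("src")  # None for inline scripts, so no KeyError
    if src:
        script_urls.append(urljoin(base_url, src))

print(script_urls)
```

Because each src starts with "/", urljoin() keeps only the scheme and host of base_url, so the first entry becomes https://example.com/static/js/prism.js.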

Get the Content of a Script Tag

To get the content of a script tag, we need to use the .string property. First, let's find the script tags that actually contain code:

# 👇 HTML Source
html = '''
<head>

<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>

<script> console.log('Hellow BeautifulSoup') </script>
</head>

'''

soup = BeautifulSoup(html, 'html.parser') # 👉️ Parsing

scripts = soup.find_all("script", string=True) # 👉️ Find all script tags that contain text

print(scripts) # 👉️ Print Result

Output:

[<script> console.log('Hellow BeautifulSoup') </script>]

We've set string=True so that find_all() returns only the script tags that contain text. Now let's print the content of each one.

# Get content of script
for script in scripts: # 👉️ Loop Over scripts
    print(script.string)

Output:

console.log('Hellow BeautifulSoup') 
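Notice that .string keeps the spaces around the code, and it is None for a script tag with no text. A minimal sketch that selects inline scripts with src=False (the counterpart of src=True, matching tags without an src attribute), skips empty ones, and strips the surrounding whitespace:

```python
from bs4 import BeautifulSoup

html = '''
<head>
<script src="/static/js/prism.js"></script>
<script src="/static/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
<script> console.log('Hello BeautifulSoup') </script>
</head>
'''

soup = BeautifulSoup(html, 'html.parser')

# src=False matches only <script> tags that have no src attribute (inline scripts)
inline_scripts = soup.find_all("script", src=False)

contents = []
for script in inline_scripts:
    text = script.string  # None if the tag has no text
    if text is not None:
        contents.append(text.strip())  # drop leading/trailing whitespace

print(contents)  # ["console.log('Hello BeautifulSoup')"]
```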

Conclusion

In this tutorial, we've learned how to get all script tags with BeautifulSoup. We've also learned how to get the src attribute and the content of a script tag.

Happy Coding </>