Last modified: Sep 11, 2023 By Alexander Williams

Scrapy css() - Find By Class Selector

In this guide, we'll learn how to use Scrapy's css() method to find elements by class. If you haven't set up your Scrapy project yet, follow the setup guide first.

What is the css() method

In Scrapy, the css() method is used for parsing and selecting elements from HTML or XML documents based on CSS selectors.

Here is the syntax of the css() method:

.css('your_css_selector')

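To select elements by class, pass a selector of the form .class_name, replacing class_name with the specific class you want to target. The leading dot tells css() to match on the class attribute.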

How to find elements by class selector

In the following example, we'll use the css() method to find all the article titles on the pytutorial.com home page.

Each article title on the page has the following HTML structure:

<h2 class="post-title"><a href="/how-to-install-and-setup-scrapy">How to Install and Setup Scrapy</a></h2>

Now, using the ".post-title a" CSS selector, let's write a Scrapy spider that finds and retrieves the titles of all the articles:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'  # Unique name for your spider
    start_urls = ['http://pytutorial.com']  # Starting URL for web scraping

    def parse(self, response):
        # Select every <a> tag nested inside an element with class "post-title"
        articles = response.css('.post-title a')

        for article in articles:
            # Extract text from each selected <a> tag
            content_text = article.css('::text').get()

            # Yield the title as an item so Scrapy can export it
            yield {'title': content_text}

Now let's run the spider with the following command. The -O option exports the scraped items to data.json, overwriting the file if it already exists:

scrapy crawl myspider -O data.json

The result will be written to data.json:

[
{"title": "How to Install and Setup Scrapy"},
{"title": "How to Append Multiple Items to List in Python"},
{"title": "How to Use BeautifulSoup clear() Method"},
{"title": "Python: Add Variable to String & Print Using 4 Methods"},
{"title": "How to Use Beautifulsoup  select_one() Method"},
{"title": "How To Solve ModuleNotFoundError: No module named in Python"},
{"title": "Beautifulsoup Get All Links"},
{"title": "Beautifulsoup image alt Attribute"},
{"title": "How to Get href of Element using BeautifulSoup [Easily]"},
{"title": "Understand How to Use Beautifulsoup find_all() Function"}
]

As you can see, all of the article titles have been extracted.
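One way to check the export from Python is sketched below. It assumes the file is a JSON array of objects with a "title" key, as in the output above; load_titles is a hypothetical helper name, not part of Scrapy.

```python
import json


def load_titles(path):
    """Return the 'title' value of every item in a Scrapy JSON export."""
    with open(path, encoding='utf-8') as f:
        items = json.load(f)
    return [item['title'] for item in items]
```

After the crawl, calling load_titles('data.json') should return the exported titles as a plain Python list.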

Conclusion

Scrapy makes it easy to select and extract elements on a web page that have specific CSS classes. This allows you to retrieve the data you need efficiently.