Scrapy - Items



Description

Scrapy extracts data from sources such as web pages using spiders. To produce structured output, Scrapy uses the Item class, whose objects act as simple containers for collecting the scraped data.

Declaring Items

You can declare items using the class definition syntax along with Field objects, as shown below −

import scrapy

class MyProducts(scrapy.Item):
    # Each Field() declares one attribute the scraped item can hold
    productName = scrapy.Field()
    productLink = scrapy.Field()
    imageURL = scrapy.Field()
    price = scrapy.Field()
    size = scrapy.Field()

Item Fields

Field objects are used to specify metadata for each field. Field objects accept any values, so there is no reference list of all available metadata keys. Each key matters only to the component that reads it, such as the serializer key used by item exporters. You can also define and use any other field key your project needs. The Field objects used to declare an item do not stay assigned as class attributes. Instead, they can be accessed through the Item.fields attribute.

Working with Items

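Items provide a dictionary-like API. Common tasks such as creating items, reading and setting field values, checking which fields are populated, and copying items work much as they do with a Python dict, except that only declared fields can be set.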

Extending Items

An item can be extended by declaring a subclass of the original item. For instance −

class MyProductDetails(MyProducts):
    # Inherits every field of MyProducts and adds two more
    original_rate = scrapy.Field(serializer=str)
    discount_rate = scrapy.Field()

You can also extend the metadata of an existing field by reusing it and then adding more values or changing existing ones, as shown in the following code −

class MyProductPackage(MyProducts):
    # serializer_demo stands for your own serializer function, defined elsewhere
    productName = scrapy.Field(MyProducts.fields['productName'], serializer=serializer_demo)

Item Objects

Item objects are created with the following class, which returns a new item, optionally initialized from the given argument −

class scrapy.item.Item([arg])

Item replicates the standard dict API, including its constructor. It also provides one extra attribute, fields, which is a dictionary containing all the fields declared for the item, not only the populated ones.

Field Objects

Field objects are created with the following class. The Field class is just an alias for the built-in dict class and does not provide any extra functionality or attributes −

class scrapy.item.Field([arg])