Scrapy - Feed exports



Description

Feed exports are a way of storing the data scraped from sites by generating an "export file".

Serialization Formats

Feed exports use Item exporters to generate a feed of the scraped items, and they support multiple serialization formats and storage backends.

The following table shows the supported formats:

Sr.No  Format      FEED_FORMAT  Exporter class
1      JSON        json         scrapy.exporters.JsonItemExporter
2      JSON lines  jsonlines    scrapy.exporters.JsonLinesItemExporter
3      CSV         csv          scrapy.exporters.CsvItemExporter
4      XML         xml          scrapy.exporters.XmlItemExporter
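To make the difference between the json and jsonlines formats concrete, the sketch below mimics the two output shapes with Python's standard json module. It does not call Scrapy's exporters; the sample items and variable names are assumptions made up for illustration, and Scrapy's actual output may differ in whitespace and encoding.

```python
import json

# Hypothetical scraped items, standing in for what a spider might yield.
items = [
    {"title": "First book", "price": 10},
    {"title": "Second book", "price": 12},
]

# "json" shape: the whole feed is a single JSON array.
as_json = json.dumps(items)

# "jsonlines" shape: one JSON object per line, no enclosing array.
as_json_lines = "\n".join(json.dumps(item) for item in items)

print(as_json)
print(as_json_lines)
```

Because every line of a JSON lines feed is a complete document, a consumer can process it line by line, which tends to suit large feeds better than a single JSON array.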

Pickle and Marshal are also supported out of the box, and further formats can be added through the FEED_EXPORTERS setting:

Sr.No  Format   FEED_FORMAT  Exporter class
1      Pickle   pickle       scrapy.exporters.PickleItemExporter
2      Marshal  marshal      scrapy.exporters.MarshalItemExporter
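As a rough illustration of extending the formats, the fragment below shows what a FEED_EXPORTERS entry in a project's settings.py might look like. The format key "tsv" and the class path are hypothetical; the class would have to be an Item exporter that you write in your own project.

```python
# settings.py (fragment)

# Map a format name to the import path of an Item exporter class.
# "tsv" and "myproject.exporters.TsvItemExporter" are made-up names
# used only for illustration; they are not part of Scrapy.
FEED_EXPORTERS = {
    "tsv": "myproject.exporters.TsvItemExporter",
}

# Once registered, the new name can be selected like any built-in format.
FEED_FORMAT = "tsv"
```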

Storage Backends

A storage backend defines where the feed is stored, identified by a URI.

The following table shows the supported storage backends:

Sr.No  Storage Backend   URI scheme  Description
1      Local filesystem  file        Feeds are stored on the local filesystem.
2      FTP               ftp         Feeds are stored on an FTP server.
3      S3                s3          Feeds are stored on Amazon S3. The external library botocore or boto is required.
4      Standard output   stdout      Feeds are written to the standard output.
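The short sketch below lists one example URI per backend and extracts the scheme with the standard library, since the scheme is what decides which backend handles the feed. The host names, bucket name, credentials and paths are placeholders, and the helper name uri_scheme is an assumption for this example; Scrapy performs its own lookup internally.

```python
from urllib.parse import urlparse

# Placeholder URIs, one per storage backend.
EXAMPLE_FEED_URIS = [
    "file:///tmp/export.csv",                              # local filesystem
    "ftp://user:pass@ftp.example.com/path/to/export.csv",  # FTP
    "s3://mybucket/path/to/export.csv",                    # Amazon S3
    "stdout:",                                             # standard output
]


def uri_scheme(uri: str) -> str:
    """Return the scheme part of a feed URI (illustrative helper)."""
    return urlparse(uri).scheme


for uri in EXAMPLE_FEED_URIS:
    print(uri_scheme(uri), "<-", uri)
```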

Storage URI Parameters

The following parameters in the storage URI are replaced when the feed is created:

  • %(time)s: replaced by a timestamp.
  • %(name)s: replaced by the spider name.
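The placeholders use Python's printf-style mapping syntax, so the substitution can be sketched with the % operator. The URI template, the spider name "quotes" and the timestamp format below are assumptions chosen for illustration; the exact timestamp Scrapy produces may look different.

```python
from datetime import datetime

# Hypothetical URI template using both supported parameters.
uri_template = "ftp://user:password@ftp.example.com/feeds/%(name)s/%(time)s.json"

# Values that would normally be filled in at export time.
# The colon replacement keeps the timestamp safe for use in file names.
params = {
    "name": "quotes",
    "time": datetime(2024, 1, 2, 3, 4, 5).isoformat().replace(":", "-"),
}

feed_uri = uri_template % params
print(feed_uri)
# prints ftp://user:password@ftp.example.com/feeds/quotes/2024-01-02T03-04-05.json
```

Putting the spider name in the directory and the timestamp in the file name gives each run its own file, so earlier feeds are not overwritten.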

Settings

The following table shows the settings used to configure feed exports:

Sr.No  Setting              Description
1      FEED_URI             URI of the export feed. Setting it enables feed exports.
2      FEED_FORMAT          Serialization format used for the feed.
3      FEED_EXPORT_FIELDS   List of the fields to export, in the order they should appear.
4      FEED_STORE_EMPTY     Whether to export feeds that contain no items.
5      FEED_STORAGES        Dictionary of additional feed storage backends.
6      FEED_STORAGES_BASE   Dictionary of the built-in feed storage backends.
7      FEED_EXPORTERS       Dictionary of additional feed exporters.
8      FEED_EXPORTERS_BASE  Dictionary of the built-in feed exporters.
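A minimal settings.py sketch that combines several of these settings might look like the following. The paths, field names and the "sftp" storage class are hypothetical values for illustration. Newer Scrapy releases (2.1 and later) configure feeds through a single FEEDS setting and deprecate FEED_URI and FEED_FORMAT, so check the documentation for the version you run.

```python
# settings.py (fragment)

# Where to write the feed; setting this turns feed exports on.
FEED_URI = "file:///tmp/feeds/%(name)s/%(time)s.csv"

# Serialization format, one of the supported format names.
FEED_FORMAT = "csv"

# Export only these fields, in this order (hypothetical field names).
FEED_EXPORT_FIELDS = ["title", "price", "url"]

# Do not create a feed when the spider scraped no items.
FEED_STORE_EMPTY = False

# Extra storage backend keyed by URI scheme.
# "myproject.storages.SftpFeedStorage" is a made-up class path.
FEED_STORAGES = {
    "sftp": "myproject.storages.SftpFeedStorage",
}
```

FEED_STORAGES_BASE and FEED_EXPORTERS_BASE are left out on purpose: they hold the built-in defaults, and project-specific additions go into FEED_STORAGES and FEED_EXPORTERS.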
