Scrapy - Feed exports
Description
Feed exports are a way of storing the data scraped from sites, that is, of generating an "export file" (also called an export feed).
Serialization Formats
Feed exports use Item exporters to generate a feed from the scraped items. They support multiple serialization formats and storage backends.
The following table shows the supported formats −
| Sr.No | Format & Description |
|---|---|
| 1 | JSON: FEED_FORMAT is json. The exporter used is the class scrapy.exporters.JsonItemExporter. |
| 2 | JSON lines: FEED_FORMAT is jsonlines. The exporter used is the class scrapy.exporters.JsonLinesItemExporter. |
| 3 | CSV: FEED_FORMAT is csv. The exporter used is the class scrapy.exporters.CsvItemExporter. |
| 4 | XML: FEED_FORMAT is xml. The exporter used is the class scrapy.exporters.XmlItemExporter. |
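To make the difference between the json and jsonlines formats concrete, here is a minimal sketch that uses only Python's standard json module. It does not call Scrapy's exporters, and the sample items are invented for illustration; it only shows the general shape of the two feeds.

```python
import json

# Two hypothetical scraped items, used only for illustration.
items = [
    {"title": "First post", "views": 10},
    {"title": "Second post", "views": 25},
]

# "json" style: the whole feed is a single JSON array.
as_json = json.dumps(items)

# "jsonlines" style: one standalone JSON object per line.
as_jsonlines = "\n".join(json.dumps(item) for item in items)

print(as_json)
print(as_jsonlines)
```

Because every line of a JSON lines feed is a complete object, such a feed can be read back one item at a time, whereas a JSON array generally has to be parsed as a whole.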
Scrapy also provides the following formats, and further formats can be added through the FEED_EXPORTERS setting −
| Sr.No | Format & Description |
|---|---|
| 1 | Pickle: FEED_FORMAT is pickle. The exporter used is the class scrapy.exporters.PickleItemExporter. |
| 2 | Marshal: FEED_FORMAT is marshal. The exporter used is the class scrapy.exporters.MarshalItemExporter. |
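As a rough sketch of how a format might be registered, FEED_EXPORTERS is a dictionary that maps a format name to the import path of an exporter class. The "tsv" entry and the myproject.exporters.TsvItemExporter path below are hypothetical names chosen for illustration; they are not part of Scrapy.

```python
# settings.py (sketch): map format names to exporter class paths.
FEED_EXPORTERS = {
    # Exporter class named in the table above.
    "pickle": "scrapy.exporters.PickleItemExporter",
    # Hypothetical project-specific exporter; this class is assumed
    # to exist in your own project and is NOT shipped with Scrapy.
    "tsv": "myproject.exporters.TsvItemExporter",
}
```

With such an entry in place, setting FEED_FORMAT to the registered name would be expected to select that exporter.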
Storage Backends
A storage backend defines where the feed is stored. The location is given as a URI.
The following table shows the supported storage backends −
| Sr.No | Storage Backend & Description |
|---|---|
| 1 | Local filesystem: the URI scheme is file, and the feed is stored on the local filesystem. |
| 2 | FTP: the URI scheme is ftp, and the feed is stored on an FTP server. |
| 3 | S3: the URI scheme is s3, and the feed is stored on Amazon S3. The external library botocore or boto is required. |
| 4 | Standard output: the URI scheme is stdout, and the feed is written to the standard output. |
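The backend is picked from the scheme part of the URI. The sketch below lists one example URI per backend and extracts the scheme with the standard library's urlparse. The host names, bucket name, credentials, and paths are placeholders, not real locations.

```python
from urllib.parse import urlparse

# One illustrative URI per storage backend (all values are placeholders).
EXAMPLE_URIS = {
    "local filesystem": "file:///tmp/export.csv",
    "ftp": "ftp://user:pass@ftp.example.com/path/to/export.csv",
    "s3": "s3://mybucket/path/to/export.csv",
    "standard output": "stdout:",
}

# The URI scheme is what identifies the storage backend.
schemes = {label: urlparse(uri).scheme for label, uri in EXAMPLE_URIS.items()}

for label, scheme in schemes.items():
    print(f"{label}: {scheme}")
```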
Storage URI Parameters
The storage URI can contain the following parameters, which are replaced when the feed is created −
- %(time)s: This parameter is replaced by a timestamp.
- %(name)s: This parameter is replaced by the spider name.
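The placeholders use Python's printf-style named formatting, so the substitution can be illustrated with the plain % operator. This is a sketch of the idea, not Scrapy's internal code; the spider name "foo", the FTP host, and the fixed timestamp string are assumptions made for the example.

```python
# URI template with the two placeholders described above.
uri_template = (
    "ftp://user:password@ftp.example.com/scraping/feeds/%(name)s/%(time)s.json"
)

# Hypothetical values; Scrapy fills these in when it creates the feed.
params = {
    "name": "foo",                  # spider name (assumed)
    "time": "2024-01-01T00-00-00",  # fixed timestamp, for illustration only
}

# Named %-formatting replaces each %(key)s with params[key].
uri = uri_template % params
print(uri)
```

Putting %(name)s in the directory part gives one directory per spider, and %(time)s keeps each run from overwriting the previous feed.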
Settings
The following table shows the settings that can be used to configure feed exports −
| Sr.No | Setting & Description |
|---|---|
| 1 | FEED_URI: The URI of the export feed. Setting it enables feed exports. |
| 2 | FEED_FORMAT: The serialization format used for the feed. |
| 3 | FEED_EXPORT_FIELDS: Defines the fields that need to be exported. |
| 4 | FEED_STORE_EMPTY: Defines whether to export feeds that contain no items. |
| 5 | FEED_STORAGES: A dictionary of additional feed storage backends. |
| 6 | FEED_STORAGES_BASE: A dictionary of the built-in feed storage backends. |
| 7 | FEED_EXPORTERS: A dictionary of additional feed exporters. |
| 8 | FEED_EXPORTERS_BASE: A dictionary of the built-in feed exporters. |
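Put together, a project's settings.py might look roughly like the following. This is a minimal sketch that uses only the setting names from the table; the output path and field names are invented. Recent Scrapy releases favour a single FEEDS setting over FEED_URI and FEED_FORMAT, so it is worth checking the documentation for the installed version.

```python
# settings.py (sketch): configure a CSV feed on the local filesystem.

# Where to store the feed; the placeholders are filled in per run.
FEED_URI = "file:///tmp/feeds/%(name)s/%(time)s.csv"  # hypothetical path

# Serialization format, as listed in the formats table.
FEED_FORMAT = "csv"

# Fields to export (hypothetical item field names).
FEED_EXPORT_FIELDS = ["title", "url", "price"]

# Do not create a feed when the spider scraped no items.
FEED_STORE_EMPTY = False
```

The *_BASE settings hold Scrapy's built-in defaults, so project-level additions usually belong in FEED_STORAGES and FEED_EXPORTERS instead.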