
Scrapy - Other Settings
The following table lists other Scrapy settings, with their default values where one is defined:
Sr.No | Setting & Description |
---|---|
1 | AJAXCRAWL_ENABLED: It enables AjaxCrawlMiddleware, which handles AJAX-crawlable pages; you may want to enable it for broad crawls. Default value: False |
2 | AUTOTHROTTLE_DEBUG: It enables AutoThrottle debug mode, which displays stats on every received response so you can see how the throttling parameters are adjusted in real time. Default value: False |
3 | AUTOTHROTTLE_ENABLED: It enables the AutoThrottle extension. Default value: False |
4 | AUTOTHROTTLE_MAX_DELAY: It sets the maximum download delay (in seconds) to use in case of high latencies. Default value: 60.0 |
5 | AUTOTHROTTLE_START_DELAY: It sets the initial download delay (in seconds). Default value: 5.0 |
6 | AUTOTHROTTLE_TARGET_CONCURRENCY: It defines the average number of requests Scrapy should send in parallel to remote sites. Default value: 1.0 |
7 | CLOSESPIDER_ERRORCOUNT: It defines the maximum number of errors to receive before the spider is closed. If set to 0, the spider is not closed because of errors. Default value: 0 |
8 | CLOSESPIDER_ITEMCOUNT: It defines the number of scraped items after which the spider is closed. If set to 0, the spider is not closed because of the item count. Default value: 0 |
9 | CLOSESPIDER_PAGECOUNT: It defines the maximum number of responses to crawl before the spider is closed. If set to 0, the spider is not closed because of the response count. Default value: 0 |
10 | CLOSESPIDER_TIMEOUT: It defines the number of seconds a spider may stay open before it is closed automatically. If set to 0, the spider is not closed because of a timeout. Default value: 0 |
11 | COMMANDS_MODULE: It names the module in which Scrapy looks up custom commands for your project. Default value: '' |
12 | COMPRESSION_ENABLED: It indicates whether the compression middleware is enabled. Default value: True |
13 | COOKIES_DEBUG: If set to True, all the cookies sent in requests and received in responses are logged. Default value: False |
14 | COOKIES_ENABLED: It indicates whether the cookies middleware is enabled. If disabled, no cookies are sent to web servers. Default value: True |
15 | FILES_EXPIRES: It defines the number of days after which a downloaded file is considered expired. Default value: 90 days |
16 | FILES_RESULT_FIELD: It is set when you want to use another item field name for your processed files. |
17 | FILES_STORE: It defines the location where downloaded files are stored. It must be set to a valid value for the files pipeline to be enabled. |
18 | FILES_STORE_S3_ACL: It is used to modify the ACL policy for the files stored in an Amazon S3 bucket. Default value: private |
19 | FILES_URLS_FIELD: It is set when you want to use another item field name for your file URLs. |
20 | HTTPCACHE_ALWAYS_STORE: If enabled, pages are cached unconditionally. Default value: False |
21 | HTTPCACHE_DBM_MODULE: It is the database module used in the DBM storage backend. Default value: 'anydbm' ('dbm' on Python 3) |
22 | HTTPCACHE_DIR: It is the directory used for storing the HTTP cache. If empty, the HTTP cache is disabled. Default value: 'httpcache' |
23 | HTTPCACHE_ENABLED: It indicates whether the HTTP cache is enabled. Default value: False |
24 | HTTPCACHE_EXPIRATION_SECS: It sets the expiration time (in seconds) for cached requests. If set to 0, cached requests never expire. Default value: 0 |
25 | HTTPCACHE_GZIP: If set to True, all the cached data is compressed with gzip. Default value: False |
26 | HTTPCACHE_IGNORE_HTTP_CODES: It lists the HTTP status codes whose responses should not be cached. Default value: [] |
27 | HTTPCACHE_IGNORE_MISSING: If enabled, requests that are not found in the cache are ignored instead of downloaded. Default value: False |
28 | HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS: It is a list of Cache-Control directives in responses to be ignored. Default value: [] |
29 | HTTPCACHE_IGNORE_SCHEMES: It lists the URI schemes whose responses should not be cached. Default value: ['file'] |
30 | HTTPCACHE_POLICY: It defines the class implementing the cache policy. Default value: 'scrapy.extensions.httpcache.DummyPolicy' |
31 | HTTPCACHE_STORAGE: It defines the class implementing the cache storage backend. Default value: 'scrapy.extensions.httpcache.FilesystemCacheStorage' |
32 | HTTPERROR_ALLOWED_CODES: It is a list of non-200 status codes; responses with these codes are passed to the spider. Default value: [] |
33 | HTTPERROR_ALLOW_ALL: If enabled, all responses are passed to the spider regardless of their status codes. Default value: False |
34 | HTTPPROXY_AUTH_ENCODING: It defines the encoding used for proxy authentication on HttpProxyMiddleware. Default value: "latin-1" |
35 | IMAGES_EXPIRES: It defines the number of days after which a downloaded image is considered expired. Default value: 90 days |
36 | IMAGES_MIN_HEIGHT: It is used to drop images whose height is below the given minimum. |
37 | IMAGES_MIN_WIDTH: It is used to drop images whose width is below the given minimum. |
38 | IMAGES_RESULT_FIELD: It is set when you want to use another item field name for your processed images. |
39 | IMAGES_STORE: It defines the location where downloaded images are stored. It must be set to a valid value for the images pipeline to be enabled. |
40 | IMAGES_STORE_S3_ACL: It is used to modify the ACL policy for the images stored in an Amazon S3 bucket. Default value: private |
41 | IMAGES_THUMBS: It is set to create thumbnails of the downloaded images. |
42 | IMAGES_URLS_FIELD: It is set when you want to use another item field name for your image URLs. |
43 | MAIL_FROM: It is the sender address (From: header) used when sending emails. Default value: 'scrapy@localhost' |
44 | MAIL_HOST: It is the SMTP host used to send emails. Default value: 'localhost' |
45 | MAIL_PASS: It is the password used for SMTP authentication, along with MAIL_USER. Default value: None |
46 | MAIL_PORT: It is the SMTP port used to send emails. Default value: 25 |
47 | MAIL_SSL: When enabled, it forces the connection to use SSL encryption. Default value: False |
48 | MAIL_TLS: When enabled, it forces the connection to use STARTTLS. Default value: False |
49 | MAIL_USER: It defines the user for SMTP authentication. If not set, no SMTP authentication is performed. Default value: None |
50 | METAREFRESH_ENABLED: It indicates whether the meta refresh middleware is enabled. Default value: True |
51 | METAREFRESH_MAXDELAY: It is the maximum meta-refresh delay (in seconds) for which the redirection is followed. Default value: 100 |
52 | REDIRECT_ENABLED: It indicates whether the redirect middleware is enabled. Default value: True |
53 | REDIRECT_MAX_TIMES: It defines the maximum number of redirections followed for a single request. Default value: 20 |
54 | REFERER_ENABLED: It indicates whether the referer middleware is enabled. Default value: True |
55 | RETRY_ENABLED: It indicates whether the retry middleware is enabled. Default value: True |
56 | RETRY_HTTP_CODES: It defines which HTTP response codes are retried. Default value: [500, 502, 503, 504, 408] |
57 | RETRY_TIMES: It defines the maximum number of retries, in addition to the first download. Default value: 2 |
58 | TELNETCONSOLE_HOST: It defines the interface on which the telnet console listens. Default value: '127.0.0.1' |
59 | TELNETCONSOLE_PORT: It defines the port range used for the telnet console. Default value: [6023, 6073] |
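
As a rough illustration of how a few of the settings in the table above might be combined, the following is a minimal sketch of a project's settings.py. The setting names come from the table; the values chosen (for example the item limit, the one-day cache lifetime, and the retry count) are assumptions made for this example, not recommendations.

```python
# settings.py (illustrative sketch; every value below is an assumed example)

# Let AutoThrottle adjust the download delay based on observed latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0         # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0          # upper bound for the delay, in seconds
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0  # average number of parallel requests
AUTOTHROTTLE_DEBUG = False             # True logs throttling stats per response

# Close the spider once either limit is reached (0 disables a limit).
CLOSESPIDER_ITEMCOUNT = 1000           # number of scraped items
CLOSESPIDER_TIMEOUT = 3600             # seconds the spider may stay open

# Cache responses on disk so repeated runs do not re-download pages.
HTTPCACHE_ENABLED = True
HTTPCACHE_DIR = "httpcache"
HTTPCACHE_EXPIRATION_SECS = 86400      # one day; 0 means entries never expire
HTTPCACHE_IGNORE_HTTP_CODES = [500, 502, 503, 504]

# Retry transient failures a few times before giving up.
RETRY_ENABLED = True
RETRY_TIMES = 3                        # retries in addition to the first download
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]

# Let 404 responses reach the spider callbacks instead of being filtered out.
HTTPERROR_ALLOWED_CODES = [404]
```

The same names can also be overridden for a single spider through its `custom_settings` dictionary, or for a single run on the command line with `scrapy crawl <spider> -s NAME=VALUE`.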