Tags: aws-lambda, python, scrapy, twisted

Scrapy run 2 spiders with outputs to 2 different files using one process (AWS Lambda)

Published on 2020-03-29 12:48:30

I'm trying to run Scrapy on an AWS Lambda function, and everything is almost working, except that I need to run 2 spiders in the one Lambda function. The main catch is that the 2 spiders need to output to 2 different JSON files.

The docs look like they've got a very close solution:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

process = CrawlerProcess()
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start()

The problem is that if I pass my settings into the CrawlerProcess, as I currently do:

CrawlerProcess({
    'FEED_FORMAT': 'json',
    'FEED_URI': '/tmp/fx_today_data.json'
})

then both spiders write to the same file, fx_today_data.json.

I've tried creating 2 CrawlerProcess instances, but that gives me the ReactorNotRestartable error. I tried to solve it using this thread, with no success.
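For reference, a minimal sketch of why the two-process attempt fails, reusing the MySpider1/MySpider2 placeholders from the docs example above (file names are illustrative):

from scrapy.crawler import CrawlerProcess

# First crawl works: start() runs the Twisted reactor and blocks
# until the spider finishes, then stops the reactor.
process_one = CrawlerProcess({'FEED_FORMAT': 'json', 'FEED_URI': '/tmp/file_one.json'})
process_one.crawl(MySpider1)
process_one.start()

# Second crawl fails: Twisted reactors cannot be restarted, so this
# raises twisted.internet.error.ReactorNotRestartable.
process_two = CrawlerProcess({'FEED_FORMAT': 'json', 'FEED_URI': '/tmp/file_two.json'})
process_two.crawl(MySpider2)
process_two.start()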

I've also tried running the scrapy code like so:

subprocess.call(["scrapy", "runspider", "./spiders/fx_today_data.py", "-o", "/tmp/fx_today_data.json"])

But this results in the usual 'scrapy' command not found error, because I don't have a virtualenv set up in the Lambda function (I don't know if it's worth setting one up just for this?).
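As an aside, if the subprocess route were still worth pursuing, one possible workaround (assuming the scrapy package is importable in the Lambda runtime even though its console script isn't on PATH) is to invoke it through the interpreter as a module:

import subprocess
import sys

# Run Scrapy as a module via the current interpreter, so the
# 'scrapy' console script doesn't need to be on PATH.
subprocess.call([
    sys.executable, "-m", "scrapy", "runspider",
    "./spiders/fx_today_data.py",
    "-o", "/tmp/fx_today_data.json",
])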

Does anyone know how to run 2 Scrapy Spiders (they don't have to run at the same time) in one process and have them output to separate files?

Questioner: Jamie
Viewed: 110

Answer (Jamie, 2020-02-01 00:18)

Thanks to Corentin and this guide, I was able to get it working.

By giving each spider its own custom_settings class attribute, I could run both off the one CrawlerProcess and not have to worry, as each spider had its own file output.
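For reference, a minimal sketch of that setup (the fx_today name comes from the question; the second spider, the URLs, and the parse logic are placeholders I've assumed; FEED_FORMAT/FEED_URI are the feed settings in use at the time):

import scrapy
from scrapy.crawler import CrawlerProcess

class FxTodaySpider(scrapy.Spider):
    name = 'fx_today'
    start_urls = ['https://example.com/today']  # placeholder
    # Per-spider settings override the process-wide ones,
    # so each spider writes to its own feed file.
    custom_settings = {
        'FEED_FORMAT': 'json',
        'FEED_URI': '/tmp/fx_today_data.json',
    }

    def parse(self, response):
        yield {'url': response.url}  # placeholder parse logic

class FxHistorySpider(scrapy.Spider):
    name = 'fx_history'  # hypothetical second spider
    start_urls = ['https://example.com/history']  # placeholder
    custom_settings = {
        'FEED_FORMAT': 'json',
        'FEED_URI': '/tmp/fx_history_data.json',
    }

    def parse(self, response):
        yield {'url': response.url}  # placeholder parse logic

# One CrawlerProcess, no global feed settings needed:
process = CrawlerProcess()
process.crawl(FxTodaySpider)
process.crawl(FxHistorySpider)
process.start()  # runs both spiders, each writing its own file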

The final code looks a lot like the docs example I provided in the question.

I also ended up having to use from multiprocessing.context import Process, and to wrap the process termination in a try block (since the process may not even have been assigned yet!), in order to make sure I avoided the ReactorNotRestartable error.
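A sketch of how that can fit together in the Lambda handler, assuming the spiders from the sketch above; the handler name and the module-level p bookkeeping are my assumptions, not the exact final code:

from multiprocessing.context import Process

from scrapy.crawler import CrawlerProcess

def run_spiders():
    # The Twisted reactor lives and dies inside this child process,
    # so the parent never hits ReactorNotRestartable, even on a warm
    # Lambda container that handles several invocations.
    process = CrawlerProcess()
    process.crawl(FxTodaySpider)
    process.crawl(FxHistorySpider)
    process.start()

def handler(event, context):  # hypothetical Lambda entry point
    global p
    try:
        # Terminate any process left over from a previous (warm)
        # invocation; on a cold start p has never been assigned, so
        # this raises NameError and the except block swallows it.
        p.terminate()
    except NameError:
        pass
    p = Process(target=run_spiders)
    p.start()
    p.join()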