How do I run a scraper on each entry in a database?

Posted on 2020-12-08 01:50:22

I'm scraping data with the requests library and parsing it with Beautiful Soup.

I'm storing the scraped data in a MySQL database.

I want to run the scraper each time a new entry appears in the table.

Asked by Asad Ullah
Answered by Menachem Hornbacher, 2020-12-08 10:21:28

Assuming you already have a scraping method, let's call it scrape_data().
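Since the question mentions requests and Beautiful Soup, a minimal placeholder for scrape_data() could look like the sketch below. The assumption that each database row stores a URL, and the title-extraction logic, are illustrative only; swap in your own parsing.

# A hypothetical scrape_data() built on the requests / Beautiful Soup setup
# described in the question. Assumes each row holds a URL to fetch.
import requests
from bs4 import BeautifulSoup

def scrape_data(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.title.string if soup.title else None
    print(f"Scraped {url}: title={title}")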

You can use MySQL Connector/Python to run a query against the database and call scrape_data() on each row as you read it (although you may want to buffer the rows into memory first to handle disconnects; see the polling sketch after the code below).

# Importing the MySQL Connector/Python driver
import mysql.connector as mysqlConnector

# Creating a connection to the running MySQL server.
# Remember to use your own credentials and database name.
conn = mysqlConnector.connect(host='localhost',
                              user='root',
                              passwd='root',
                              database='your_database')

# Handle bad connections. Note that connect() raises an exception on failure,
# so this check mainly confirms the session is still alive.
if conn.is_connected():
    print("Connection Successful :)")
else:
    print("Connection Failed :(")

# Creating a cursor object to traverse the result set
cur = conn.cursor()

# Assuming the column is called data in a table called `table` (backticks are
# needed because TABLE is a reserved word in MySQL). Replace as needed.
cur.execute("SELECT data FROM `table`")
for row in cur:
    scrape_data(row[0])  # Assumes data is the first column.

# Closing the cursor and connection - or you will end up with a resource leak
cur.close()
conn.close()
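The loop above processes every row that already exists. To get closer to the original goal of running the scraper whenever a new entry shows up, one option is to poll the table and only fetch rows added since the last pass. The sketch below is not part of the original answer; the auto-increment id column, the your_database name, and the polling interval are all assumptions.

# A minimal polling sketch for reacting to new rows. Assumes the table has an
# auto-increment `id` column and that periodic polling (rather than a database
# trigger or message queue) is acceptable.
import time
import mysql.connector as mysqlConnector

def poll_for_new_rows(interval_seconds=60):
    last_seen_id = 0
    while True:
        conn = mysqlConnector.connect(host='localhost', user='root',
                                      passwd='root', database='your_database')
        cur = conn.cursor()
        # Only fetch rows added since the previous pass.
        cur.execute("SELECT id, data FROM `table` WHERE id > %s ORDER BY id",
                    (last_seen_id,))
        # Buffer the whole result set into memory, per the note about disconnects.
        rows = cur.fetchall()
        cur.close()
        conn.close()
        for row_id, data in rows:
            scrape_data(data)
            last_seen_id = row_id
        time.sleep(interval_seconds)

Calling fetchall() covers the buffering suggestion above: the rows are read into memory before scraping starts, so a dropped connection mid-scrape does not interrupt the loop.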

Note

You can find the official connector, MySQL Connector/Python, on the MySQL website; it can be installed with pip install mysql-connector-python.