I am scraping data with the requests library and parsing it with Beautiful Soup.
I am storing the scraped data in a MySQL database.
I want to run the scraper every time a new entry appears in the table.
Assuming you already have a scraping method, call it scrape_data().
You can use MySQL Connector/Python to run a query directly against the database and scrape as it reads each row (although you may want to buffer the rows into memory to cope with disconnects).
# Importing MySQL Connector/Python
import mysql.connector as mysqlConnector

# Creating a connection to the running MySQL server. Remember to use your own credentials.
# connect() raises an error on failure rather than returning a falsy value,
# so handle bad connections with try/except.
try:
    conn = mysqlConnector.connect(host='localhost', user='root', passwd='root')
    print("Connection Successful :)")
except mysqlConnector.Error:
    print("Connection Failed :(")
    raise

# Creating a cursor object to traverse the result set
cur = conn.cursor()
# Assuming the column is called data in a table called table. Replace as needed.
# (table is a reserved word in MySQL, hence the backticks.)
cur.execute("SELECT data FROM `table`")
for row in cur:
    scrape_data(row[0])  # Assumes data is the first column.
# Closing the connection - or you will end up with a resource leak
conn.close()
You can find the official connector here.
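Since the question is about reacting to *new* entries rather than re-reading the whole table, one simple approach is to poll for rows whose auto-increment id is greater than the last id you processed. The sketch below is a minimal illustration of that pattern, with a hypothetical `poll_for_new_rows()` helper and an in-memory list standing in for the database table; in real use, `fetch_after` would wrap a query such as `SELECT id, data FROM \`table\` WHERE id > %s ORDER BY id`, and `handle` would be your `scrape_data()`:

```python
import time

def poll_for_new_rows(fetch_after, handle, start_id=0, interval=5.0, max_polls=None):
    """Repeatedly fetch rows with id > the last seen id and hand each to `handle`.

    fetch_after(last_id) must return a list of (id, data) tuples ordered by id,
    e.g. backed by "SELECT id, data FROM `table` WHERE id > %s ORDER BY id".
    Returns the id of the newest row processed.
    """
    last_id = start_id
    polls = 0
    while max_polls is None or polls < max_polls:
        for row_id, data in fetch_after(last_id):
            handle(data)       # e.g. scrape_data(data)
            last_id = row_id   # remember the newest row we have processed
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)  # wait before checking for new rows again
    return last_id

# In-memory stand-in for the database table, just to illustrate the pattern.
table = [(1, "a"), (2, "b")]
seen = []
last = poll_for_new_rows(
    fetch_after=lambda last_id: [r for r in table if r[0] > last_id],
    handle=seen.append,
    interval=0.0,
    max_polls=1,
)
table.append((3, "c"))  # a "new entry" appears between polls
last = poll_for_new_rows(
    fetch_after=lambda last_id: [r for r in table if r[0] > last_id],
    handle=seen.append,
    start_id=last,
    interval=0.0,
    max_polls=1,
)
print(seen)  # -> ['a', 'b', 'c']
print(last)  # -> 3
```

Polling is the simplest option; for lower latency you could instead look at MySQL triggers feeding a queue, or reading the binary log, but those are considerably more involved.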