How do I run a scraper on each entry in a database?

Posted on 2020-12-08 01:50:22

I'm scraping data with the requests library and parsing it with Beautiful Soup.

I'm storing the scraped data in a MySQL database.

I want to run the scraper each time a new entry appears in the table.

Asked by Asad Ullah
Answered by Menachem Hornbacher, 2020-12-08 10:21:28

Assuming you already have a scraping method, let's call it scrape_data().
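Since the question mentions requests and Beautiful Soup, a minimal placeholder for scrape_data() could look like the sketch below. The assumption that each database row stores a URL, and the title-extraction logic, are illustrative only; swap in your own parsing.

# A hypothetical scrape_data() built on the requests / Beautiful Soup setup
# described in the question. Assumes each row holds a URL to fetch.
import requests
from bs4 import BeautifulSoup

def scrape_data(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.title.string if soup.title else None
    print(f"Scraped {url}: title={title}")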

You can use MySQL Connector/Python to run a query against the database and call scrape_data() on each row as you read it (although you may want to buffer the rows into memory first to handle disconnects; see the polling sketch after the code below).

# Importing the MySQL Connector/Python driver
import mysql.connector as mysqlConnector

# Creating a connection to the running MySQL server.
# Remember to use your own credentials and database name.
conn = mysqlConnector.connect(host='localhost',
                              user='root',
                              passwd='root',
                              database='your_database')

# Handle bad connections. Note that connect() raises an exception on failure,
# so this check mainly confirms the session is still alive.
if conn.is_connected():
    print("Connection Successful :)")
else:
    print("Connection Failed :(")

# Creating a cursor object to traverse the result set
cur = conn.cursor()

# Assuming the column is called data in a table called `table` (backticks are
# needed because TABLE is a reserved word in MySQL). Replace as needed.
cur.execute("SELECT data FROM `table`")
for row in cur:
    scrape_data(row[0])  # Assumes data is the first column.

# Closing the cursor and connection - or you will end up with a resource leak
cur.close()
conn.close()
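The loop above processes every row that already exists. To get closer to the original goal of running the scraper whenever a new entry shows up, one option is to poll the table and only fetch rows added since the last pass. The sketch below is not part of the original answer; the auto-increment id column, the your_database name, and the polling interval are all assumptions.

# A minimal polling sketch for reacting to new rows. Assumes the table has an
# auto-increment `id` column and that periodic polling (rather than a database
# trigger or message queue) is acceptable.
import time
import mysql.connector as mysqlConnector

def poll_for_new_rows(interval_seconds=60):
    last_seen_id = 0
    while True:
        conn = mysqlConnector.connect(host='localhost', user='root',
                                      passwd='root', database='your_database')
        cur = conn.cursor()
        # Only fetch rows added since the previous pass.
        cur.execute("SELECT id, data FROM `table` WHERE id > %s ORDER BY id",
                    (last_seen_id,))
        # Buffer the whole result set into memory, per the note about disconnects.
        rows = cur.fetchall()
        cur.close()
        conn.close()
        for row_id, data in rows:
            scrape_data(data)
            last_seen_id = row_id
        time.sleep(interval_seconds)

Calling fetchall() covers the buffering suggestion above: the rows are read into memory before scraping starts, so a dropped connection mid-scrape does not interrupt the loop.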

Note

You can find the official connector, MySQL Connector/Python, on the MySQL website; it can be installed with pip install mysql-connector-python.