This article is reproduced from serverfault.com.

python - Reddit Scraper and Telegram Bot

Posted on 2020-03-04 15:53:45

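I had the idea of scraping some science news from the "science" subreddit and broadcasting it to my Telegram channel through a Telegram bot. I have written two simple Python snippets, one for each task. Now I would like to know the best way to combine them into a single program, so that the bot automatically sends the scraped information to the channel every time the program runs. Both scripts work fine on their own. Please advise.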

Reddit scraper

import praw

# Reddit API credentials
# see further instructions here --> https://www.reddit.com/prefs/apps
reddit = praw.Reddit(client_id='XXXX',
                     client_secret='XXXXXXXXXXXXXXXXXXXXXXX',
                     user_agent='science_bot',
                     username='XXXXXX',
                     password='XXXXXXXXXXXXXXXXXX')

# select a subreddit you want to use for scraping data
subreddit = reddit.subreddit('science')
print("\t", "Digest of the latest scientific news for today: \n")
# print the title and link of the five newest submissions
for submission in subreddit.new(limit=5):
    print(submission.title)
    print(submission.url, "\n")

Telegram bot for posting

import requests

def telegram_bot_sendtext(bot_message):
    bot_token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
    bot_chatID = '@XXXXXX'
    send_url = 'https://api.telegram.org/bot' + bot_token + '/sendMessage'
    # Pass the fields as params so that requests URL-encodes them; characters
    # such as '&' or '#' in the message would otherwise break the query string.
    params = {'chat_id': bot_chatID, 'parse_mode': 'Markdown', 'text': bot_message}

    response = requests.get(send_url, params=params)

    return response.json()


test = telegram_bot_sendtext("Testing my new Telegram bot.")
print(test)
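
When the message travels in the query string of a sendMessage URL, characters such as `&`, `#`, or a newline in a post title need to be percent-encoded, otherwise the text can be cut short. Here is a minimal sketch of that encoding using only the standard library. `build_send_url` is a hypothetical helper name, and the token, channel, and sample text are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_send_url(bot_token, chat_id, text, parse_mode=None):
    """Return a sendMessage URL with every query parameter percent-encoded."""
    params = {'chat_id': chat_id, 'text': text}
    if parse_mode:
        params['parse_mode'] = parse_mode
    # urlencode escapes '&', '#', '@', newlines, etc. in each value.
    return ('https://api.telegram.org/bot' + bot_token
            + '/sendMessage?' + urlencode(params))

# Placeholder token and channel, as in the snippets above.
sample_text = 'Cats & dogs: a study #1\nhttps://example.com/?a=1&b=2'
url = build_send_url('XXXX', '@XXXXXX', sample_text)
print(url)

# Decoding the query string gives back the original text unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded['text'][0] == sample_text)
```

Passing a `params` dict to `requests.get` performs the same encoding, so this helper is only useful if the URL has to be built by hand.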

Thanks in advance!

Questioner
web_tracer
web_tracer 2020-03-07 04:45:49

I solved this using the following code structure. Thanks to @maxwell for suggesting a simple and elegant idea.

import html

import praw
import telebot

bot_token = 'XXXXXXXXXXXXXXXXXXXXXXXXX'
bot_chatID = '@your_channel_name'
bot = telebot.TeleBot(bot_token)

reddit = praw.Reddit(client_id='XXXXXXXXXXXXXX',
                     client_secret='XXXXXXXXXXXXXXXXXXXXXXXX',
                     user_agent='your_bot_name',
                     username='your_reddit_username',
                     password='XXXXXXXXXXXXXX')

def reddit_scraper(limit=5):
    """Return the title and link of the newest submissions in the subreddit."""
    news_data = []
    subreddit = reddit.subreddit('name_of_subreddit')
    for submission in subreddit.new(limit=limit):
        news_data.append({'title': submission.title, 'link': submission.url})
    return news_data

def get_msg(news_data):
    """Build a single HTML-formatted message from the scraped items."""
    msg = '\n\n\n'
    for news_item in news_data:
        # Escape the values so that '<', '>' or '&' in a title or link
        # does not break Telegram's HTML parsing.
        title = html.escape(news_item['title'])
        link = html.escape(news_item['link'], quote=True)
        msg += title + '\n[<a href="' + link + '">Read the full article --&gt;</a>]'
        msg += '\n\n'

    return msg

# Scrape once, then send everything as one message.
news_data = reddit_scraper()
if news_data:
    msg = get_msg(news_data)
    status = bot.send_message(chat_id=bot_chatID, text=msg, parse_mode='HTML')
    if status:
        print(status)
else:
    print('No updates.')
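
Since the program is meant to be run repeatedly, one possible refinement is to remember which submissions have already been sent, so that 'No updates.' is printed only when nothing new has appeared. The following is a rough sketch under stated assumptions, not part of the solution above. It assumes each item dict also carries an 'id' key (for example taken from `submission.id`). The helper names `load_seen`, `save_seen`, and `filter_new`, the file name `seen_ids.json`, and the sample items are all hypothetical.

```python
import json
import os
import tempfile

def load_seen(path):
    """Return the set of submission ids recorded by earlier runs."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding='utf-8') as f:
        return set(json.load(f))

def save_seen(path, seen):
    """Write the set of already-sent ids back to disk as a JSON list."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(sorted(seen), f)

def filter_new(news_data, seen):
    """Keep only the items whose 'id' has not been sent yet."""
    return [item for item in news_data if item['id'] not in seen]

# Simulate three consecutive runs with made-up items (no Reddit or Telegram calls).
path = os.path.join(tempfile.mkdtemp(), 'seen_ids.json')
first_run = [{'id': 'a1', 'title': 'First post', 'link': 'https://example.com/1'}]
second_run = first_run + [{'id': 'b2', 'title': 'Second post', 'link': 'https://example.com/2'}]

for batch in (first_run, second_run, second_run):
    seen = load_seen(path)
    fresh = filter_new(batch, seen)
    if fresh:
        # In the real program, this is where the message would be built and sent.
        print('Would send:', [item['title'] for item in fresh])
        save_seen(path, seen | {item['id'] for item in fresh})
    else:
        print('No updates.')
```

In the actual bot, the ids would probably be saved only after `bot.send_message` succeeds, so that a failed send is retried on the next run.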