Note: This article is reproduced from serverfault.com.

Other - I'm having trouble returning an HTML link as I pull links from a Google search query in Python

Posted on 2020-11-27 21:47:38

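I am trying to pull website links from a Google search, but nothing is returned. I think the problem is with the attributes I use to select the links, but I am not sure why, because the same attributes gave me results when I used webdriver.

Here is the code: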

import requests
import sys
import webbrowser
import bs4
from parsel import Selector
import xlsxwriter
from openpyxl import load_workbook
import pandas as pd

print('Searching...')
res = requests.get('https://google.com/search?q="retail software" AND "mission"')
soup = bs4.BeautifulSoup(res.content, 'html.parser')

for g in soup.find_all('div', class_='yuRUbf'):
    anchors = g.find_all('a')
    if anchors:
        link = anchors[0]['href']
        print(link)

This is the output: Searching...

That's all. Thanks in advance for your help!

Questioner
Arsh Chopra
Eduardo Coltri 2020-11-28 08:28:20

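The value of that class is dynamic, so the yuRUbf class you see in a browser or webdriver session may not be present in the HTML that requests receives. You should use the following selector to retrieve the href instead: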

"a[href*='/url']"

This matches any a tag whose href contains the pattern /url.
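As a minimal illustration of what this selector matches, the sketch below runs it against a small hand-written HTML fragment instead of a live Google response. The fragment and its example.com / example.org URLs are made up for the example, so real result pages will look different.

```python
import bs4

# Hypothetical HTML fragment standing in for a Google results page.
SAMPLE_HTML = """
<div><a href="/url?q=https://example.com/mission&sa=U">Result one</a></div>
<div><a href="/search?q=retail+software&start=10">Next page</a></div>
<div><a href="/url?q=https://example.org/about&sa=U">Result two</a></div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')

# [href*='/url'] keeps only anchors whose href contains the substring '/url'.
matched = [a.attrs.get('href') for a in soup.select("a[href*='/url']")]

for href in matched:
    print(href)
```

Only the two /url anchors are selected; the /search link is skipped because its href does not contain /url.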

So just change your for loop to:

for anchor_tags in soup.select("a[href*='/url']"):
    print(anchor_tags.attrs.get('href'))

Example of a printed href:

/url?q=https://www.devontec.com/mission-vision-and-values/&sa=U&ved=2ahUKEwjShcWU-aPtAhWEH7cAHYfQAlEQFjAAegQICBAB&usg=AOvVaw14ziJ-ipXIkPWH3oMXCig1

To get the link itself, you only need to split the string. You can do it like this:

for anchor_tags in soup.select("a[href*='/url']"):
    # keep the text after 'q=' and drop the trailing parameters (&sa=, &ved=, ...)
    link = anchor_tags.attrs.get('href').split('q=', 1)[-1].split('&')[0]
    print(link)
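String splitting depends on the exact layout of the query string. As an alternative sketch, the standard library's urllib.parse can read the q parameter directly. The helper name extract_target is hypothetical, and the sample href is modelled on the printed example above with placeholder values for ved and usg.

```python
from urllib.parse import urlparse, parse_qs

def extract_target(href):
    """Return the value of the q parameter from a '/url?q=...' href, or None."""
    query = urlparse(href).query        # everything after the '?'
    values = parse_qs(query).get('q')   # list of values for 'q', if present
    return values[0] if values else None

# Sample href modelled on the printed example; ved/usg values are placeholders.
sample = ('/url?q=https://www.devontec.com/mission-vision-and-values/'
          '&sa=U&ved=PLACEHOLDER&usg=PLACEHOLDER')
print(extract_target(sample))
```

Inside the loop this could be called as extract_target(anchor_tags.attrs.get('href')); it returns None for hrefs that carry no q parameter, so those can be filtered out.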