
I'm having trouble returning an HTML link when pulling links from a Google search query in Python

Posted on 2020-11-27 21:47:38

I'm attempting to pull website links from a Google search, but nothing is returned. I think the issue is with the attributes I'm using to select the link, but I'm not sure why, since the same attributes worked for me in webdriver.

Here's the code:

import requests
import sys
import webbrowser
import bs4
from parsel import Selector
import xlsxwriter
from openpyxl import load_workbook
import pandas as pd

print('Searching...')
res = requests.get('https://google.com/search?q="retail software" AND "mission"')
soup = bs4.BeautifulSoup(res.content, 'html.parser')

for x in soup.find_all('div', class_='yuRUbf'):
    anchors = x.find_all('a')  # search inside the current div (loop variable is x, not g)
    if anchors:
        link = anchors[0]['href']
        print(link)

This is the output: Searching...

That's it. Thanks in advance for the help!

Questioner
Arsh Chopra
Eduardo Coltri 2020-11-28 08:28:20

The class value is dynamic, so it is not a reliable selector. Use the following selector to retrieve the href value instead:

"a[href*='/url']"

This matches any a tag whose href attribute contains the pattern /url.
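
To see what the selector matches without making a network request, here is a minimal sketch on a hard-coded snippet. The markup is hypothetical and much simpler than a real results page; it only illustrates the substring match on href.

```python
import bs4

# Hypothetical markup for illustration only; not real Google output.
html = """
<div><a href="/url?q=https://example.com/">Result link</a></div>
<div><a href="/search?q=other">Navigation link</a></div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# [href*='/url'] keeps only anchors whose href contains the substring '/url'.
hrefs = [a.attrs.get("href") for a in soup.select("a[href*='/url']")]
print(hrefs)
```

The second anchor is skipped because its href does not contain /url.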

So, just change your for loop to

for anchor_tags in soup.select("a[href*='/url']"):
    print(anchor_tags.attrs.get('href'))

Example of a printed href:

/url?q=https://www.devontec.com/mission-vision-and-values/&sa=U&ved=2ahUKEwjShcWU-aPtAhWEH7cAHYfQAlEQFjAAegQICBAB&usg=AOvVaw14ziJ-ipXIkPWH3oMXCig1

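To get the link, split the string: take everything after 'q=' and drop the tracking parameters that follow the first '&'. You can do it like this:

for anchor_tags in soup.select("a[href*='/url']"):
    href = anchor_tags.attrs.get('href')
    # keep the text after the first 'q=' and before the next '&'
    link = href.partition('q=')[2].partition('&')[0]
    print(link)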
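
String splitting depends on the exact shape of the href. As an alternative, the sketch below parses the query string with the standard library's urllib.parse and reads the q parameter. The helper name extract_target is made up for this example, and it assumes the target URL is always carried in q, as in the sample href above.

```python
from urllib.parse import urlparse, parse_qs


def extract_target(href):
    """Return the value of the 'q' query parameter, or None if it is absent."""
    query = urlparse(href).query       # everything after the '?'
    values = parse_qs(query).get("q")  # parse_qs maps each name to a list of values
    return values[0] if values else None


# Sample href taken from the answer above.
href = ("/url?q=https://www.devontec.com/mission-vision-and-values/"
        "&sa=U&ved=2ahUKEwjShcWU-aPtAhWEH7cAHYfQAlEQFjAAegQICBAB"
        "&usg=AOvVaw14ziJ-ipXIkPWH3oMXCig1")

print(extract_target(href))
# prints: https://www.devontec.com/mission-vision-and-values/
```

Inside the loop, this would replace the split: call extract_target(anchor_tags.attrs.get('href')) and skip any result that is None.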