I'm trying to pull website links from a Google search, but nothing is returned. I think the issue is with the attributes I'm using to locate the link, but I'm not sure why, since the same attributes worked for me in webdriver.
Here's the code:
import requests
import sys
import webbrowser
import bs4
from parsel import Selector
import xlsxwriter
from openpyxl import load_workbook
import pandas as pd
print('Searching...')
res = requests.get('https://google.com/search?q="retail software" AND "mission"')
soup = bs4.BeautifulSoup(res.content, 'html.parser')
for x in soup.find_all('div', class_='yuRUbf'):
    anchors = x.find_all('a')
    if anchors:
        link = anchors[0]['href']
        print(link)
This is the output: Searching...
That's it. Thanks in advance for the help!
The class value is dynamic, so you should use the following selector to retrieve the href value:
"a[href*='/url']"
This matches any a tag whose href contains the pattern /url.
So, just change your for loop to:
for anchor_tags in soup.select("a[href*='/url']"):
    print(anchor_tags.attrs.get('href'))
Example of an href printed:
/url?q=https://www.devontec.com/mission-vision-and-values/&sa=U&ved=2ahUKEwjShcWU-aPtAhWEH7cAHYfQAlEQFjAAegQICBAB&usg=AOvVaw14ziJ-ipXIkPWH3oMXCig1
To get the link, you only need to split the string. You can do it like this:
for anchor_tags in soup.select("a[href*='/url']"):
    link = anchor_tags.attrs.get('href').split('q=')[-1]  # split on 'q=' and take the last part
    print(link)
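Note that splitting on 'q=' leaves the trailing parameters (&sa=..., &ved=..., &usg=...) attached to the link, as the example href above shows. If you want only the destination URL, one option is to parse the query string with the standard library instead. This is a minimal sketch, assuming the href keeps the /url?q=... shape shown above; extract_target and sample_href are names I made up for illustration, not part of any library.

```python
from urllib.parse import urlparse, parse_qs


def extract_target(href):
    """Return the destination URL from a '/url?q=...' redirect href, or None."""
    parsed = urlparse(href)
    # Only handle redirect-style links; anything else is skipped.
    if parsed.path != "/url":
        return None
    # parse_qs maps each query parameter to a list of its (decoded) values.
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


# The example href from this answer, split across lines for readability.
sample_href = (
    "/url?q=https://www.devontec.com/mission-vision-and-values/"
    "&sa=U&ved=2ahUKEwjShcWU-aPtAhWEH7cAHYfQAlEQFjAAegQICBAB"
    "&usg=AOvVaw14ziJ-ipXIkPWH3oMXCig1"
)
print(extract_target(sample_href))
```

In the loop you would call extract_target(anchor_tags.attrs.get('href')) and skip any None results. Unlike the split, this also percent-decodes the value.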