I'm trying to retrieve the 'inStockQty' json key/value pair using beautifulsoup but am having trouble.
Here's my code so far:
import requests
from bs4 import BeautifulSoup
url = "https://direct.asda.com/george/men/shoes/black-leather-lace-up-oxford-shoes/GEM830406,default,pd.html?cgid=D2M1G10C13"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14'
headers = {'User-Agent': user_agent,
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html5lib")
script = soup.select_one('script:contains("window.priceAvailabilityJSON")')
How do I then find 'inStockQty'? I thought about trying to parse all the JSON, but i don't know how to strip out all the HTML crap.
Many Thanks
Try this:
import json
import requests
from bs4 import BeautifulSoup
url = "https://direct.asda.com/george/men/shoes/black-leather-lace-up-oxford-shoes/GEM830406,default,pd.html?cgid=D2M1G10C13"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14'
headers = {'User-Agent': user_agent,
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html5lib")
script = soup.find(id='main-content').find('script').string
data = script.split('window.priceAvailabilityJSON = ')[1].split(';\nlet product')[0]
json_data = json.loads(data)
# Output
for product in json_data['productAvailability'].values():
print(product['availability']['inStockQty'])
getting this: File "C:\Users\sandhu\AppData\Local\Programs\Python\Python39\lib\json_init_.py", line 346, in loads return _default_decoder.decode(s) File "C:\Users\sandhu\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "C:\Users\sandhu\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 353, in raw_decode obj, end = self.scan_once(s, idx) json.decoder.JSONDecodeError: Unterminated string starting at: line 319 column 21 (char 7713)
I am on Linux and I am unable to reproduce your problem. Сan you provide, variable data output?
pastebin.com/e4QxhtZi that is the dump of 'data' variable. is it because, if you look at the end, for some reason it is missing end brackets?
if i switch to anaconda (using spyder instead of pycharm) same issue
I'm updating answer. Try this. I will wait for your result