Python BeautifulSoup: retrieving JSON

Posted on 2020-12-01 17:03:21

I'm trying to retrieve the 'inStockQty' JSON key/value pair using BeautifulSoup, but I'm having trouble.

Here's my code so far:

import requests
from bs4 import BeautifulSoup

url = "https://direct.asda.com/george/men/shoes/black-leather-lace-up-oxford-shoes/GEM830406,default,pd.html?cgid=D2M1G10C13"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14'
headers = {'User-Agent': user_agent,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html5lib")
# Select the <script> tag whose text mentions the JSON variable
script = soup.select_one('script:contains("window.priceAvailabilityJSON")')

How do I then find 'inStockQty'? I thought about parsing all the JSON, but I don't know how to strip out all the HTML crap around it.

Many Thanks

Questioner: supertom100

AlexElizard 2020-12-02 11:17:30

Try this:

import json
import requests
from bs4 import BeautifulSoup

url = "https://direct.asda.com/george/men/shoes/black-leather-lace-up-oxford-shoes/GEM830406,default,pd.html?cgid=D2M1G10C13"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14'
headers = {'User-Agent': user_agent,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html5lib")

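# The JSON is assigned to a JavaScript variable in the first <script> inside #main-content
script = soup.find(id='main-content').find('script').string
# Keep only the text between the assignment and the statement that follows it
data = script.split('window.priceAvailabilityJSON = ')[1].split(';\nlet product')[0]

json_data = json.loads(data)

# Print inStockQty for every entry under 'productAvailability'
for product in json_data['productAvailability'].values():
    print(product['availability']['inStockQty'])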
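
One caveat about the split on ';\nlet product': it only works while the page keeps exactly that statement after the JSON. Here is a rough sketch of an alternative, assuming the script text still contains 'window.priceAvailabilityJSON =' followed by valid JSON. It uses json.JSONDecoder().raw_decode, which parses one JSON value and ignores whatever comes after it. The helper name extract_js_object and the sample script text are made up for illustration; the real page content may look different.

```python
import json


def extract_js_object(script_text, marker):
    """Return the JSON value that follows `marker` in a block of JavaScript.

    Hypothetical helper for illustration, not part of any library.
    """
    start = script_text.find(marker)
    if start == -1:
        raise ValueError(f"marker {marker!r} not found")
    # Drop the marker and any whitespace before the JSON value itself.
    payload = script_text[start + len(marker):].lstrip()
    # raw_decode parses a single JSON value and reports where it ended,
    # so the trailing JavaScript does not have to be split off by hand.
    value, _end = json.JSONDecoder().raw_decode(payload)
    return value


# Made-up sample that only mimics the shape used in the loop above.
sample_script = '''
window.priceAvailabilityJSON = {"productAvailability": {"sku1": {"availability": {"inStockQty": 5}}, "sku2": {"availability": {"inStockQty": 0}}}};
let product = {};
'''

data = extract_js_object(sample_script, "window.priceAvailabilityJSON =")
quantities = [p["availability"]["inStockQty"]
              for p in data["productAvailability"].values()]
print(quantities)
```

With the real page you would pass the script string from the answer in place of sample_script. If the site builds that object with JavaScript that is not valid JSON (single quotes, trailing commas), this will raise json.JSONDecodeError just like json.loads would.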