This article is reproduced from serverfault.com.

Crawling Google Play Store Apps

Posted on 2018-09-03 14:22:35

I want to crawl the Google Play Store and get the app IDs of every app in a particular category. When I run the code below, I get the app IDs of only the first 49 apps. How can I get all of them? The URL I used for scraping was https://play.google.com/store/search?q=sports&c=apps&hl=en.

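import urllib.request
from bs4 import BeautifulSoup

url = input('Enter:')
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# Every app link on the page contains this path followed by the app id.
prefix = '/store/apps/details?id='
app_ids = []
for tag in soup('a'):
    # Default to '' so anchors without an href do not raise on find().
    href = tag.get('href', '')
    pos = href.find(prefix)
    if pos != -1:
        # Slice after the prefix wherever it occurs, not at a fixed offset.
        app_id = href[pos + len(prefix):]
        if app_id not in app_ids:
            app_ids.append(app_id)
print(app_ids)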

Questioner: Darshil

Jakub Balada 2018-09-04 20:01:06

On dynamic sites like this, it's better to call the site's internal XHR endpoints to get the data than to parse the HTML. The page sends a POST request for each batch of 48 apps it shows, and you can send the same request from your script. In this blog post there is an example of how to get app reviews from the Google Play Store this way.
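
As a rough sketch of that idea, the loop below sends one POST per batch of 48 and collects app IDs until a batch returns nothing new. The endpoint URL and the form field names `start` and `num` are placeholders, not Google's actual API. The real values would have to be copied from the request visible in the browser's network tab while scrolling the results page. The helper names (`extract_app_ids`, `post_page`, `crawl`) are also made up for illustration.

```python
import re
import urllib.parse
import urllib.request

# Matches the app id that follows the details path used in the question.
APP_ID_RE = re.compile(r'/store/apps/details\?id=([A-Za-z0-9._]+)')
PAGE_SIZE = 48  # batch size mentioned in the answer above


def extract_app_ids(text):
    """Return the app ids found in text, in order of first appearance."""
    seen = []
    for app_id in APP_ID_RE.findall(text):
        if app_id not in seen:
            seen.append(app_id)
    return seen


def post_page(endpoint, start):
    """POST one batch request and return the response body as text.

    The field names 'start' and 'num' are hypothetical placeholders.
    """
    data = urllib.parse.urlencode({'start': start, 'num': PAGE_SIZE}).encode()
    request = urllib.request.Request(endpoint, data=data, method='POST')
    with urllib.request.urlopen(request) as response:
        return response.read().decode('utf-8', errors='replace')


def crawl(endpoint, fetch=post_page, max_pages=20):
    """Request batches until one yields no new ids or max_pages is reached."""
    all_ids = []
    for page in range(max_pages):
        body = fetch(endpoint, page * PAGE_SIZE)
        new_ids = [i for i in extract_app_ids(body) if i not in all_ids]
        if not new_ids:
            break
        all_ids.extend(new_ids)
    return all_ids
```

The `fetch` parameter lets you swap in a different request function once the real request format is known, or a stub for trying the loop offline. The `max_pages` cap keeps the script from looping forever if the server keeps returning data.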