Note: This article is reproduced from serverfault.com.

Scrape data table using xpath in R

Published on 2020-11-28 02:31:06

I am fairly familiar with R, but have zero experience with web scraping. I have looked around and cannot seem to figure out why my web scraping is "failing." Here is my code, including the URL I want to scrape (the ngs-data-table, to be specific):

library(rvest)
webpage <- read_html("https://nextgenstats.nfl.com/stats/rushing/2020/REG/1#yards")
tbls <- html_nodes(webpage, xpath = '/html/body/div[2]/div[3]/main/div/div/div[3]')
#also attempted using this Xpath '//*[@id="stats-rushing-view"]/div[3]' but neither worked
tbls

I am not receiving any errors with the code but I am receiving:

{xml_nodeset (0)}

I know this isn't a ton of code, and I have tried multiple different XPaths as well. I know I will eventually need more specific code for the actual scraping, but I figured even the code above would at least point me in the right direction. Any help would be appreciated. Thank you!

Questioner
Koala
Dave2e 2020-11-28 12:52:38

The table is rendered client-side with JavaScript, so it never appears in the static HTML that `read_html()` downloads; that is why every XPath returns an empty node set. The underlying data is stored as JSON behind an API endpoint. Here is a method to download and process that file.

library(httr)

# URL for the week 6 rushing data (change the week parameter for other weeks)
url <- "https://nextgenstats.nfl.com/api/statboard/rushing?season=2020&seasonType=REG&week=6"

# A browser-like user-agent string; the site expects one on API requests
ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"

# Download the JSON; the Referer header mimics a visit from the stats page.
# The response is named `res` rather than `content` to avoid shadowing httr::content().
res <- httr::GET(url, verbose(), user_agent(ua),
                 add_headers(Referer = "https://nextgenstats.nfl.com/stats/rushing/2020/REG/1"))

# Parse the JSON body; flatten = TRUE collapses nested fields into plain columns
answer <- jsonlite::fromJSON(content(res, as = "text"), flatten = TRUE)
answer$stats
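To see what `flatten = TRUE` does before hitting the live endpoint, here is a self-contained sketch on a made-up two-row payload shaped loosely like an API response. The field names (`player`, `rushYards`) are illustrative, not the real API schema:

```r
library(jsonlite)

# A minimal stand-in for the API response: a "stats" array of records,
# each containing a nested object (field names are hypothetical)
json <- '{
  "season": 2020,
  "stats": [
    {"player": {"name": "A", "team": "X"}, "rushYards": 100},
    {"player": {"name": "B", "team": "Y"}, "rushYards": 80}
  ]
}'

answer <- fromJSON(json, flatten = TRUE)

# flatten = TRUE turns the nested "player" object into dot-separated
# columns (player.name, player.team) alongside rushYards
answer$stats
```

With `flatten = FALSE` (the default), `answer$stats` would instead contain a nested data frame column, which is harder to work with for simple tabular analysis.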