
Scraping a news aggregator website by clicking the "more news" button using Selenium

Posted on 2020-11-28 08:08:25

I want to scrape news headlines from this link: https://www.newsnow.co.uk/h/Business+&+Finance?type=ln

I want to expand the list by clicking (using Selenium) the 'view more headlines' button, so that I can collect as many headlines as possible.

I wrote this code, but the click that should expand the news fails:

import time
from selenium import webdriver
u = 'https://www.newsnow.co.uk/h/Business+&+Finance?type=ln'

driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")
driver.get(u)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")    
driver.implicitly_wait(60) # seconds

elem = driver.find_element_by_css_selector('span:contains("view more headlines")')
for i in range(10):
    elem.click()
    time.sleep(5)
    print(f'click {i} done')

returns: selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

I tried using xpath selector:

elem = driver.find_element_by_xpath('//*[@id="nn_container"]/div[2]/main/div[2]/div/div/div[3]/div/a')

returns: selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <a class="rs-button-more js-button-more btn--primary btn--primary--no-spacing" href="#">...</a> is not clickable at point (353, 551). Other element would receive the click: <div class="alerts-scroller">...</div>

Questioner
khaled koubaa
Abhishek Rai 2020-11-28 16:43:06

After the first click, the 'view more headlines' button gets covered by an overlay element (the alerts-scroller div named in your error message). So before each click we use JavaScript to scroll the button back into view. Here is the working program.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

u = 'https://www.newsnow.co.uk/h/Business+&+Finance?type=ln'

# executable_path works in Selenium 3; Selenium 4 deprecates it in favor of a Service object
driver = webdriver.Chrome(executable_path=r"C:\bin\chromedriver.exe")
driver.maximize_window()
driver.get(u)
time.sleep(10)  # give the page time to load
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")

for i in range(10):
    # Wait until the button label is clickable
    element = WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.CLASS_NAME, 'btn--primary__label')))
    # Scroll the button into view before clicking it
    driver.execute_script("arguments[0].scrollIntoView();", element)
    element.click()
    time.sleep(5)  # let the new headlines load
    print(f'click {i} done')
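
If the overlay still intercepts the native click on your machine, one option is to dispatch the click itself through JavaScript, which does not depend on what is drawn on top of the button. The following is only a minimal sketch and not part of the program above: click_repeatedly and find_button are hypothetical names I made up for illustration, and the class name in the usage comment is the one already used above.

```python
import time


def click_repeatedly(driver, find_button, max_clicks=10, pause=5.0):
    """Click the element returned by find_button(driver) up to max_clicks times.

    find_button is a hypothetical callback: it should return the button
    element, or None once the button is no longer on the page.
    Returns the number of clicks that were actually made.
    """
    clicks = 0
    for _ in range(max_clicks):
        button = find_button(driver)
        if button is None:
            break  # nothing left to expand
        # Bring the button into view, then click it from JavaScript so that
        # an overlay sitting on top of it cannot intercept the click.
        driver.execute_script("arguments[0].scrollIntoView();", button)
        driver.execute_script("arguments[0].click();", button)
        clicks += 1
        time.sleep(pause)  # let the new headlines load
    return clicks


# Usage with the driver created above (class name taken from that program):
#
# from selenium.webdriver.common.by import By
#
# def find_button(d):
#     found = d.find_elements(By.CLASS_NAME, 'btn--primary__label')
#     return found[0] if found else None
#
# print(click_repeatedly(driver, find_button))
```

The callback keeps the loop independent of how the button is located, so you can swap in a different selector if the page markup changes.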