This article is reproduced from serverfault.com.

Webscraping with Selenium and Python

Posted on 2020-11-28 08:20:27

I'm a beginner in coding and am trying to learn web scraping with Selenium. I have been working on a project that runs through a dictionary and checks how long it would take to crack each word if it were used as a password. My code reads a .txt file with one word per line, types each word into the password field, and is supposed to copy the estimated time to crack it. The problem is that I cannot capture that part of the page's HTML, and I need help.

This is my code:

# This program runs through a Spanish dictionary and checks how secure each word is as a password.
# Note: readFile and writeFile are not defined in this snippet.

import random
import time
from selenium import webdriver

# Paste the Chromedriver path here (raw string, so the backslashes are kept literally)
CHROMEDRIVERPATH = r"C:\Program Files (x86)\chromedriver.exe"
# Paste the dictionary path here, in .txt format
dictionary = readFile("spanish_dictionary.txt")
date = str(time.strftime("%Y-%m-%dT%H-%M-%S"))

# Start the browser
driver = webdriver.Chrome(CHROMEDRIVERPATH)

#webpage target
driver.get("https://www.security.org/how-secure-is-my-password/")
time.sleep(2)

#Label
writeFile("results_" + date + ".txt","word,time \n")
#File Content
for word in dictionary:
    bar = driver.find_element_by_id('password')
    bar.send_keys(word)
    bar.clear()
    timeToCrack = driver.find_element_by_xpath('//*[@id="hsimp"]/div[1]/div[3]/p[2]').get_attribute("class")
    result = word + "," + timeToCrack + "\n"
    writeFile("results_" + date + ".txt",result)
    time.sleep(random.uniform(0.4,1.0))

This is the HTML code of the relevant part of the page:

<p class="result__text result__time">2 hundred microseconds</p>

I get this in output file:

word,time 
a,result__text result__time
aba,result__text result__time
abaá,result__text result__time

I want this:

word,time 
a,6 hundred picoseconds
aba,4 hundred nanoseconds
abaá,5 milliseconds
Asked by lordkoda
Mick 2020-11-28 16:34:26

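get_attribute("class") returns the value of the element's class attribute, which is why every row contains "result__text result__time". You want the element's visible text instead, so use .text: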

timeToCrack = driver.find_element_by_xpath('//*[@id="hsimp"]/div[1]/div[3]/p[2]').text

The Java equivalent is:

driver.findElement(By.xpath("//*[@id=\"hsimp\"]/div[1]/div[3]/p[2]")).getText();
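
Putting the fix into the whole loop, a minimal sketch might look like the following. It is not the asker's exact program, and it rests on several assumptions: `read_words`, `time_to_crack` and `main` are hypothetical names standing in for the `readFile`/`writeFile` helpers that were not shown; the selector `p.result__time` is inferred from the HTML quoted in the question and assumes the first match is the estimate; and `webdriver.Chrome()` without a path assumes a Selenium version that can locate chromedriver itself (or chromedriver on PATH). It uses `find_element(by, value)` because the `find_element_by_*` helpers are no longer available in recent Selenium 4 releases. It also clears the field before typing rather than after: in the original loop `bar.clear()` runs before the result is read, so the page may already be showing the result for an empty field.

```python
import random
import time


def read_words(path):
    """Return the non-empty lines of a UTF-8 word list, one word per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def time_to_crack(driver, word, settle=0.5):
    """Type `word` into the password box and return the estimate shown on the page."""
    # "id" and "css selector" are the string values of By.ID and By.CSS_SELECTOR.
    bar = driver.find_element("id", "password")
    bar.clear()          # empty the box before typing, not after
    bar.send_keys(word)
    time.sleep(settle)   # crude pause so the page script can update the result
    # .text is the visible text; get_attribute("class") would return the class names.
    return driver.find_element("css selector", "p.result__time").text


def main():
    # Imported here so the helpers above can be tried without Selenium installed.
    from selenium import webdriver

    words = read_words("spanish_dictionary.txt")
    stamp = time.strftime("%Y-%m-%dT%H-%M-%S")
    driver = webdriver.Chrome()  # assumption: chromedriver can be found automatically
    try:
        driver.get("https://www.security.org/how-secure-is-my-password/")
        time.sleep(2)
        with open(f"results_{stamp}.txt", "w", encoding="utf-8") as out:
            out.write("word,time\n")
            for word in words:
                out.write(f"{word},{time_to_crack(driver, word)}\n")
                time.sleep(random.uniform(0.4, 1.0))
    finally:
        driver.quit()  # close the browser even if something fails
```

Calling `main()` starts the run. The fixed `settle` pause is a simplification; an explicit wait such as Selenium's `WebDriverWait` would be more robust if the page updates slowly.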