Source: reproduced from stackoverflow.com
Tags: css-selectors, python, python-3.x, scrapy, xpath

Can't parse certain information from some HTML elements using XPath

Posted on 2020-03-27 10:21:01

I've written an XPath expression to extract a specific piece of information from some HTML elements within Scrapy, but it doesn't reach the text I'm after.

Html elements:

<div class="rates">
                <label>
                  Rates :
                </label>
                  R 3500
                  <br class="hidden-md hidden-lg">
              </div>

I wish to extract R 3500 out of it.

I've tried with:

from scrapy import Selector

html = """
<div class="rates">
                <label>
                  Rates :
                </label>
                  R 3500
                  <br class="hidden-md hidden-lg">
              </div>
"""
sel = Selector(text=html)
rate = sel.xpath("//*[@class='rates']/label/following::*").get()
print(rate)

Running the script above prints <br class="hidden-md hidden-lg">, whereas I want to get R 3500.

With lxml I could have used .tail, but I can't find anything similar in Scrapy.
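For comparison, the lxml approach mentioned here might look like the sketch below. It assumes lxml is installed and parses the same markup with lxml.html.fromstring; the variable names (snippet, root, label, rate) are illustrative and not from the original post.

```python
from lxml import html as lxml_html

snippet = """
<div class="rates">
  <label>
    Rates :
  </label>
    R 3500
  <br class="hidden-md hidden-lg">
</div>
"""

# Parse the fragment and locate the <label> inside the rates block.
root = lxml_html.fromstring(snippet)
label = root.xpath("//*[@class='rates']/label")[0]

# In lxml, .tail is the text between an element's closing tag
# and the next tag, which is where the rate lives here.
rate = label.tail.strip()
print(rate)
```

Scrapy selectors expose no .tail attribute, which is why the same text has to be reached through an XPath text() node test instead.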

How can I extract that rate from the HTML elements using XPath?

Questioner: MITHU

Answer by RomanPerekhrest, 2019-07-03 22:55

To get the text node that follows the label node as a sibling:

...
sel = Selector(text=html)
rate = sel.xpath("//*[@class='rates']/label/following-sibling::text()").get().strip()
print(rate)

The output:

R 3500

Addition: "//*[@class='rates']/label/following::text()" should also work.

https://www.w3.org/TR/1999/REC-xpath-19991116#axes