I've written an XPath expression to extract a piece of information from some HTML elements within scrapy, but I can't reach the text I'm after.
HTML elements:
<div class="rates">
<label>
Rates :
</label>
R 3500
<br class="hidden-md hidden-lg">
</div>
I wish to extract R 3500 out of it.
I've tried with:
from scrapy import Selector
html = """
<div class="rates">
<label>
Rates :
</label>
R 3500
<br class="hidden-md hidden-lg">
</div>
"""
sel = Selector(text=html)
rate = sel.xpath("//*[@class='rates']/label/following::*").get()
print(rate)
Running the script above gives me <br class="hidden-md hidden-lg">, whereas I wish to get R 3500.
I could have used .tail if I had opted for lxml. However, with scrapy I can't find anything similar.
How can I extract that rate out of the html elements using xpath?
To get the text node that is a following-sibling of the label node:
...
sel = Selector(text=html)
rate = sel.xpath("//*[@class='rates']/label/following-sibling::text()").get().strip()
print(rate)
The output:
R 3500
Addition: "//*[@class='rates']/label/following::text()" should also work.
It's been a long time since I found you in the loop, @RomanPerekhrest. It worked perfectly. An optional question: do you know how I can achieve the same using a CSS selector? Thanks a lot.
You could improve your answer by mentioning the reason why following::* does not work: * only selects element nodes, not text nodes.
@MITHU, welcome. As for your question: we can't do that magic in CSS, but in Python libraries we have .next_sibling in BeautifulSoup and .tail in etree.
@MathiasMüller, see the addition, it should work with "//*[@class='rates']/label/following::text()" (tested).
Thanks for the edit! You still did not add an explanation, you just pointed out another solution. An answer that gives the reason would be more educational.
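For comparison, the .tail approach mentioned in the question and comments could look roughly like the sketch below. It assumes lxml is installed (scrapy depends on it) and parses the same HTML with lxml.html.document_fromstring; the names doc and label are illustrative.

```python
from lxml import html as lxml_html

html_text = """
<div class="rates">
<label>
Rates :
</label>
R 3500
<br class="hidden-md hidden-lg">
</div>
"""

# Parse into a full document tree so the absolute XPath below applies.
doc = lxml_html.document_fromstring(html_text)
label = doc.xpath("//*[@class='rates']/label")[0]

# In lxml, text that directly follows an element's closing tag is stored
# on that element as .tail rather than as a separate node object.
rate = label.tail.strip()
print(rate)
```

This is the same text node that following-sibling::text() selects in scrapy; lxml just exposes it as an attribute of the preceding element.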