Let's say we have the following response from a browser:
<div>
<tr id="1"></tr>
<tr id="2">
<!--
<div class="A">AAA</div>
<div class="C">BBB</div>
<div class="C">CCC</div>
-->
</tr>
</div>
Getting the comment string using XPath in scrapy should be something like:
response.xpath('//tr[@id="2"]/comment()')
So my question: is there an easy way to extract the values of the <div class="C"> tags inside the comment?
One way would be to remove the comment markers <!-- (...) --> from the string, use the lxml.html
library to turn the result back into HTML, and run XPath on that, but I'm pretty sure there should be an easier way...
I'd appreciate any help. Cheers!
Parsing the content of the comment with lxml.html
is a good solution in my opinion.
Python Code
from lxml import etree
from io import StringIO
parser = etree.HTMLParser()
html_text = """<div>
<tr id="1"></tr>
<tr id="2">
<!--
<div class="A">AAA</div>
<div class="C">BBB</div>
<div class="C">CCC</div>
-->
</tr>
</div>"""
tree = etree.parse(StringIO(html_text), parser)
# comment() selects the comment node inside <tr id="2">
comment = tree.xpath("//tr[@id='2']/comment()")
comment_text = str(comment[0])
# string needs an outermost element in order to be parseable
comment_text = comment_text.replace("<!--", "<html>").replace("-->", "</html>")
embedded_tree = etree.parse(StringIO(comment_text), parser)
print(embedded_tree.xpath("//div[@class='C']/text()"))
Output
['BBB', 'CCC']
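As a small variation on the above, the comment node returned by comment() already exposes its inner markup through its .text attribute, so the string replacement can be skipped and the fragment handed straight to lxml.html. This is a sketch using the same lxml library (fragment is just an illustrative name):

```python
from io import StringIO

from lxml import etree, html

html_text = """<div>
<tr id="1"></tr>
<tr id="2">
<!--
<div class="A">AAA</div>
<div class="C">BBB</div>
<div class="C">CCC</div>
-->
</tr>
</div>"""

# Parse the page and grab the comment node under <tr id="2">.
tree = etree.parse(StringIO(html_text), etree.HTMLParser())
comment = tree.xpath("//tr[@id='2']/comment()")[0]

# The comment node's .text holds the raw inner markup, so no
# "<!--" / "-->" replacement is needed before reparsing.
fragment = html.fromstring(comment.text)
print(fragment.xpath("//div[@class='C']/text()"))
```

lxml.html.fromstring wraps the sibling divs in a parent element for you, which is exactly the role the <html></html> replacement played in the answer above.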
Thanks Mathias! It's been very useful.