Note: this article is translated from stackoverflow.com. Original question: python - Can't parse a certain information from some html elements using xpath

css-selectors python python-3.x scrapy xpath

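python - Can't parse a certain information from some html elements using xpath

Posted on 2020-03-27 10:56:27

I've created an xpath expression to locate an element so that I can extract certain information from some html elements using xpath within scrapy. Whatever I try, I can't get there.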

HTML elements:

<div class="rates">
                <label>
                  Rates :
                </label>
                  R 3500
                  <br class="hidden-md hidden-lg">
              </div>

I wish to extract R 3500 out of it.

I've tried with:

from scrapy import Selector

html = """
<div class="rates">
                <label>
                  Rates :
                </label>
                  R 3500
                  <br class="hidden-md hidden-lg">
              </div>
"""
sel = Selector(text=html)
rate = sel.xpath("//*[@class='rates']/label/following::*").get()
print(rate)

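Upon running the above script, this is what I get: <br class="hidden-md hidden-lg">, whereas I wish to get R 3500.

I could have used .tail if I had opted for lxml. However, when I go for scrapy, I can't find anything similar.

How can I extract that rate out of those html elements using xpath?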

Asked by MITHU

Viewed 194 times


RomanPerekhrest 2019-07-03 22:55

To get the text node as a following-sibling of the label node:

...
sel = Selector(text=html)
rate = sel.xpath("//*[@class='rates']/label/following-sibling::text()").get().strip()
print(rate)

Output:

R 3500

Also: "//*[@class='rates']/label/following::text()" should work as well.

https://www.w3.org/TR/1999/REC-xpath-19991116#axes

MITHU 2019-07-03 22:17:46

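@RomanPerekhrest Found you in the loop after a long time. It worked perfectly. One optional question: do you know how I can achieve the same using a CSS selector? Thanks a lot.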

Mathias Müller 2019-07-03 22:48:17

You could improve your answer by mentioning why following::* does not work: * only selects element nodes, not text nodes.

RomanPerekhrest 2019-07-03 22:48:26

@MITHU, you're welcome. As for your question: we can't do that in CSS, but in python libs we have .next_sibling in Beautifulsoup and .tail in etree
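
As a rough illustration of the .tail approach mentioned in this comment, here is a sketch that uses the standard library's xml.etree.ElementTree, whose elements expose the same .tail attribute as lxml.etree. It is not from the original thread. One assumption to note: the <br> tag is written as self-closing here because ElementTree only parses well-formed XML, whereas lxml's HTML parser would accept the original markup as is.

```python
import xml.etree.ElementTree as ET

# Assumption: <br> is self-closed so the snippet is well-formed XML.
markup = """<div class="rates">
  <label>
    Rates :
  </label>
    R 3500
  <br class="hidden-md hidden-lg" />
</div>"""

root = ET.fromstring(markup)
label = root.find("label")

# .tail holds the text between the end of an element and the next sibling tag.
rate = label.tail.strip()
print(rate)  # R 3500
```

Here the text that follows </label> is stored on the label element itself rather than as a separate node, which is the difference from the XPath text() approach.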

RomanPerekhrest 2019-07-03 22:56:34

@MathiasMüller, see the addition; it should work with "//*[@class='rates']/label/following::text()" (tested)

Mathias Müller 2019-07-03 23:04:52

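Thanks for the edit! You still did not add an explanation, you just pointed out another solution. An answer that gives the reason would be more educational.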
