Java Jsoup selector: the 2nd div after an h2

Krystian G 2020-02-01 08:49

I have two ideas for how to achieve this.
The first one is to remove every <p> from the document, after which a single selector, "h2:contains(" + text + ")+div+div", is enough. Use this only when you're sure the target <div> doesn't contain any <p>; otherwise those paragraphs are removed too and the result will be missing content.

    public void execute1(String html) {
        Document doc = Jsoup.parse(html);
        // first approach: remove every <p> to simplify document
        Elements paragraphs = doc.select("p");
        for (Element paragraph : paragraphs) {
            paragraph.remove();
        }
        // then one selector will return what you want in both cases
        System.out.println(selectSecondDivAfterH2WithText(doc, "Blah 1"));
        System.out.println(selectSecondDivAfterH2WithText(doc, "Blah 2"));
    }

    private Element selectSecondDivAfterH2WithText(Document doc, String text) {
        return doc.select("h2:contains(" + text + ")+div+div").first();
    }
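
If modifying the parsed document is a concern, the same idea can be tried on a copy. The following is only a sketch: the class name, the helper name and the sample markup are assumptions made for illustration, since the question's actual HTML is not shown here. It relies on Jsoup's `Document.clone()` and `Elements.remove()`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class CloneBeforeStrip {
    // Hypothetical helper: runs the "+div+div" selector on a copy with every <p> removed.
    static Element secondDivAfterH2(Document doc, String text) {
        Document copy = doc.clone();   // deep copy, so the original keeps its <p> elements
        copy.select("p").remove();     // strip paragraphs from the copy only
        return copy.select("h2:contains(" + text + ")+div+div").first();
    }

    public static void main(String[] args) {
        // Assumed sample markup, not taken from the question.
        String html = "<h2>Blah 1</h2><div>a1</div><div>a2</div>"
                    + "<h2>Blah 2</h2><div>b1</div><p>note</p><div>b2</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(secondDivAfterH2(doc, "Blah 1").text());
        System.out.println(secondDivAfterH2(doc, "Blah 2").text());
        System.out.println(doc.select("p").size()); // paragraphs still present in the original
    }
}
```

Note that the returned element belongs to the copy, not to the original document, and any <p> nested inside the target <div> is still missing from it, so the caveat above applies here as well.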

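The second approach is to iterate over the siblings that follow "h2:contains(" + text + ")" and find the second <div> "manually", ignoring everything else. This is the better option because it doesn't modify the original document and it skips any number of <p> elements.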

    public void execute2(String html) {
        Document doc = Jsoup.parse(html);
        System.out.println(selectSecondDivAfterH2WithText2(doc, "Blah 1"));
        System.out.println(selectSecondDivAfterH2WithText2(doc, "Blah 2"));
    }

    private Element selectSecondDivAfterH2WithText2(Document doc, String text) {
        int counter = 2;
        // find h2 with given text
        Element h2 = doc.select("h2:contains(" + text + ")").first();
        if (h2 == null) {
            // no matching heading, so there is nothing to search
            return null;
        }
        // select every sibling after this h2 element
        Elements siblings = h2.nextElementSiblings();
        // loop over them
        for (Element sibling : siblings) {
            // skip everything that's not a div
            if (sibling.tagName().equals("div")) {
                // count how many divs left to skip
                counter--;
                if (counter == 0) {
                    // return when found nth div
                    return sibling;
                }
            }
        }
        return null;
    }
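
One edge case of the sibling loop may be worth guarding against: if the section under the first heading has fewer than two <div> elements, the loop keeps going and can return a <div> that belongs to the next heading. A possible variation, assuming each section is delimited by the next <h2>, stops at that heading. The class name, helper name and sample markup below are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SectionSiblings {
    // Hypothetical helper: n-th <tag> sibling after the h2 containing `text`,
    // giving up at the next <h2> so the search stays inside one section.
    // Assumes `tag` is something other than "h2".
    static Element nthInSection(Document doc, String text, String tag, int n) {
        Element h2 = doc.select("h2:contains(" + text + ")").first();
        if (h2 == null || n < 1) {
            return null;
        }
        int remaining = n;
        for (Element el = h2.nextElementSibling(); el != null; el = el.nextElementSibling()) {
            if (el.tagName().equals("h2")) {
                return null; // reached the next section without finding the n-th match
            }
            if (el.tagName().equals(tag) && --remaining == 0) {
                return el;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample markup: the first section has only one <div>.
        String html = "<h2>Blah 1</h2><div>a1</div>"
                    + "<h2>Blah 2</h2><div>b1</div><p>note</p><div>b2</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(nthInSection(doc, "Blah 1", "div", 2)); // no second div in this section
        System.out.println(nthInSection(doc, "Blah 2", "div", 2).text());
    }
}
```

Whether stopping at the next heading is desirable depends on the page structure; without that check the helper behaves like the loop above.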

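I also had a third idea: use "h2:contains(" + text + ")~div:nth-of-type(2)". It works for the first case but not for the second. I first suspected the <p> between the <div>s, but the more likely cause is that :nth-of-type counts a <div>'s position among all <div> children of the parent rather than relative to the h2, and an intervening <p> does not change that count.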
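
The behaviour of the `~div:nth-of-type(2)` selector can be checked with a small experiment. This is a sketch on assumed markup in which both headings share one parent; the class and method names are made up for the example. Under that assumption, only the <div> that is the second <div> child of the parent can ever match, which happens to sit after the first heading.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class NthOfTypeCheck {
    // Hypothetical helper wrapping the third idea's selector.
    static Elements thirdIdea(Document doc, String text) {
        return doc.select("h2:contains(" + text + ")~div:nth-of-type(2)");
    }

    public static void main(String[] args) {
        // Assumed sample markup: all elements are children of <body>.
        String html = "<h2>Blah 1</h2><div>a1</div><div>a2</div>"
                    + "<h2>Blah 2</h2><div>b1</div><p>note</p><div>b2</div>";
        Document doc = Jsoup.parse(html);
        // a2 is the 2nd <div> under <body> and follows the first h2, so it matches.
        System.out.println(thirdIdea(doc, "Blah 1").text());
        // b1 and b2 are the 3rd and 4th <div> under <body>, so nothing matches.
        System.out.println(thirdIdea(doc, "Blah 2").size());
    }
}
```

If each heading and its <div>s were wrapped in their own parent element, the count would restart per section and the selector might work for both cases.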

Bruno C 2020-02-06 16:30:28

Hi Krystian, I wanted to avoid writing Java, but in the end it couldn't be done without coding :)
