MarkLogic: remove duplicate node/element

Posted on 2020-04-23 17:36:06

I have several thousand documents that have duplicate element nodes. How can I find and remove duplicate title elements in my XML files?

I tried fn:distinct-values(), but it causes performance issues.

For example, 01.xml:

<doc>
     <pdf>1</pdf>
     <title>Head First JavaScript</title>
     <title>Head First JavaScript</title>
</doc>

02.xml

<doc>
    <pdf>0</pdf>
    <title>Python: Programming Basics for Absolute Beginners </title>
    <title>Python: Programming Basics for Absolute Beginners </title>
</doc>

Expected result, 01.xml:

<doc>
     <pdf>1</pdf>
     <title>Head First JavaScript</title>
</doc>

02.xml

<doc>
    <pdf>0</pdf>
    <title>Python: Programming Basics for Absolute Beginners </title>
</doc>

Questioner

thichxai


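(: Sample document with several repeated title elements :)
let $doc :=
  <doc>
    <title>Head First JavaScript</title>
    <title>Head First JavaScript</title>
    <title>hellao</title>
    <title>hello</title>
    <title>hello</title>
    <title>Python: Programming Basics for Absolute Beginners </title>
    <title>ahello</title>
    <title>Python: Programming Basics for Absolute Beginners </title>
  </doc>
(: Keep a title only if no earlier sibling title has the same value.
   Comparing against preceding-sibling::title rather than node() avoids
   accidental matches with other elements or text nodes. :)
for $data in $doc/title[not(. = preceding-sibling::title)]
return $data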
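The query above only returns the unique titles from an in-memory document; it does not change anything stored in the database. As a minimal sketch of the removal step, the same sibling comparison can be inverted to select the repeated titles and pass each one to xdmp:node-delete. The URIs "/01.xml" and "/02.xml" are assumptions for illustration, since the question does not say where the documents are stored.

```xquery
xquery version "1.0-ml";

(: Assumption: the two sample documents are stored at these hypothetical URIs. :)
for $uri in ("/01.xml", "/02.xml")
(: Select every title whose value already appeared in an earlier sibling title. :)
for $dup in fn:doc($uri)/doc/title[. = preceding-sibling::title]
(: Delete the duplicate node in place; the first occurrence is kept. :)
return xdmp:node-delete($dup)
```

The comparison is on the exact string value, so titles that differ only in whitespace are treated as distinct; wrapping both sides in fn:normalize-space() would be one way to relax that. For several thousand documents, running everything in a single transaction may be slow or hit limits, so processing the URIs in batches is probably the safer choice.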
