Note: This article is reproduced from stackoverflow.com.

marklogic remove duplicate node/element

Posted on 2020-04-23 17:36:06

I have several thousand documents that have duplicate element nodes. How can I find and remove duplicate title elements in my XML files?

Using fn:distinct-values() causes performance issues.

For example, 01.xml:

<doc>
     <pdf>1</pdf>
     <title>Head First JavaScript</title>
     <title>Head First JavaScript</title>
</doc>

02.xml

<doc>
    <pdf>0</pdf>
    <title>Python: Programming Basics for Absolute Beginners </title>
    <title>Python: Programming Basics for Absolute Beginners </title>
</doc>

Expected result, 01.xml:

<doc>
     <pdf>1</pdf>
     <title>Head First JavaScript</title>

</doc>

02.xml

<doc>
    <pdf>0</pdf>
    <title>Python: Programming Basics for Absolute Beginners </title>

</doc>
Questioner: thichxai
Sudeep Rawat 2020-02-13 16:43

Hi, please test the code below:

    (: Sample document with repeated title elements :)
    let $doc :=
      <doc>
        <title>Head First JavaScript</title>
        <title>Head First JavaScript</title>
        <title>hellao</title>
        <title>hello</title>
        <title>hello</title>
        <title>Python: Programming Basics for Absolute Beginners </title>
        <title>ahello</title>
        <title>Python: Programming Basics for Absolute Beginners </title>
      </doc>

    (: Keep a title only if no earlier sibling title has the same value :)
    for $data in $doc/title[not(. = preceding-sibling::title)]
    return $data
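
The query above only returns the distinct titles from an in-memory document; it does not change anything stored in the database. To actually remove the duplicates from the stored files, the same predicate can be inverted and combined with xdmp:node-delete. What follows is a minimal sketch, not a tested solution. It assumes every affected document has a <doc> root as in the question, that duplicate titles are exact string matches, and that it is acceptable to scan all documents via fn:collection() with no arguments.

```xquery
xquery version "1.0-ml";

(: Sketch under the assumptions stated above.
   Select every title whose value already appeared in an earlier
   sibling title, i.e. the second and later copies, and delete it.
   The first occurrence of each title is left in place. :)
for $dup in fn:collection()/doc/title[. = preceding-sibling::title]
return xdmp:node-delete($dup)
```

With several thousand documents it may be safer to restrict the path to a specific collection or directory and run the update in batches, so that a single transaction does not grow too large.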