I have several thousand documents that contain duplicate element nodes. How can I find and remove the duplicate title
elements in my XML files?
I tried fn:distinct-values(), but it
causes performance issues.
For example:
01.xml
<doc>
<pdf>1</pdf>
<title>Head First JavaScript</title>
<title>Head First JavaScript</title>
</doc>
02.xml
<doc>
<pdf>0</pdf>
<title>Python: Programming Basics for Absolute Beginners </title>
<title>Python: Programming Basics for Absolute Beginners </title>
</doc>
Expected result:
01.xml
<doc>
<pdf>1</pdf>
<title>Head First JavaScript</title>
</doc>
02.xml
<doc>
<pdf>0</pdf>
<title>Python: Programming Basics for Absolute Beginners </title>
</doc>
Hi, please test the code below:
let $doc :=
<doc>
<title>Head First JavaScript</title>
<title>Head First JavaScript</title>
<title>hellao</title>
<title>hello</title>
<title>hello</title>
<title>Python: Programming Basics for Absolute Beginners </title>
<title>ahello</title>
<title>Python: Programming Basics for Absolute Beginners </title>
</doc>
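(: Keep a title only if no earlier title sibling has the same string value.
   Comparing against preceding-sibling::title rather than node() avoids
   accidental matches with other siblings such as pdf. :)
for $data in $doc/title[not(. = preceding-sibling::title)]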
return $data
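If you need the whole document back with only the repeated titles dropped (as in your expected result), something along these lines might work. This is only a sketch: local:dedupe-titles is a made-up helper name, and it assumes the title elements are direct children of doc, as in your samples.

```xquery
xquery version "1.0";

(: Hypothetical helper: returns a copy of the doc element in which a title
   is dropped when an earlier title sibling has the same string value.
   All other children (pdf, etc.) are copied unchanged. :)
declare function local:dedupe-titles($doc as element(doc)) as element(doc) {
  element doc {
    $doc/@*,
    $doc/node()[not(self::title[. = preceding-sibling::title])]
  }
};

local:dedupe-titles(
  <doc>
    <pdf>1</pdf>
    <title>Head First JavaScript</title>
    <title>Head First JavaScript</title>
  </doc>
)
```

The comparison only looks at the siblings inside one document, so the cost depends on the number of titles per document rather than on the size of the whole collection.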
Sudeep, thank you so much. I will use CoRB to eliminate the duplicate nodes.