marklogic marklogic-9

cts:element-query vs cts:path-range-query performance

Published on 2020-04-13 10:10:23

We are developing an enterprise application that stores a huge amount of data. In our application we force users to create multiple path range indexes to make searches faster.

Until now we have relied on path range indexes with cts:path-range-query() to speed up searches, but I have found that I can get the same result using cts:element-query() without creating any path range indexes.

For example:

  1. Using cts:path-range-query() -> Here I need to create a path range index for /tXML/Message/INVENTORY/ASNId

    xquery version "1.0-ml";
    cts:uris('', (), cts:and-query((
      cts:collection-query("integration"),
      cts:path-range-query("/tXML/Message/INVENTORY/ASNId", "=", "10121600")
    )))

  2. Using cts:element-query() -> Here I don't need to create a path range index.

    xquery version "1.0-ml";
    cts:uris('', (), cts:and-query((
      cts:collection-query("integration"),
      cts:element-query(xs:QName("tXML"),
        cts:element-query(xs:QName("Message"),
          cts:element-query(xs:QName("INVENTORY"),
            cts:element-value-query(xs:QName("ASNId"), "10121600"))))
    )))

My questions are,

  1. If I get the same result from cts:element-query() as from cts:path-range-query(), why do I need to force users to create path range indexes?

  2. Which query is better suited to a huge data set: cts:element-query() or cts:path-range-query()?

Please help me find answers to these two questions.

Questioner
Shivling Bhandare
grtjn 2020-02-03 19:23

The answer is not entirely straightforward: results might vary depending on your data and its volume.

A couple of notes though:

  • Your queries are not semantically the same. Nested element queries only check ancestor-descendant relations (the inner query may match at any depth inside the outer element), while a path can be stricter and require specific parents, so direct parent-child relations.
  • Range queries are resolved against range indexes with predefined collations, and always against the entire ('exact') value. Value queries, however, are resolved against the universal index, more specifically against the index with unstemmed tokens. If your value consisted of multiple tokens, accuracy would require either positions to be enabled or filtered searches. The value in your example consists of only one token, though.
  • Path range indexes come at a cost at ingest time, slightly higher than that of element range indexes. Range indexes also take extra memory. Element queries and element value queries take slightly more work to resolve at search time, though you might need a big test set to notice significant differences.
  • Last but not least, you can't do inequality queries, or value lookups for facets and such without range indexes.
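
To illustrate that last point, here is a minimal sketch of two things that only work with the range index in place: an inequality query and a value lookup for a facet. It assumes the path range index from the question exists on /tXML/Message/INVENTORY/ASNId with scalar type string (the question passes the value as a string, but the actual index type is not stated); the limit of 10 and the `<value>` wrapper element are purely illustrative.

```xquery
xquery version "1.0-ml";

(: Assumption: a path range index of scalar type string exists on this path.
   With a numeric index type you would pass a typed value such as
   xs:int(10121600) instead of a string. :)
let $path  := "/tXML/Message/INVENTORY/ASNId"
let $scope := cts:collection-query("integration")
return (
  (: Inequality: URIs of documents whose ASNId is greater than the given
     value. For a string index the comparison follows the index collation,
     not numeric order. :)
  cts:uris((), (), cts:and-query((
    $scope,
    cts:path-range-query($path, ">", "10121600")
  ))),

  (: Value lookup, e.g. for a facet: the ten most frequent ASNId values
     in the collection, each with its frequency. :)
  for $v in cts:values(cts:path-reference($path), (),
                       ("frequency-order", "limit=10"), $scope)
  return <value count="{cts:frequency($v)}">{$v}</value>
)
```

Neither part can be expressed with cts:element-query() and cts:element-value-query() alone, since the universal index has no ordered list of values to compare against or enumerate.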

HTH!