Reproduced from serverfault.com.

MarkLogic: Timeout for processing document to add properties

Published on 2020-12-02 06:57:12

MarkLogic: 9.8.0

We have around 20M documents, and we now need to add additional data to the document properties.

So we have set up a scheduler, and the code below will be executed:

let $Log := xdmp:log("[ Project ][ Scheduler ][ Start ][ ======================== Insert Records Scheduler Start ======================== ]")

for $author in (/author[not(property::root/contributors)])[1 to 500]

let $uri           := $author/base-uri()
let $auth_element  := if ($author/aug)
                      then
                           for $auth in $author/aug/pname
                               let $snm := $auth/snm/text()
                               let $fnm := fn:concat(fn:string-join(for $i in $auth/fnm return $i,' '),'')
                               return
                                     <pname type='author'>{fn:normalize-space(fn:concat($snm,' ',$fnm))}</pname>
                      else if ($author/editg)
                      then
                           for $auth in $author/pname
                               let $snm := $auth/snm/text()
                               let $fnm := fn:concat(fn:string-join(for $i in $auth/fnm return $i,' '),'')
                               return
                                     <pname type='editor'>{fn:normalize-space(fn:concat($snm,' ',$fnm))}</pname>
                      else ()
let $XmlDoc := <root><contributors>{$auth_element}</contributors></root>             
        
return try{
            xdmp:document-add-properties($uri,$XmlDoc),
            xdmp:log("[ InspecDirect ][ Scheduler ][ End ][ ======================== Insert Records Scheduler End ======================== ]")
            }
       catch($e){xdmp:log($e)}

When we change [1 to 500] to [1 to 10000], we get a timeout error. But if we stay with 500, it will take weeks to finish.

Can you please let me know if this approach is fine?

Questioner
Manish Joisar
Michael Gardner 2020-12-02 22:31:00

Corb2 would probably be a better solution. You can take your current XQuery and split it into two pieces. The first piece would gather the URIs that need to be updated.

The second piece takes the URI as input, and processes it accordingly. This allows for very large batches to be processed without timeouts.
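To illustrate the split, here is a minimal sketch of the two modules CoRB2 expects. The file names (uris.xqy, process.xqy) are placeholders, and the selection/transform logic is only an assumption based on the query in the question: the URIs module returns a count followed by the matching URIs, and the process module receives one URI at a time through the external $URI variable.

```xquery
(: uris.xqy - selector: return the count, then the URIs to process :)
xquery version "1.0-ml";
let $uris := /author[not(property::root/contributors)]/base-uri()
return (fn:count($uris), $uris)
```

```xquery
(: process.xqy - transform: update one document per task :)
xquery version "1.0-ml";
declare variable $URI as xs:string external;

let $author := fn:doc($URI)/author
let $auth_element :=
    if ($author/aug)
    then for $auth in $author/aug/pname
         let $snm := $auth/snm/text()
         let $fnm := fn:string-join(for $i in $auth/fnm return $i, ' ')
         return <pname type='author'>{fn:normalize-space(fn:concat($snm, ' ', $fnm))}</pname>
    else if ($author/editg)
    then for $auth in $author/pname
         let $snm := $auth/snm/text()
         let $fnm := fn:string-join(for $i in $auth/fnm return $i, ' ')
         return <pname type='editor'>{fn:normalize-space(fn:concat($snm, ' ', $fnm))}</pname>
    else ()
return xdmp:document-add-properties($URI,
    <root><contributors>{$auth_element}</contributors></root>)
```

Because each task touches a single document, no individual request comes close to the timeout, and CoRB2 runs the tasks across multiple threads.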

Corb2 Wiki

Corb2 Github
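A CoRB2 run is then driven by an options file; this sketch uses real CoRB2 option names, but the connection URI, module paths, and thread count are hypothetical values to adapt to your environment:

```
# corb.properties (illustrative values)
XCC-CONNECTION-URI=xcc://user:password@localhost:8000
URIS-MODULE=uris.xqy|ADHOC
PROCESS-MODULE=process.xqy|ADHOC
THREAD-COUNT=8
```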