neo4j-gremlin clone a node and its edges

Daniel Kuppitz 2018-08-19 12:35:47

A clone step doesn't exist (yet), but it can be solved with a single query.

Let's start with some sample data:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(4).valueMap(true)                                   // the vertex to be cloned
==>[label:person,name:[josh],age:[32],id:4]
gremlin> g.V(4).outE().map(union(identity(), valueMap()).fold()) // all out-edges
==>[e[10][4-created->5],[weight:1.0]]
==>[e[11][4-created->3],[weight:0.4]]
gremlin> g.V(4).inE().map(union(identity(), valueMap()).fold())  // all in-edges
==>[e[8][1-knows->4],[weight:1.0]]

Now the query to clone the vertex might look a bit scary at a first glance, but it's really just the same pattern over and over again - jumping between the original and the clone to copy the properties:

g.V(4).as('source').
  addV().
    property(label, select('source').label()).as('clone').
  sideEffect(                                                // copy vertex properties
    select('source').properties().as('p').
    select('clone').
      property(select('p').key(), select('p').value())).
  sideEffect(                                                // copy out-edges
    select('source').outE().as('e').
    select('clone').
    addE(select('e').label()).as('eclone').
      to(select('e').inV()).
    select('e').properties().as('p').                        // copy out-edge properties
    select('eclone').
      property(select('p').key(), select('p').value())).
  sideEffect(                                                // copy in-edges
    select('source').inE().as('e').
    select('clone').
    addE(select('e').label()).as('eclone').
      from(select('e').outV()).
    select('e').properties().as('p').                        // copy in-edge properties
    select('eclone').
      property(select('p').key(), select('p').value()))

And in action it looks like this:

gremlin> g.V(4).as('source').
......1>   addV().
......2>     property(label, select('source').label()).as('clone').
......3>   sideEffect(
......4>     select('source').properties().as('p').
......5>     select('clone').
......6>       property(select('p').key(), select('p').value())).
......7>   sideEffect(
......8>     select('source').outE().as('e').
......9>     select('clone').
.....10>     addE(select('e').label()).as('eclone').
.....11>       to(select('e').inV()).
.....12>     select('e').properties().as('p').
.....13>     select('eclone').
.....14>       property(select('p').key(), select('p').value())).
.....15>   sideEffect(
.....16>     select('source').inE().as('e').
.....17>     select('clone').
.....18>     addE(select('e').label()).as('eclone').
.....19>       from(select('e').outV()).
.....20>     select('e').properties().as('p').
.....21>     select('eclone').
.....22>       property(select('p').key(), select('p').value()))
==>v[13]
gremlin> g.V(13).valueMap(true)                                   // the cloned vertex
==>[label:person,name:[josh],age:[32],id:13]
gremlin> g.V(13).outE().map(union(identity(), valueMap()).fold()) // all cloned out-edges
==>[e[16][13-created->5],[weight:1.0]]
==>[e[17][13-created->3],[weight:0.4]]
gremlin> g.V(13).inE().map(union(identity(), valueMap()).fold())  // all cloned in-edges
==>[e[18][1-knows->13],[weight:1.0]]

UPDATE

Paging support is a little tricky. Let me split this whole thing into a 3-step process. I will use edge ids as the sort criterion and to identify the last processed edge (this might not work in Neptune, but you can use a unique sortable property instead).

// clone the vertex with its properties
clone = g.V(4).as('source').
  addV().
    property(label, select('source').label()).as('clone').
  sideEffect(
    select('source').properties().as('p').
    select('clone').
      property(select('p').key(), select('p').value())).next()

// clone out-edges
pageSize = 1
lastId = -1
while (true) {
  t = g.V(4).as('source').
    outE().hasId(gt(lastId)).
    order().by(id).limit(pageSize).as('e').
    group('x').
      by(constant('lastId')).
      by(id()).
    V(clone).
    addE(select('e').label()).as('eclone').
      to(select('e').inV()).
    sideEffect(
      select('e').properties().as('p').
      select('eclone').
        property(select('p').key(), select('p').value())).
    count()
  if (t.next() != pageSize)
    break
  lastId = t.getSideEffects().get('x').get('lastId')
}

// clone in-edges
lastId = -1
while (true) {
  t = g.V(4).as('source').
    inE().hasId(gt(lastId)).
    order().by(id).limit(pageSize).as('e').
    group('x').
      by(constant('lastId')).
      by(id()).
    V(clone).
    addE(select('e').label()).as('eclone').
      from(select('e').inV()).
    sideEffect(
      select('e').properties().as('p').
      select('eclone').
        property(select('p').key(), select('p').value())).
    count()
  if (t.next() != pageSize)
    break
  lastId = t.getSideEffects().get('x').get('lastId')
}

I don't know if Neptune allows you to execute full scripts - if not, you'll need to execute the outer while loops in you application's code.

rossb83 2018-08-18 11:15:15

How well would you say this scales and/or what limits would this query have? If there were 1,000,000 outgoing/incoming edges would this still work? Will this work with AWS Neptune API?

Daniel Kuppitz 2018-08-18 17:10:15

I think it ultimately depends on the graph db you're using. However, having that many incident edges is generally seen as a bad graph model.

rossb83 2018-08-19 02:16:36

it seems like AWS Neptune is limiting this query to only 999 edges, is there a way to paginate this query such that I can run it for nodes with larger number of edges?

Daniel Kuppitz 2018-08-19 03:19:43

You'll need a guaranteed sort order and a unique property on your edges, then it might work. I'll try to put something together later tonight, perhaps just extend the previous example using a page size of 1.

Daniel Kuppitz 2018-08-19 04:30:46

Incident edges refer to both - incoming and outgoing edges. For the paging you need a unique (sortable) identifier/property on the edges, not on the vertices. I'll post an update in a few minutes.

gremlin clone a node and its edges

热门帖子

热门github