Does Gremlin provide a way to clone a vertex? For instance, given
v1->v2, v1->v3, v1->v4
how can I simply and efficiently create a new vertex v5 that also has edges pointing to v2, v3, and v4 (the same places that v1's edges point to) without having to set them explicitly, and instead say something like g.createV(v1).clone(v2)?
Note that I am using the AWS Neptune version of Gremlin, so the solution must be compatible with that.
A clone step doesn't exist (yet), but the problem can be solved with a single query.
Let's start with some sample data:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(4).valueMap(true) // the vertex to be cloned
==>[label:person,name:[josh],age:[32],id:4]
gremlin> g.V(4).outE().map(union(identity(), valueMap()).fold()) // all out-edges
==>[e[10][4-created->5],[weight:1.0]]
==>[e[11][4-created->3],[weight:0.4]]
gremlin> g.V(4).inE().map(union(identity(), valueMap()).fold()) // all in-edges
==>[e[8][1-knows->4],[weight:1.0]]
The query to clone the vertex might look a bit scary at first glance, but it's really just the same pattern over and over again: jump between the original and the clone to copy the vertex properties, then the out-edges with their properties, then the in-edges with their properties:
g.V(4).as('source').
  addV().
    property(label, select('source').label()).as('clone').
  sideEffect(   // copy vertex properties
    select('source').properties().as('p').
    select('clone').
      property(select('p').key(), select('p').value())).
  sideEffect(   // copy out-edges
    select('source').outE().as('e').
    select('clone').
    addE(select('e').label()).as('eclone').
      to(select('e').inV()).
    select('e').properties().as('p').   // copy out-edge properties
    select('eclone').
      property(select('p').key(), select('p').value())).
  sideEffect(   // copy in-edges
    select('source').inE().as('e').
    select('clone').
    addE(select('e').label()).as('eclone').
      from(select('e').outV()).
    select('e').properties().as('p').   // copy in-edge properties
    select('eclone').
      property(select('p').key(), select('p').value()))
And in action it looks like this:
gremlin> g.V(4).as('source').
......1> addV().
......2> property(label, select('source').label()).as('clone').
......3> sideEffect(
......4> select('source').properties().as('p').
......5> select('clone').
......6> property(select('p').key(), select('p').value())).
......7> sideEffect(
......8> select('source').outE().as('e').
......9> select('clone').
.....10> addE(select('e').label()).as('eclone').
.....11> to(select('e').inV()).
.....12> select('e').properties().as('p').
.....13> select('eclone').
.....14> property(select('p').key(), select('p').value())).
.....15> sideEffect(
.....16> select('source').inE().as('e').
.....17> select('clone').
.....18> addE(select('e').label()).as('eclone').
.....19> from(select('e').outV()).
.....20> select('e').properties().as('p').
.....21> select('eclone').
.....22> property(select('p').key(), select('p').value()))
==>v[13]
gremlin> g.V(13).valueMap(true) // the cloned vertex
==>[label:person,name:[josh],age:[32],id:13]
gremlin> g.V(13).outE().map(union(identity(), valueMap()).fold()) // all cloned out-edges
==>[e[16][13-created->5],[weight:1.0]]
==>[e[17][13-created->3],[weight:0.4]]
gremlin> g.V(13).inE().map(union(identity(), valueMap()).fold()) // all cloned in-edges
==>[e[18][1-knows->13],[weight:1.0]]
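To make the pattern easier to see outside of Gremlin, here is a minimal sketch in plain Python that models the same three copy passes on a tiny in-memory graph. The dict layout, the `clone_vertex` helper, and the id counter are assumptions made purely for illustration; they are not part of TinkerPop or Neptune, and the generated ids will not match what TinkerGraph assigns.

```python
import itertools


def clone_vertex(graph, source_id, ids):
    """Copy a vertex, its properties, and its out- and in-edges (hypothetical helper)."""
    source = graph["vertices"][source_id]
    clone_id = next(ids)
    # pass 1: copy the label and the vertex properties
    graph["vertices"][clone_id] = {"label": source["label"],
                                   "props": dict(source["props"])}
    # snapshot the edges so newly added copies are not copied again
    existing = list(graph["edges"].values())
    # pass 2: out-edges keep their target, but start at the clone
    for e in existing:
        if e["out"] == source_id:
            graph["edges"][next(ids)] = {"label": e["label"], "out": clone_id,
                                         "in": e["in"], "props": dict(e["props"])}
    # pass 3: in-edges keep their origin, but end at the clone
    for e in existing:
        if e["in"] == source_id:
            graph["edges"][next(ids)] = {"label": e["label"], "out": e["out"],
                                         "in": clone_id, "props": dict(e["props"])}
    return clone_id


# vertex 4 of the sample graph and its three incident edges
graph = {
    "vertices": {4: {"label": "person", "props": {"name": "josh", "age": 32}}},
    "edges": {
        8: {"label": "knows", "out": 1, "in": 4, "props": {"weight": 1.0}},
        10: {"label": "created", "out": 4, "in": 5, "props": {"weight": 1.0}},
        11: {"label": "created", "out": 4, "in": 3, "props": {"weight": 0.4}},
    },
}
clone_id = clone_vertex(graph, 4, itertools.count(13))
out_targets = sorted(e["in"] for e in graph["edges"].values() if e["out"] == clone_id)
in_sources = sorted(e["out"] for e in graph["edges"].values() if e["in"] == clone_id)
```

After the call, the clone carries the same label and properties as vertex 4, points to vertices 3 and 5, and is pointed to by vertex 1, which is the same shape the Gremlin query produces.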
UPDATE
Paging support is a little tricky. Let me split this whole thing into a 3-step process. I will use edge ids as the sort criterion and to identify the last processed edge (this might not work in Neptune, but you can use a unique sortable property instead).
// clone the vertex with its properties
clone = g.V(4).as('source').
  addV().
    property(label, select('source').label()).as('clone').
  sideEffect(
    select('source').properties().as('p').
    select('clone').
      property(select('p').key(), select('p').value())).next()

// clone out-edges
pageSize = 1
lastId = -1
while (true) {
  t = g.V(4).as('source').
        outE().hasId(gt(lastId)).
        order().by(id).limit(pageSize).as('e').
        group('x').
          by(constant('lastId')).
          by(id()).
        V(clone).
        addE(select('e').label()).as('eclone').
          to(select('e').inV()).
        sideEffect(
          select('e').properties().as('p').
          select('eclone').
            property(select('p').key(), select('p').value())).
        count()
  if (t.next() != pageSize)
    break
  lastId = t.getSideEffects().get('x').get('lastId')
}
// clone in-edges
lastId = -1
while (true) {
  t = g.V(4).as('source').
        inE().hasId(gt(lastId)).
        order().by(id).limit(pageSize).as('e').
        group('x').
          by(constant('lastId')).
          by(id()).
        V(clone).
        addE(select('e').label()).as('eclone').
          from(select('e').outV()).   // an in-edge starts at its out-vertex
        sideEffect(
          select('e').properties().as('p').
          select('eclone').
            property(select('p').key(), select('p').value())).
        count()
  if (t.next() != pageSize)
    break
  lastId = t.getSideEffects().get('x').get('lastId')
}
I don't know if Neptune allows you to execute full scripts. If not, you'll need to execute the outer while loops in your application's code.
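If the loop has to live in application code, its control flow could look roughly like the following plain-Python sketch. The `fetch_page` and `copy_edge` callbacks are hypothetical stand-ins for the Gremlin requests your application would send (they are not a real driver API), and the sketch assumes that edge ids, or whatever unique property you sort by, are comparable.

```python
def clone_edges_in_pages(fetch_page, copy_edge, page_size):
    """Copy edges page by page, remembering the last processed id (keyset paging)."""
    last_id = -1
    copied = 0
    while True:
        # ask only for edges whose id is greater than the last one processed
        page = fetch_page(last_id, page_size)
        for edge in page:
            copy_edge(edge)
            last_id = edge["id"]
        copied += len(page)
        # a short (or empty) page means there is nothing left to copy
        if len(page) != page_size:
            break
    return copied


# in-memory stand-in for the edges incident to the source vertex
edges = [{"id": i, "label": "created"} for i in (11, 10, 42, 7, 19)]


def fetch_page(last_id, page_size):
    # mirrors hasId(gt(lastId)).order().by(id).limit(pageSize)
    later = sorted((e for e in edges if e["id"] > last_id), key=lambda e: e["id"])
    return later[:page_size]


cloned = []
total = clone_edges_in_pages(fetch_page, cloned.append, page_size=2)
cloned_ids = [e["id"] for e in cloned]
```

Because each request filters on "greater than the last id" instead of skipping a fixed offset, every page stays small regardless of how many edges have already been processed.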
How well would you say this scales and/or what limits would this query have? If there were 1,000,000 outgoing/incoming edges would this still work? Will this work with AWS Neptune API?
I think it ultimately depends on the graph db you're using. However, having that many incident edges is generally seen as a bad graph model.
It seems that AWS Neptune limits this query to only 999 edges. Is there a way to paginate it so that I can run it for vertices with a larger number of edges?
You'll need a guaranteed sort order and a unique property on your edges, then it might work. I'll try to put something together later tonight, perhaps just extend the previous example using a page size of 1.
Incident edges are both the incoming and the outgoing edges of a vertex. For the paging you need a unique (sortable) identifier/property on the edges, not on the vertices. I'll post an update in a few minutes.