I want to add persons as vertices in a graph. The following code works:
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import Column
persons = [{"id":1,"name":"bob","age":25}, {"id":2,"name":"joe","age":25,"occupation":"lawyer"}]
g.inject(persons).unfold().as_('entity').\
    addV('entity').as_('v').\
    sideEffect(__.select('entity').unfold().as_('kv').select('v').\
        property(__.select('kv').by(Column.keys),
                 __.select('kv').by(Column.values)
        )
    ).iterate()
Question 1: What if one of the properties is a list or dict? Example:
persons = [{"id":1,"name":"bob","age":25, "house":{"a":1,"b":4}}, {"id":2,"name":"joe","age":25,"occupation":"lawyer","house":{"a":1,"b":4}}]
How do I skip that one property (house) but still add the rest to the person vertex, and then create a separate house vertex (with properties a and b) connected to the person by an edge?
Question 2: What if I want to modify an attribute before I add it as a property to the graph? For example, convert id to a string and then add it as a property.
I could be wrong, but I sense that your question will end up being more complex than what you've posted. With that in mind, I will offer an answer that works under the assumption that each house is unique, which I've made clearer with a "hid" (house id) that I've added to the data.
gremlin> persons = [["pid":1,"name":"bob","age":25, "house":["hid":10,"a":1,"b":4]],
......1> ["pid":2,"name":"joe","age":25,"occupation":"lawyer","house":["hid":20,"a":1,"b":4]]]
==>[pid:1,name:bob,age:25,house:[hid:10,a:1,b:4]]
==>[pid:2,name:joe,age:25,occupation:lawyer,house:[hid:20,a:1,b:4]]
gremlin> g.inject(persons).unfold().as('entity').
......1>   addV('entity').as('v').
......2>   sideEffect(select('entity').unfold().as('kv').select('v').
......3>     choose(select('kv').by(keys).is('house'),
......4>       addV('house').as('h').
......5>       addE('owns').from('v').
......6>       select('kv').by(values).unfold().as('hkv').select('h').
......7>       property(select('hkv').by(keys),
......8>                select('hkv').by(values)),
......9>       property(select('kv').by(keys),
.....10>                select('kv').by(values))))
==>v[0]
==>v[9]
gremlin> g.V().elementMap()
==>[id:0,label:entity,name:bob,pid:1,age:25]
==>[id:4,label:house,a:1,hid:10,b:4]
==>[id:9,label:entity,occupation:lawyer,name:joe,pid:2,age:25]
==>[id:14,label:house,a:1,hid:20,b:4]
gremlin> g.E().elementMap()
==>[id:5,label:owns,IN:[id:4,label:house],OUT:[id:0,label:entity]]
==>[id:15,label:owns,IN:[id:14,label:house],OUT:[id:9,label:entity]]
I've not really done anything new here, in the sense that I've largely just embedded the traversal pattern you were already using within itself. Note that at line 6 I'm just re-doing what was done on line 2 in the sideEffect().
Now, if my assumption was wrong about having unique houses in your data, then things get more complicated because you can't easily inline upsert traversal patterns in this context. Upserts typically involve a fold/coalesce/unfold pattern that immediately conflicts with this "insert only" pattern that you are using as you can't backtrack in a traversal (i.e. refer to a previous step) that is behind a reducing barrier (i.e. fold). I think I would try to restructure the source data in this case to make it more amenable for pure inserts rather than upsert operations.
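As a rough illustration of what "restructuring the source data" could look like on the Python side, here is a minimal sketch. It makes assumptions that are not from your data: that two houses with identical contents are the same house, that a synthetic "hid" counter is an acceptable identifier, and that the function name restructure is just a placeholder. It also converts the person id to a string on the way (Question 2), renaming it to "pid" as in the console example above.

```python
import json

def restructure(persons):
    """Split nested 'house' dicts out of person records and deduplicate them."""
    flat_persons, houses, owns = [], {}, []
    for person in persons:
        # Keep everything except the nested house on the person record.
        flat = {k: v for k, v in person.items() if k != "house"}
        # Question 2: modify an attribute before it reaches the graph.
        flat["pid"] = str(flat.pop("id"))
        flat_persons.append(flat)

        house = person.get("house")
        if house is not None:
            # Assumption: identical contents mean the same house.
            key = json.dumps(house, sort_keys=True)
            if key not in houses:
                # Assumption: a synthetic, sequential house id is good enough.
                houses[key] = {"hid": len(houses) + 1, **house}
            owns.append((flat["pid"], houses[key]["hid"]))
    return flat_persons, list(houses.values()), owns

persons = [
    {"id": 1, "name": "bob", "age": 25, "house": {"a": 1, "b": 4}},
    {"id": 2, "name": "joe", "age": 25, "occupation": "lawyer", "house": {"a": 1, "b": 4}},
]
flat_persons, houses, owns = restructure(persons)
```

Each of the three resulting lists can then be loaded with insert-only traversals: the flat persons and the unique houses as vertices, and the (pid, hid) pairs as "owns" edges.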
This does help me to some extent. But yes, we do not have any unique IDs for each house. Also, we have multiple attributes of type list/dict. Is there any way we can ignore those attributes? E.g.: select('v').choose(dict/list attributes, ignore, add other properties)
Sure, I did is('house'), but you could supply any predicate you want there really: is(within('house','car')), for example, if you want to give multiple properties. But have a look at the choose() step; it has a lot of flexibility for case/switch style statements: tinkerpop.apache.org/docs/current/reference/#choose-step
This is perfect. Thank you. Lastly, is there any way, instead of providing the names of the attributes, to just provide the data type of the attribute? Because the source can have different attributes of that type. select('v').choose(attributes of type dict/list, ignore, add other properties)
No, at this time Gremlin doesn't have methods for handling the type of the data. Hopefully it will in the future, though, as it is a common request.
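Since the type check can't happen inside the traversal, one workaround is to do it in Python before the data is injected. The following is only a sketch under that assumption; split_by_type is a made-up helper name, and the "tags" attribute is invented here to show a list value.

```python
def split_by_type(record):
    """Separate scalar values from dict/list values in one record."""
    scalars, nested = {}, {}
    for key, value in record.items():
        # Assumption: dict and list are the only container types to filter out.
        target = nested if isinstance(value, (dict, list)) else scalars
        target[key] = value
    return scalars, nested

person = {"id": 2, "name": "joe", "age": 25, "occupation": "lawyer",
          "house": {"a": 1, "b": 4}, "tags": ["x", "y"]}
scalars, nested = split_by_type(person)
```

The scalars dict contains only values that can be written with the original property() pattern, so a list of such dicts can be passed to g.inject() unchanged, while nested holds whatever was set aside for separate handling.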
Ok thanks. Quick question, when adding properties in the above code:
.property(select('kv').by(Column.keys),select('kv').by(Column.values))
how do we take care of null properties?
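One way to sidestep this on the client, sketched under the assumption that a missing property is acceptable in place of a null one, is to drop None values from each record before calling g.inject(). The helper name drop_nulls and the sample data are illustrative only.

```python
def drop_nulls(records):
    """Return copies of the records without keys whose value is None."""
    return [{k: v for k, v in record.items() if v is not None}
            for record in records]

persons = [
    {"id": 1, "name": "bob", "age": None},
    {"id": 2, "name": "joe", "age": 25, "occupation": None},
]
cleaned = drop_nulls(persons)
```

With the None entries removed, the unfold() over each record never produces a key/value pair with a null value, so the property() step in the traversal only sees real values.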