My structure looks like this:
Person -[:HAS_HOBBY]-> Hobby
I'm randomly generating, for example, 500 Person nodes and 20 Hobby nodes. I would now like to create random relationships between them so that each person has one or more hobbies, but not every person has the same one.
CALL apoc.periodic.iterate("
match (p:Person),(h:Hobby) with p,h limit 1000
where rand() < 0.1 RETURN p,h ",
"CREATE (p)-[:HAS_HOBBY]->(h)",
{batchSize: 20000, parallel: true})
YIELD batches, total
RETURN *
Without the APOC function the query looks like this:
MATCH(p:Person),(h:Hobby)
WITH p,h
LIMIT 10000
WHERE rand() < 0.1
CREATE (p)-[:HAS_HOBBY]->(h)
This is the query I have tried. The problem is that all the Person nodes end up linked to one single Hobby node, so only 1 of the 20 Hobby nodes is being used.
Is there anything missing in my query? Or should I tackle this problem in a different way?
I have also tried different approaches, with FOREACH clauses looping through all the nodes, or using SKIP and LIMIT through a cartesian product.
Thanks a lot!
Edit: query by InverseFalcon, wrapped in apoc.periodic.iterate:
call apoc.periodic.iterate("
// first generate your range of how many hobbies you want a person to have
// for this example, 1 to 5 hobbies
WITH range(1,5) as hobbiesRange
// next get all hobbies in a list
MATCH (h:Hobby)
WITH collect(h) as hobbies, hobbiesRange
MATCH (p:Person)
// randomly pick number of hobbies in the range, use that to get a number of random hobbies
WITH p, apoc.coll.randomItems(hobbies, apoc.coll.randomItem(hobbiesRange)) as hobbies
// pass each person and their picked hobbies on; the second statement creates the relationships
RETURN p,hobbies",
"FOREACH (hobby in hobbies | CREATE (p)-[:HAS_HOBBY]->(hobby))",
{batchSize: 1000, parallel: false});
It would be easier to not use iterate() in this case, but instead use some of APOC's collection helper functions, such as those used to get random items from a collection. Something like this:
// first generate your range of how many hobbies you want a person to have
// for this example, 1 to 5 hobbies
WITH range(1,5) as hobbiesRange
// next get all hobbies in a list
MATCH (h:Hobby)
WITH collect(h) as hobbies, hobbiesRange
MATCH (p:Person)
// randomly pick number of hobbies in the range, use that to get a number of random hobbies
WITH p, apoc.coll.randomItems(hobbies, apoc.coll.randomItem(hobbiesRange)) as hobbies
// create relationships
FOREACH (hobby in hobbies | CREATE (p)-[:HAS_HOBBY]->(hobby))
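To check that the hobbies really are spread out, a couple of read-only queries along these lines may help. This is only a sketch, not part of the original answer; it assumes the Person and Hobby labels and the HAS_HOBBY relationship type from the question, and the aliases hobbyCount and people are just illustrative names.

```cypher
// Distribution check: how many people have 0, 1, 2, ... hobbies
MATCH (p:Person)
OPTIONAL MATCH (p)-[:HAS_HOBBY]->(h:Hobby)
WITH p, count(h) AS hobbyCount
RETURN hobbyCount, count(p) AS people
ORDER BY hobbyCount;

// Usage check: how many people are linked to each hobby
MATCH (h:Hobby)
OPTIONAL MATCH (h)<-[:HAS_HOBBY]-(p:Person)
RETURN h, count(p) AS people
ORDER BY people DESC;
```

If the first query shows a row for 0 hobbies, or the second shows most hobbies with 0 people, the relationships were not created as intended. Note that running the creation query more than once adds duplicate relationships, which would inflate these counts.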
Thank you very much, that worked very well! I had to use the query with apoc.periodic.iterate though, as it could not handle a load of 200,000 Person nodes with 10 hobbies. I will post your code wrapped in apoc.periodic.iterate in my question.