I have a question about Cypher queries and updating a database. I have a Python script that does web scraping and generates a CSV at the end. I use this CSV to import data into a Neo4j database.
The scraping runs 5 times a day. Each time, the new rows are appended to the previous CSV, and I import the CSV after each scraping. Currently, every import creates all the nodes again, even the ones that are already in the DB.
For example, the first CSV has 5 rows and I insert them into Neo4j. The next scraping gives 2 new rows, so the CSV now has 7 rows. If I import it, the first five rows end up in the DB twice. I would like every node to be unique and not added again if it is already in the database.
For example when I try to create node ARTICLE I do this:
CREATE (a:ARTICLE {id:$id, title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published})
I think using MERGE instead of CREATE should solve the problem, but it doesn't, and I can't figure out why.
How can I do this?
A MERGE clause will create its entire pattern if any part of it does not already exist. So, for a MERGE clause to work reasonably, the pattern used with it must only specify the minimum data necessary to uniquely identify a node (or a relationship).
For instance, assuming ARTICLE nodes are supposed to have unique id properties, then you should replace your CREATE clause:
CREATE (a:ARTICLE {id:$id, title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published})
with something like this:
MERGE (a:ARTICLE {id:$id})
SET a += {title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published}
In the above example, the SET clause will always overwrite the non-id properties. If you want to set those properties only when the node is created, you can use ON CREATE before the SET clause.
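For what it's worth, here is a minimal sketch of the ON CREATE variant driven from Python. The names MERGE_ARTICLE, ARTICLE_KEYS and merge_article are made up for illustration, and the session argument is assumed to be a neo4j driver session (anything with a run(query, parameters) method will do), so adapt it to however you actually call the database.

```python
# Cypher for an upsert keyed on the unique id only.
# ON CREATE SET runs just once, when MERGE has to create the node.
MERGE_ARTICLE = """
MERGE (a:ARTICLE {id: $id})
ON CREATE SET a += {title: $title, img_url: $img_url, link: $link,
                    sentence: $sentence, published: $published}
"""

# Parameter names used by the query above.
ARTICLE_KEYS = ("id", "title", "img_url", "link", "sentence", "published")


def merge_article(session, row):
    """Run the MERGE for one CSV row given as a dict.

    `session` is assumed to expose run(query, parameters), as a neo4j
    driver session does. Keys missing from `row` are passed as None.
    """
    params = {key: row.get(key) for key in ARTICLE_KEYS}
    if params["id"] is None:
        # MERGE cannot match on a null key, so fail early instead.
        raise ValueError("row has no id; MERGE needs a non-null key")
    return session.run(MERGE_ARTICLE, params)
```

Calling merge_article once per CSV row means that re-importing the full CSV only creates nodes for ids that are not in the database yet; existing articles are matched and left untouched.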
Thank you. I saw that I have NaN values in some columns of the article dataframe. I replaced them with 'no content' and MERGE seems to work now. I think that was the cause, but I don't understand why it is a problem.
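On the NaN question, a likely explanation, assuming the NaN values ended up as properties inside the MERGE pattern: NaN never compares equal to anything, including itself, so a pattern containing a NaN property can never match an existing node, and MERGE creates a new one on every import. A small sketch of one way to handle it before passing parameters (clean_row is a hypothetical helper, not part of pandas or the neo4j driver):

```python
import math


def clean_row(row):
    """Return a copy of `row` (a dict) with float NaN values replaced by None."""
    return {
        key: (None if isinstance(value, float) and math.isnan(value) else value)
        for key, value in row.items()
    }


nan = float("nan")
print(nan == nan)  # False: NaN is not equal to itself
```

As far as I know, a null value in a SET a += {...} map just leaves that property off the node, so no placeholder string is needed. The id used in the MERGE pattern itself still has to be non-null.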
I also run NER on the articles, and when I run NER on a new scraping, the same NER nodes are created again. I don't understand why.
You have not shown enough information or code (e.g., how you are generating the parameters passed to the Cypher code, what your actual Cypher code now looks like, etc.). Also, what does "NER" mean? And what are the actual error messages you are getting?