Importing Netflow (ish) issues

Hello,

I'm having some issues building proper relationships in neo4j, and I assume this is due to my poor loading approach. I basically want to import something that is almost like netflow:

user,src_ip,dst_ip,src_port,dst_port,protocol,packets,bytes,start_time,end_time,action,log-status
tom,10.0.1.243,10.0.1.185,88,64304,6,4,378,1763709753,1763709811,ACCEPT,OK
tom,10.0.1.185,10.0.1.243,64309,88,6,4,478,1763709753,1763709811,ACCEPT,OK

I created my nodes as follows:

MERGE (a:attribute {ip:{IP}})
ON CREATE SET a = {ip:{IP}, user_id:{USER_ID}}

And the relationship:

MATCH (a:attribute), (b:attribute)
WHERE a.ip = {SRC_IP} AND b.ip = {DST_IP}
MERGE (a)-[rel:flow {start_time:{START_TIME}, end_time:{END_TIME}}]->(b)
ON CREATE SET rel = {
    proto: {PROTO},
    src_port: {SRC_PORT},
    dst_port: {DST_PORT},
    packets: {PACKETS},
    bytes: {BYTES},
    action: {ACTION}}
RETURN rel

Then I use py2neo with:

graph.schema.create_uniqueness_constraint("attribute", "ip")

A couple of problems: it loads, but very slowly, and it fails after about 200k relationships (there should be around 3M). With what is loaded I can see the relationship type "flow", which is what I added, but I can't query it in a meaningful way (e.g. I can't seem to match flows from any src_port to a dst_port of "443", or find the volume of traffic between nodes, etc.).
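
For example, I'd like to be able to run something along these lines (just sketches of what I'm after, assuming the ports end up stored as numbers):

MATCH (a:attribute)-[f:flow]->(b:attribute)
WHERE f.dst_port = 443
RETURN a.ip, b.ip, f.packets, f.bytes

MATCH (a:attribute)-[f:flow]->(b:attribute)
RETURN a.ip, b.ip, sum(f.bytes) AS total_bytes
ORDER BY total_bytes DESC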

Could someone point me in the right direction?

Thanks!

You should create your unique constraint before loading the data; otherwise the MATCH gets slower and slower as the graph grows.
Also consider transaction sizes: adding 200k rels in one tx might be too large.
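
If you can put the CSV on the server, LOAD CSV with USING PERIODIC COMMIT is another way to get both right, since it handles the batching for you. Something along these lines (just a sketch; 'flows.csv' and the 3.x constraint syntax are assumptions, adjust to your setup):

// create the constraint first, before any data goes in
CREATE CONSTRAINT ON (a:attribute) ASSERT a.ip IS UNIQUE;

// then load in batches of 5000 rows per transaction
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM 'file:///flows.csv' AS row
MERGE (a:attribute {ip: row.src_ip})
  ON CREATE SET a.user_id = row.user
MERGE (b:attribute {ip: row.dst_ip})
CREATE (a)-[:flow {
    proto: toInteger(row.protocol),
    src_port: toInteger(row.src_port),
    dst_port: toInteger(row.dst_port),
    packets: toInteger(row.packets),
    bytes: toInteger(row.bytes),
    start_time: toInteger(row.start_time),
    end_time: toInteger(row.end_time),
    action: row.action}]->(b);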

Hi @stefan.armbruster, I don't do 200k in one commit; I do batched transactions of 5000.
I just can't believe no one has loaded netflow into neo4j; I can't seem to find anything meaningful about netflow and neo4j.