I'm currently modeling OMOP EMR data as a graph database and have loaded a dataset of ~400 patients into Neo4j. When I tried to apply graph algorithms, I found that my graph is highly fragmented, with very few shared nodes. So I'm wondering if anyone could share their experience or opinion on which graph model is better suited to graph algorithms. I'm posting two schemas here for comparison:
a) every row in the EMR tables is turned into a node (all of the row's properties are attached to the node)
b) every unique instance is turned into a node, and relationship properties distinguish the individual occurrence records (dates and ids on edges)
b) partial schema (I haven't finished converting the rest)
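To make the difference between a) and b) concrete, here is a minimal Python sketch (plain standard library, not Neo4j) that counts connected components under each schema. The records, node names, and the `count_components` helper are all made up for illustration, and it assumes the row-level nodes in a) are not otherwise linked to a shared concept node.

```python
from collections import defaultdict

# Hypothetical toy records: (patient_id, concept, date). Not real OMOP data.
RECORDS = [
    ("p1", "sepsis", "2020-01-01"),
    ("p2", "sepsis", "2020-01-03"),
    ("p2", "pneumonia", "2020-01-04"),
    ("p3", "pneumonia", "2020-02-10"),
]


def count_components(edges):
    """Count connected components of an undirected graph given as edge pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
    return components


# Schema a): one node per row, so every occurrence node is unique to one patient.
edges_a = [(patient, f"occurrence_{i}") for i, (patient, _, _) in enumerate(RECORDS)]

# Schema b): one node per unique concept; the date would live on the edge.
edges_b = [(patient, f"concept_{concept}") for patient, concept, _ in RECORDS]

components_a = count_components(edges_a)
components_b = count_components(edges_b)
print(components_a, components_b)
```

With this toy data, a) splits into 3 components (one per patient) while b) forms a single component, which is roughly the fragmentation I'm seeing.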
I faced much the same issue when modeling transactions by customers.
Here are my 2 cents.
As we know, we have to create a subgraph first with CALL gds.graph.create(...) (renamed gds.graph.project in GDS 2.x) and then apply the GDS algorithms to it. When I tried running them directly the first time, I didn't get any results.
First, check whether any patients are within 3 to 5 hops of each other: MATCH gp=(p1:Patient)-[*3..5]-(p2:Patient) RETURN nodes(gp) LIMIT 100
If you get results, you can use that query to create a subgraph and then try the relevant algorithm.
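If it helps to see what that check is looking for, here is a rough Python sketch of the same idea outside Neo4j. It is a simplified analogue, not a translation of the Cypher: it uses breadth-first search on shortest-path distance with only an upper hop limit, whereas the `[*3..5]` pattern matches any path of 3 to 5 relationships. The edge list, node names, and the `patients_within_hops` helper are hypothetical.

```python
from collections import defaultdict, deque


def patients_within_hops(edges, patients, max_hops):
    """Return sorted patient pairs whose shortest path is at most max_hops long."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    pairs = set()
    for source in patients:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if distance[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in adjacency[node]:
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
        for other in patients:
            if other != source and other in distance:
                pairs.add(tuple(sorted((source, other))))
    return sorted(pairs)


# Hypothetical toy graph: p1 and p2 share a drug node; p3 shares nothing.
EDGES = [
    ("p1", "exposure_1"), ("exposure_1", "drug_heparin"),
    ("p2", "exposure_2"), ("exposure_2", "drug_heparin"),
    ("p3", "exposure_3"), ("exposure_3", "drug_insulin"),
]
PATIENTS = ["p1", "p2", "p3"]

reachable = patients_within_hops(EDGES, PATIENTS, max_hops=5)
print(reachable)
```

Here p1 and p2 are 4 hops apart through the shared drug node, so they are the only pair returned. An empty list would mean no patient can reach another within the limit, which is the situation where the algorithms have nothing to work with.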
From your data model, apart from gender and race, I'm not sure you have enough common nodes to connect patients for graph traversal. I can see a Drug exposure node, but I'm not sure whether you have included 'Drug' or 'Initial symptoms or diagnosis' in your model.
You may also need more data, since an initial data load may not cover all the possible common or influential nodes. For example, 5 people with 5 transactions across 10 different products will give 10 separate paths with no connection between the people. In that case we need more samples.
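The 5 people / 10 products example can be sketched in a few lines of Python. The data and the `connected_pairs` helper are invented for illustration; it simply lists which pairs of people share at least one product, before and after a couple of extra made-up transactions are added.

```python
from collections import defaultdict
from itertools import combinations


def connected_pairs(purchases):
    """Return sorted pairs of people who bought at least one product in common."""
    buyers = defaultdict(set)
    for person, product in purchases:
        buyers[product].add(person)
    pairs = set()
    for people in buyers.values():
        pairs.update(combinations(sorted(people), 2))
    return sorted(pairs)


# Hypothetical data: 5 people, one transaction each, 2 products per transaction,
# 10 distinct products in total, so nobody shares a product.
sparse = [(f"person_{i}", f"product_{2 * i + j}") for i in range(5) for j in range(2)]

# Two more made-up transactions that happen to reuse existing products.
denser = sparse + [("person_0", "product_2"), ("person_3", "product_9")]

sparse_pairs = connected_pairs(sparse)
denser_pairs = connected_pairs(denser)
print(len(sparse_pairs), denser_pairs)
```

The sparse load produces no connected pairs at all, while two extra transactions are enough to link person_0 with person_1 and person_3 with person_4. More samples only help, of course, if they land on nodes that are actually shared.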
Hope it helps.
Thanks a lot for sharing your experience!
- I agree that trying Cypher projection rather than native projection at the beginning is the better choice. It gives us more flexibility even though it takes longer to generate. From reading the documentation Neo4j provides, native projections are more suitable for production.
- I tried the degree check and, not surprisingly, got no results.
- No results
- Totally agree, and that is the reason I want to adjust schema a), which I'm currently using. A bit of background on the OMOP common data model: observation and condition_occurrence usually represent symptoms and diagnoses. But again, my model has the same problem: each patient's symptom occurrence is independent, so nothing is shared between patients.
- I assume you mean the scale of the dataset. I agree that having more data can give us enough shared nodes (I think in our case ~400 patients in the same ICU should be sufficient), but for a large company you might need more transaction data to build the network.
Thanks again for the insights!