Integrating Neo4j with MongoDB and ElasticSearch

Hi,

I'm working on an open-source project named Researchkernel.org, which provides recommendations and personalised search to users. I'm using Neo4j for my recommendation system and [Neo4j + Elasticsearch] for the search engine.

  • To integrate MongoDB and Neo4j, I used the Neo4j Doc Manager, as described in the [Neo4j Developer Guide](https://neo4j.com/developer/mongodb/).

  • To integrate Elasticsearch and Neo4j, I'm using the GraphAware tool.

My Data Flow:

MONGODB => NEO4J => ELASTICSEARCH
User profile / other data => Recommendation System => Search Engine

My Problem:

Neo4j Doc Manager syncs all my MongoDB data to Neo4j. However, it also copies the MongoDB _id field into Neo4j. When the changes in Neo4j are pushed to Elasticsearch, I get the error below:

019-12-08 13:01:17.939+0000 WARN  Failed to execute an action against Elasticsearch. Details: {"root_cause":[{"type":"mapper_parsing_exception","reason":"Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."}],"type":"mapper_parsing_exception","reason":"Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."}

Because the data contains the _id field, I can't push it to Elasticsearch.
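A minimal sketch of the transformation the error calls for, assuming you can intercept each document before it is indexed: keep MongoDB's `_id` value under a non-reserved field and drop other Elasticsearch metadata keys. The function name `sanitize_for_elasticsearch`, the target field `mongo_id`, and the reserved-field list are hypothetical and illustrative; they are not part of Neo4j Doc Manager or the GraphAware module.

```python
from typing import Any, Dict

# Partial, illustrative list of Elasticsearch metadata field names that must not
# appear inside a document body. _id is the one reported in the error above.
RESERVED_FIELDS = {"_id", "_index", "_routing", "_source"}


def sanitize_for_elasticsearch(doc: Dict[str, Any], rename_to: str = "mongo_id") -> Dict[str, Any]:
    """Return a copy of doc that can be sent as an Elasticsearch document body.

    The MongoDB _id value is kept under `rename_to` (a hypothetical field name)
    so the record can still be traced back; other reserved keys are dropped.
    The input dict is not modified.
    """
    clean: Dict[str, Any] = {}
    for key, value in doc.items():
        if key == "_id":
            clean[rename_to] = str(value)  # ObjectId-like values become plain strings
        elif key in RESERVED_FIELDS:
            continue  # drop other metadata fields outright
        else:
            clean[key] = value
    return clean


if __name__ == "__main__":
    node_props = {"_id": "5dec8f1e9a1b2c3d4e5f6789", "name": "Ada", "interests": ["graphs"]}
    print(sanitize_for_elasticsearch(node_props))
    # → {'mongo_id': '5dec8f1e9a1b2c3d4e5f6789', 'name': 'Ada', 'interests': ['graphs']}
```

Renaming rather than simply deleting `_id` keeps a link back to the MongoDB record, which is usually useful when search results need to be joined with the source data.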

My Choices:

  • Tweak the Neo4j Doc Manager source code for my use case

  • Use APOC for MongoDB integration

My Views:

I'm not sure what to do at this point. That's why I'm asking you all here :stuck_out_tongue:.

It would be great if someone could point me to the easier path for solving this.

Thanks.

Of the two options you've thought of, I'd go with the APOC option.

Might I also suggest another option: introduce another technology to act as a message hub. This is an issue I've been seeing more often as microservices gain popularity: how do you share data across microservices? This is where technologies like Kafka, RabbitMQ, etc. fill a role. One service publishes data to the message bus; subscribers then ingest the message and apply whatever transformation the receiving service needs in order to use the data. This does add more technologies to your stack, but it lets you decouple your services and perform the data transformations you need, which is exactly the problem you're running into.
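To make the message-hub idea concrete, here is a toy, in-memory sketch of publish/subscribe where each subscriber applies its own transformation. `InMemoryBus` and the two handlers are hypothetical stand-ins, not the Kafka or RabbitMQ client APIs; in a real deployment each handler would be a separate consumer writing to Neo4j or Elasticsearch. The point it illustrates is that the search-side subscriber can rename `_id` while the graph side keeps it unchanged.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


class InMemoryBus:
    """Toy stand-in for a message hub such as Kafka or RabbitMQ (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Message], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # Each subscriber gets its own (shallow) copy, so one subscriber's
        # transformation does not leak into another's data.
        for handler in self._subscribers[topic]:
            handler(dict(message))


# Hypothetical downstream stores, modelled as plain lists.
graph_nodes: List[Message] = []
search_docs: List[Message] = []


def to_graph(msg: Message) -> None:
    # The graph side can keep MongoDB's _id as an ordinary property.
    graph_nodes.append(msg)


def to_search(msg: Message) -> None:
    # The search side renames _id, because Elasticsearch reserves that field name.
    if "_id" in msg:
        msg["mongo_id"] = str(msg.pop("_id"))
    search_docs.append(msg)


bus = InMemoryBus()
bus.subscribe("user-profiles", to_graph)
bus.subscribe("user-profiles", to_search)
bus.publish("user-profiles", {"_id": "5dec8f1e", "name": "Ada"})

print(graph_nodes)  # [{'_id': '5dec8f1e', 'name': 'Ada'}]
print(search_docs)  # [{'name': 'Ada', 'mongo_id': '5dec8f1e'}]
```

The producer (here, the `publish` call) knows nothing about Elasticsearch's field rules; that knowledge lives only in the search-side consumer, which is the decoupling the message-hub approach is meant to buy.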

Then, just because this is a Neo4j forum, I have to ask: why use MongoDB and Elasticsearch at all? Why not keep it all in one database? Like Elasticsearch, Neo4j has Lucene-backed indexes for fuzzy matching, and you can combine them with Cypher for a truly rich search experience. As for MongoDB, Neo4j is technically a NoSQL database too, and you can store anything in Neo4j that you would store in MongoDB. You can also set up clusters and scale the system out horizontally.

So I would ask yourself: why all three technologies? If you used only Neo4j, you would eliminate your data transformation issue and greatly simplify your stack.


Hi Mike,

Thanks for the suggestion. I learned about Neo4j's Kafka integration after posting here, and it seems like a viable option.

Regarding your question about why we use MongoDB and Elasticsearch: I totally agree that we could use Neo4j as the main database. However, we started our initial development on MongoDB and Elasticsearch before I was aware of Neo4j, and rewriting everything right after finishing is a challenging task. That's why I'm looking into database integrations; it's easier that way.