My name is Ariel and I work at WishTrip, a company located in Jerusalem, Israel. I am interested in applying graph technology in the calculation of the shortest navigation route on a geographical map.
At the moment I am using Python and the OSMNx library to convert OpenStreetMap data to a weighted and searchable (R-tree) graph structure and to perform the geospatial and shortest path search. However, as the size of the graphs grows, loading of the graphs into memory and subsequent search for the shortest path, takes too much time. I hope that having the graphs ready in Neo4j may help in reducing the calculation time considerably. Moreover, I have the feeling that Neo4j could be very convenient for updating or editing the graphs in the future.
I would love to get in touch with people who work in the same field and could help me to get started with Neo4j + Python + Geospatial!
All the best,
Hello @ariel.degraaf and welcome!
Shortest path with OpenStreetMaps has a great tradition with Neo4j. Though it is in java, you should take a look at Neo4j Spatial, which is a full featured package for spatial operations. There's also a companion project for importing OSM data.
Those could be used for setting up the database. You could then turn to Python as a client calling
Cypher to run shortest path or other algorithms.
Let us know how it goes!
Thanks, Andreas! I will have a look at the links that you have posted.
Neo4j Spatial library has a lot of spatial functions, and includes importing of smaller OpenStreetMap models, but does not make a graph that is very suitable for routing. The other
OSM library Andreas mentioned can import much larger OSM data, and includes some procedures for massaging the final graph into a state much more suitable for route calculations. The actual routing would be done using shortest path algorithms from various sources: neo4j include a non-weighted shortest path in Cypher, APOC includes access to dikstra and A-star, and GDS includes more heavy duty versions of these too.
For an introduction to using the 'OSM' library for routing, take a peek at the presentation Will Lyon and I did at GraphConnect 2018: GraphConnect 2018: Videos - Neo4j Graph Data Platform
A direct link to the video is Going Spatial: Leverage Geospatial Data with Neo4j 3.4 - YouTube
Thank you so much for joining the conversation! The GraphConnect 2018 video is the reason I got interested in Neo4j in the first place, so I am delighted to see your response here. I also learned a lot from your other videos (All About Neo4j Spatial (Neo4j Online Meetup #44), Building Spatial Search Algorithms for Neo4j).
I have a couple of questions:
Does the OSM importer only support the OSM file format? I have several hundreds of graphs stored in graphml format (downloaded from OSM and converted with help of Python's OSMNx library). Would it be possible to import them into Neo4j in a format that is suitable for routing? I could use call apoc.import.graphml to import the graphml, but would probably need to perform some additional operations to get the graph in the "OSM importer" format, isn't it?
Does the OSM importer also perform (a part of) the post-processing as described in slides 27-33 of the aforementioned presentation?
Is it possible to preserve the Linestring geometry of curved roads (i.e. relationship connecting the nodes) in the graph?
Since Neo4j is supporting the management of multiple databases within the same DBMS, I was wondering how to organize the import of all the graphs or the post-processing in order to be able to access a certain graph instantly (e.g. when the user requests navigation instructions in either one of the regions with its own unique id). Does someone have experience with this? How many graphs can I insert into the same DBMS?
For calculating the shortest route between two arbitrary geolocations, we will first need to find the nearest nodes on the graph and then use the identified origin and destination nodes to perform a shortest path search, taking into account the distances between the path nodes. I was thinking of combining the "find in radius" function (for finding the nearest nodes) with the Dijkstra or A-star algorithm. Would that be a good and most time-efficient approach?
Thanks in advance!
I have been playing around with
I'm glad you found my presentations useful. I'll try answer each of your questions in turn.
The OSM importer can only import OSM format. It is built on the same internal API's as the
neo4j-admin import CSV importer, which does a high performance offline import. Any format could be supported by
neo4j-admin import by pre-processing into a set of CSV files, which can be quite tricky, hence my decision to write the OSM specific offline importer. The OSM model that results is not suitable for routing, which is why I wrote the procedures you mentioned in the next question. If you were to import graphml format, you would end up with a different model, which would also probably need post-processing to be suitable for routing. I have not looked at OSMNx and do not know the data model they export to graphml, so I do not know how similar that post processing would be to what we wrote for the OSM model we built.
The OSM importer library includes the procedures for post processing, but that is not done as part of the actual import. The sequence of tasks you need to perform is a) run offline import b) start neo4j on the new database created c) in the running database perform the post-processing functions to create the routing graph. It is possible to perform online import, but that capability is currently only in the separate
Neo4j Spatial library. However, I have not tested the OSM libraries route creation functions on an online import from the spatial library. I have an interest in bringing these two libraries closer together to give you more choices in how to get data imported for routing.
All original geometries are retained in the graph by the OSM importer as understood by OSM. The routing procedures create an additional graph on top of the OSM model, and do not modify the OSM model itself. However, accessing these original geometries in a convenient way, or GIS friendly way, is not well supported by the OSM library. The separate
Neo4j Spatial library provides a similar import, but also an adapter layer that allows you to express the OSM model as geometries (Points, LineString, Polygon, MultiLineString, MultiPolygon). As mentioned above, I'm interested in bringing the libraries closer so that high-performance imports from OSM are available to the GIS-friendly functions of
Neo4j Spatial, and low-performance (online) imports from spatial are available to the routing generation functions in the OSM library.
While it is possible to import many databases into the same Neo4j DBMS, this is only possible in Neo4j Enterprise. If you import OSM using online import (not offline import), then you can import all your different regions into the same database. Each would be a separate sub-graph - not connected in any way to the other imports. Querying these all within the same database within the same query would be faster than using
fabric to query across multiple databases. I can think of two reasons why you might want multiple databases: a) you want to use offline import which will always create the database from scratch, b) you want to be sure that the graphs are completely separate and cannot intermingle in any way. Is that what you want? In any case, I believe performance should not be a factor in your choice.
Yes, a distance search using Neo4j's built in spatial index and distance function would be an efficient way of finding starting points. What we did in the demo's was use this to find the closest edge on a road, and then we actually saved that knowledge into the graph so we did not need to run that on every query, but I think you will probably find the performance adequate even if you do not save the results.
Thank you very much for your extensive answers! Because I am quite new to Neo4j, I do not understand all the terms that you are using, so please forgive me for asking probably stupid questions:
What do you exactly mean by "online import"?
I have attached examples of the graphml format obtained with Python's OSMNx package, and the Geopandas CSV files that you can produce from the graph with that same package. I would love to hear your opinion about the usability of these files for routing purposes and the post- (or pre-)processing that I would need to do.
Where can I find the OSM libraries route creation functions? I discovered that many path-finding functions are in the GDS library, which is not yet available in Neo4j Aura, the product that we are considering to use. So for the moment I am stuck with what APOC has to offer.
Great to hear that you are planning to combine the Neo4j Spatial library and OSM route creation functions into a flexible interface for routing applications. When do you think such an interface could be ready for use?
Is Neo4j Enterprise the same as Neo4j Aura?
The idea for putting all my graphs in one DBMS is to be able to switch quickly between different regions, for instance if someone wants navigation instructions in New York and the other moment in Chicago, or Miami. Do you think that there is a chance of data intermingling in such a setup? If so, I could perhaps add a label with the site's unique id to each node and relation to avoid this from happening.
How did you find the closest edge on a road? If you have fixed POIs you can do that, but in my case the origin and destination are arbitrary locations (I only know in which region/graph I have to do the search), so I am wondering if this is doable.
(Attachment nodes.csv is missing)
(Attachment papago.graphml is missing)
(Attachment edges.csv is missing)
I’ve been following the thread because I find this a very interesting topic.
As I‘m new to Neo4j spatial I’ll just answer your general questions and leave the rest to Craig.
- With offline import you load data into a newly created, empty database - this is a lot quicker than online import, where the database is already running.
- Neo4j Aura is a fully managed cloud-hosted version of Neo4j, whereas Neo4j Enterprise you have to host and manage yourself.
Thanks for the answers @dkm1006. I'll add my take:
online/offline: As answered by @dkm1006.
GraphML: I would need to spend some time evaluating this data, as I am not familiar with it. I cannot give an immediate answer.
If you are planning to use Aura, you will come up against the problem that other libraries than APOC are not available (at all), so you will not be able to run the OSM import or the route creating functions. One option is to run these all on a local database on your computer, then make a dump with
neo4j-admin dump, and use 'push-to-cloud' to upload that to Aura. The OSM offline importer and the route creation procedures are all in the osm library at github.com/neo4j-contrib-osm. The Dijktra and A-Star routing procedures are available in APOC.
I am still in the planning phase. Hard to put a time on this. I'm hoping that we are talking about months, but we don't know for sure because this is not my main activity, but a side project.
Neo4j Community and Neo4j Enterprise are two editions of Neo4j you can run yourself. Typically you download it and run it on your local computer. However, for real production use, you want a proper managed system available online. In this case, one option is Aura, which is a managed service that provides Neo4j Enterprise (not community). Aura offers the database in various configurations (and price points). Since Aura uses Neo4j Enterprise, you could say it is the same, but there are some differences. The security model with users/roles is more like Neo4j Community, and you don't have the ability to install any plugins.
If you intermingle data between sub-graphs in the same database, but did not intend to, then you have made a mistake. This is nothing in neo4j preventing this from happening, so it is up to your application code (or Cypher queries) to not intermingle data. If you don't want to trust your queries, then your idea of using multiple databases will be safer, because you cannot create relationships between nodes in different databases. But then you have a problem in that Neo4j Aura does not support multiple databases.
I did discuss this in the Graph Connect 2018 presentation, which you could review. In summary, I did a search for points nearby, traversed to the owning way to make sure the points were all actually streets, and then picked the closest street. Then I decided if there was a point on the street such that a line from that point to the point of interest would be approximately perpendicular to the street, and if not, I created one. Then create the relations to make this routable.
Hope this helps.
I had a quick glance at the nodes.csv, edges.csv and papago.graphmv
- nodes and edges - these seem to be an extract of the OSM data with most tags removed, but focusing on some specific nodes and edges. The convenient thing with this data is you could use
neo4j-admin import with some appropriate Cypher to create a graph. It is not clear how easy it would be to route on this graph without trying it out to see the final state. In any case it is only a filtered subset of OSM data, so there is a risk something was lost.
- graphml looks like an alternative XML format (my code uses OSM XML), so it would need a custom importer to import it. It looks like the same filtered subset that we saw in the nodes/edges files, but cannot be directly imported by
One advantage of these files compared to raw OSM is that they duplicate information in that the geometries do not need to be deduced from the graph, but are actually pre-calculated and stored in the data. This can be useful if you want to pass these geometries to other GIS functions. I do not, however, see this as useful for routing, which would be done on the graph.
Thank you dkm1006!
After learning that the Aura edition is missing quite some essential functionality for my needs, I am going to have a look at the Neo4j Enterprise edition.
Thank you so much, Craig!
Since I definitely want to use multiple databases/graphs in my application, I now understand that I should resort to Neo4j Enterprise.
I have used these files for routing on a graph in Python (with help of the OSMNx and Networkx libraries). The nodes represent the intersections, and the edges connecting roads.The geometries stored on the edges are useful if you want to nicely display the route on the map, following the curvature of the roads. I have also used it to find the nearest point on the road to a POI.
Your idea of storing the nearest node and connecting route in the graph is something that I will certainly try to implement in my own application.
The graphml file can be imported into Neo4j with call apoc.import.graphml