Iterate over nodes and fetch RDF data using a specific property on each node

I used the neosemantics plugin to import all instances of cities located in Italy. I also imported all instances of points of interest located in Italy. Now, I want to add a relationship to show which points of interest are in which cities.

To do this, I'm trying to iterate through all the City nodes and, for each node, use its URI to find all the points of interest in that city. This is the query I tested at https://query.wikidata.org/ using the URI for Florence (wd:Q2044):

SELECT DISTINCT ?attraction ?attractionLabel ?streetAddress ?country
WHERE {
  ?attraction (wdt:P31/wdt:P279*) wd:Q570116 ;
    rdfs:label ?attractionLabel .
  ?attraction wdt:P131 wd:Q2044 .
  FILTER(LANG(?attractionLabel) = "en")
}

I'm having trouble converting this to Cypher in Neo4j Desktop. I tried converting ?attraction wdt:P131 wd:Q2044 to filter (?attraction wdt:P131 c.uri), but it's returning 0 triples. Can anyone who's done this before give me some pointers?

MATCH (c:City)
WITH 'PREFIX sch: <http://schema.org/>
CONSTRUCT { ?item a sch:City ;
              sch:attraction ?attraction .
            ?attraction a sch:Attraction ;
              sch:city ?item . }
WHERE { ?attraction (wdt:P31/wdt:P279*) wd:Q960648 .
        filter (?attraction wdt:P131 c.uri)
        ?attraction rdfs:label ?attractionName .
        filter(lang(?attractionName) = "en")
} ' AS sparql
CALL n10s.rdf.import.fetch(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  "JSON-LD",
  { headerParams: { Accept: "application/ld+json" },
    handleVocabUris: "IGNORE" })
YIELD terminationStatus, triplesLoaded
RETURN terminationStatus, triplesLoaded

Hi,

I have not tried it yet but I will test with:

MATCH (c:City)
WITH 'PREFIX sch: <http://schema.org/>
CONSTRUCT { ?item a sch:City ;
              sch:attraction ?attraction .
            ?attraction a sch:Attraction ;
              sch:city ?item . }
WHERE { ?attraction (wdt:P31/wdt:P279*) wd:Q960648 .
        filter (?attraction wdt:P131 c.uri)
        ?attraction rdfs:label ?attractionName .
        filter(lang(?attractionName) = "en")
} ' AS sparql
CALL n10s.rdf.import.fetch("https://query.wikidata.org/sparql", "JSON-LD", {
  handleVocabUris: "IGNORE",
  headerParams: { Accept: "application/ld+json" },
  payload: "query=" + apoc.text.urlencode(sparql)
})
YIELD triplesLoaded
RETURN triplesLoaded

It should produce the same result, but this is the way documented in the wiki.

BR. Paul

You're right, it unfortunately produces the same result with 0 triples fetched. I looked at this Medium article and updated my query as follows:

MATCH (c:City)
WITH 'PREFIX sch: <http://schema.org/>
CONSTRUCT { ?item a sch:City ;
              sch:attraction ?attraction .
            ?attraction a sch:Attraction ;
              sch:city ?item . }
WHERE {
        ?attraction (wdt:P31/wdt:P279*) wd:Q960648 .
        ?attraction wdt:P131 <' + c.uri + '> .
        ?attraction rdfs:label ?attractionName .
        filter(lang(?attractionName) = "en")
} ' AS sparql
CALL n10s.rdf.import.fetch(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  "JSON-LD",
  { headerParams: { Accept: "application/ld+json" },
    handleVocabUris: "IGNORE" })
YIELD terminationStatus, triplesLoaded
RETURN terminationStatus, triplesLoaded

This returned a terminationStatus of OK for each city entity and showed that triples were loaded, but it didn't have the effect I wanted. It seems to have overwritten some of the resources already in the db. For example, the Attraction nodes previously had the following properties: id, Wikipedia image link (optional), wiki article, geo location, name, and URI. Now some of the nodes only have the id and URI. In addition, the attraction nodes had relationships with another resource Type (i.e. museum, monument, garden), but now that relationship is gone, and I still don't see a relationship between city and attraction.
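A quick way to check the damage is something like the following (a sketch; the Attraction label and the name/uri property names are assumptions based on the description above):

```cypher
// Hypothetical sanity check: list attractions that lost their name property
// during the re-import. Label and property names follow the description above.
MATCH (a:Attraction)
WHERE NOT exists(a.name)
RETURN a.uri LIMIT 10
```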

I found interesting information about the Wikidata SPARQL endpoint and linked data fragments endpoint here: Wikidata Query Service/User Manual - MediaWiki

To verify the endpoint usage, I put your SPARQL query in Postman and successfully got a result.

I tried two formats, "xml" and "json". The xml output was a valid RDF file. I don't know whether the json output is JSON-LD conformant; that could be tested.

Then I tried the same with the n10s.rdf.import.fetch stored procedure and got no results back.

The next step was to use the example described in the neosemantics wiki: https://neo4j.com/docs/labs/nsmntx/current/import/#advancedfetching

This example is working.

I think it's either a configuration problem with the Wikidata API and formats, or a bug.

Thanks for the help! I managed to import the cities by altering the Cypher query in the section titled "Dynamically extend the Knowledge Graph" from Dr. Barrasa's blog post here. Adding it here in case it's useful to anyone else in the future:

MATCH (attraction:PointOfInterest)
WITH 'PREFIX sch: <http://schema.org/>
CONSTRUCT { ?city a sch:City ;
              sch:containsPlace ?attraction ;
              rdfs:label ?cityName }
WHERE { ?attraction wdt:P131 ?city .
        filter(?attraction = <' + attraction.uri + '>)
        ?city rdfs:label ?cityName .
        filter (lang(?cityName) = "en") }' AS city_sparql, attraction
CALL n10s.rdf.import.fetch(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(city_sparql),
  "JSON-LD",
  { headerParams: { Accept: "application/ld+json" },
    handleVocabUris: "IGNORE" })
YIELD terminationStatus, triplesLoaded
RETURN attraction.name, terminationStatus, triplesLoaded
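After running the import, a quick check along these lines should confirm the new links (a sketch; with handleVocabUris: "IGNORE", sch:containsPlace should surface as a plain containsPlace relationship type):

```cypher
// Verify the relationships created by the import above:
// City nodes linked to the existing PointOfInterest nodes.
MATCH (c:City)-[:containsPlace]->(a:PointOfInterest)
RETURN c.uri, a.name LIMIT 10
```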

Hello! Coming back to this post as I'm still having issues using neosemantics to get all the info I need. The above Cypher command worked to get the cities for each attraction in my graph and create a node for each city with a "containsPlace" relationship to the attractions. But I want my City node to also have a link to the city's article and image as properties, and I haven't been able to get that to work.

I tested out this call with the wikidata query service using the uri for a specific attraction, and it worked exactly the way I want: https://tinyurl.com/y993tlef. So I should be able to just replace everything between WITH and AS above with this call, right? But when I run it:

MATCH (attraction:PointOfInterest) 
WITH 'PREFIX sch: <http://schema.org/> 
CONSTRUCT { ?city a sch:City ; 
sch:containsPointOfInterest ?attraction ; 
sch:name ?cityName ; 
wdt:article ?articleAsStr ;
wdt:image ?imageAsStr . } 
WHERE { ?attraction wdt:P131 ?city ; 
filter(?attraction = <' + attraction.uri +'>) 
?city rdfs:label ?cityName ;
?city wdt:P31 wd:Q515 ;
filter (lang(?cityName) = "en") 
  OPTIONAL { ?article sch:about ?pointOfInterest . 
             ?article sch:isPartOf <https://en.wikivoyage.org/> .
             bind(str(?article) as ?articleAsStr)
           }
  OPTIONAL { ?pointOfInterest wdt:P18 ?image .
            bind(str(?image) as ?imageAsStr)
           }
}' AS city_sparql, attraction 
CALL n10s.rdf.import.fetch("https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(city_sparql),"JSON-LD", 
{ headerParams: { Accept: "application/ld+json"} , 
handleVocabUris: "IGNORE"}) 
YIELD terminationStatus, triplesLoaded 
RETURN attraction.name, terminationStatus, triplesLoaded

It returns no triples :thinking_face:. I also observed that my first call to get just the cities and their names works only intermittently. If I delete those nodes/relationships and try to run that call again, it returns 0 triples even though it worked before. I'm wondering if there's some sort of cache behind the scenes making my db think that those nodes exist even though I've deleted them? Has anyone run into similar issues before?

If it matters, I'm using Neo4j version 4.1.5 and neosemantics version 4.1.0.1.

Hello,

The SPARQL query in the Wikidata link is quite different from the one defined in the Cypher fragment.

The one in the Cypher fragment has a syntax error: after ?cityName you need a dot instead of a semicolon.

?city rdfs:label ?cityName .
?city wdt:P31 wd:Q515 .

And in addition to that, in the optional patterns of the query you're using ?pointOfInterest while the first part of the query uses ?attraction. This is syntactically OK but probably not what you want to do, because it actually returns all articles and images in Wikidata, given that ?pointOfInterest is not bound. That's a lot of articles and images, probably millions of them :scream:, and obviously makes the query time out.
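Putting both fixes together, the WHERE clause of the embedded query would look roughly like this (a sketch; variable names and the URI concatenation are taken from the query above):

```sparql
WHERE { ?attraction wdt:P131 ?city .
        filter(?attraction = <' + attraction.uri + '>)
        ?city rdfs:label ?cityName .
        ?city wdt:P31 wd:Q515 .
        filter (lang(?cityName) = "en")
        OPTIONAL { ?article sch:about ?attraction .
                   ?article sch:isPartOf <https://en.wikivoyage.org/> .
                   bind(str(?article) as ?articleAsStr) }
        OPTIONAL { ?attraction wdt:P18 ?image .
                   bind(str(?image) as ?imageAsStr) } }
```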

I'm not sure exactly what error you're getting, but it will have to do with these two problems.

I would suggest using the n10s.rdf.stream.fetch method [link to manual] as a SPARQL debugger, to make sure your SPARQL query returns what you expect before running the import.
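For example, something along these lines streams the returned triples without writing anything to the graph (a sketch; substitute your own CONSTRUCT query for the trivial one here):

```cypher
// Debugging sketch: stream triples from the endpoint instead of importing
// them, so you can inspect what the SPARQL query actually returns.
WITH 'CONSTRUCT { ?s ?p ?o } WHERE { ?s wdt:P131 wd:Q2044 . ?s ?p ?o } LIMIT 25' AS sparql
CALL n10s.rdf.stream.fetch(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  "JSON-LD",
  { headerParams: { Accept: "application/ld+json" } })
YIELD subject, predicate, object
RETURN subject, predicate, object
```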

Thank you so much for pointing this out! I completely overlooked that I was using ?pointOfInterest instead of ?attraction. Fixing that plus the semicolon did the trick :smile:.
