I would like to solve some challenges I have on Customers De Dup.
I thought about Graph DB as a candidate for the challenge.
Found this reference but unfortunately, the full article is missing.
Philip Howard, Research Director – Data Management, Bloor Research, explores matching and de-duplication in a graph database, topics discussed at The Data Warehousing Institute TDWI London Symposium. If Bill Clinton and William Clinton (this was the...
Does anyone have any information /Ideas about this issue?
I think you are looking for information on a topic of research within Natural Language Processing
check out the Neo4j chapter on NLP
and the sub section on APOC NLP
These procedures support
entity extraction, key phrase extraction, sentiment analysis, and document classification.
Sorry, No. I'm not looking for NLP. The Article i'm looking for is about de-duplication challenge. The link I added to an article dealing about it but I can't find the extension
Hi Tal, I'll rephrase, de-duplicating names,
is an NLP challenge, and there are NLP solutions built to solve it. Best of luck! Regards, Joel
On this article the issue is to build data structure that support matching AFTER the process:
meaning to deiced if combination of matches are bring to the same person: for example , day of birth, gender, address together are found the same - meaning it is the same person.
I'm looking for the article discuss that since when i open full article found 404 ...
Neo4j onineTraining describes what a graph database is, how to install Neo4j, how to query graphs in Neo4j with a query language, Cypher, and how to add and manipulate data. All these topics are well covered in the training curriculum to help learners get better insight.
If your situation is similar to the scenario mentioned in your referenced article, then this is same as in money laundering schemes.
In these scenarios, you can build a similarity relationships between the two customer names. For this use, Jaro-Winkler similarity.
To explain this in simple terms:
Consider two simple words: 'coronavirus' and 'cornivorus'. They both are of same length and contain same alphabets but rearranged differently in cornivorus.
Now we know the invisible connection between these two words!
Here comes the similarity:
with "coronavirus" as norm1, "cornivorus" as norm2
return toInteger(apoc.text.jaroWinklerDistance(norm1, norm2) * 100) as similarity
You can setup your own similarity limit to build a similarity relationship and this should help you to address de-duplication scenarios.