Neo4j LOAD CSV for the following database schema

Hi,

My name is Milind and I am currently doing my PhD at Bond University. I am trying to develop a model for identifying illicit shell companies in the UK. For this purpose, I have data on over 200 private limited entities incorporated in the UK, extracted from the British corporate registry. To support model development, I would like to use graph models to extract more information from the data, that is, the relationships between companies, their addresses and company officers, which could then be incorporated into the model.
However, I have been unsuccessful in loading the data into Neo4j and running queries to extract any useful information. I am really keen on using a graph database for my research. The database schema I came up with to examine relationships among entities, their addresses and officers is as follows:
Database Schema

  1. Node – Entity
    1.1 Label – Company Name
    1.1.1 Properties – Company Number, Case, SIC Code, Company Status, Date of Incorporation, Date of Dissolution, Previous Names, Tenure of Previous Names, Number of Previous Addresses, Number of Previous Names, Total Number of Executives, Phoenix Activity, Availability of Ultimate Ownership Information, Number of Beneficial Owners

  2. Node – Addresses
    2.1 Label – Company Address
    2.1.1 Properties – Registered Company Address, Change in Registered Address, Previous Addresses, Tenure of Previous Addresses, Corresponding Address of Executives

  3. Node – Executives
    3.1 Label – Company Executives
    3.1.1 Properties – Name of Executives, Natural Person, Nationality of Executives, Residence of Executives, Date of Appointment of Executives, Date of Resignation of Executives, Date of Birth of Executives, Number of Appointments of Executives, Name of Beneficial Owners, Nationality of Beneficial Owners

Constraints:
  • As many as 32 executives for some entities
  • As many as 6 previous addresses for some entities
  • A lot of cells do not have any information in them
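
One way to cope with the constraints above (up to 32 executives and 6 previous addresses per entity, many empty cells) is to reshape each wide row into one record per company-to-address link and one per company-to-executive link before loading anything. The sketch below is only illustrative: the column names (`company_number`, `registered_address`, `previous_address_1`, `executive_1_name`, and so on) and the sample rows are invented placeholders, not the headers or contents of the real file.

```python
import csv
import io

MAX_PREVIOUS_ADDRESSES = 6   # from the constraints listed above
MAX_EXECUTIVES = 32          # from the constraints listed above


def clean(value):
    """Return a stripped string, or None for an empty or missing cell."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def reshape(rows):
    """Split wide rows into address links and executive links.

    All column names used here are hypothetical placeholders.
    """
    address_links, executive_links = [], []
    for row in rows:
        number = clean(row.get("company_number"))
        if number is None:
            continue  # a row without an identifier cannot be linked to anything
        current = clean(row.get("registered_address"))
        if current:
            address_links.append((number, "CURRENT", current))
        for i in range(1, MAX_PREVIOUS_ADDRESSES + 1):
            previous = clean(row.get(f"previous_address_{i}"))
            if previous:
                address_links.append((number, "PREVIOUS", previous))
        for i in range(1, MAX_EXECUTIVES + 1):
            name = clean(row.get(f"executive_{i}_name"))
            if name:
                executive_links.append((number, name))
    return address_links, executive_links


# Invented sample data, only to show the shape of the input.
SAMPLE = """company_number,registered_address,previous_address_1,previous_address_2,executive_1_name,executive_2_name
00000001,1 Example Street,2 Sample Road,,A Person,
00000002,2 Sample Road,,,A Person,B Person
"""

address_links, executive_links = reshape(csv.DictReader(io.StringIO(SAMPLE)))
```

Empty cells simply produce no link, so a company with 3 executives and one with 32 end up in the same long format, which tends to be easier to import than 32 repeated column groups.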

I look forward to any help, as I have been struggling with Neo4j's data import, which I need in order to run queries and see whether graph analysis yields any useful information.

Hi,
Please post your CSV file with sample data; that will help in finding the problems.

Can you share what you have tried so far? Can you draw a picture of your model?

Hi Michael,

I have been unable to load the file into Neo4j. However, the queries I want to run are:

  • The common addresses used by different entities identified in corruption schemes
  • The previous addresses of entities which are current addresses of other entities in the data
  • The directors that different entities have in common
  • The common beneficial owners of entities
  • A graphical examination of whether entities in particular schemes exhibit a particular pattern
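
The first two questions above come down to finding addresses that more than one entity points to. Below is a minimal sketch of that idea, assuming the links have already been extracted as (company, kind, address) tuples. The Cypher string shows how the same pattern might be expressed in Neo4j; the labels `Company` and `Address`, the relationship types `REGISTERED_AT` and `PREVIOUSLY_AT`, and the property names are all assumptions, as is the sample data.

```python
from collections import defaultdict

# Hypothetical Cypher for the same question; labels, relationship types and
# property names are assumptions, not an existing database.
SHARED_ADDRESS_QUERY = """
MATCH (a:Address)<-[:REGISTERED_AT|PREVIOUSLY_AT]-(c:Company)
WITH a, collect(DISTINCT c.number) AS companies
WHERE size(companies) > 1
RETURN a.text AS address, companies
"""


def shared_addresses(links):
    """Map each address used by two or more companies to those companies."""
    companies_by_address = defaultdict(set)
    for company, _kind, address in links:
        # Light normalisation so case and spacing differences do not hide a match.
        key = " ".join(address.lower().split())
        companies_by_address[key].add(company)
    return {
        address: sorted(companies)
        for address, companies in companies_by_address.items()
        if len(companies) > 1
    }


# Invented sample links: one company's previous address is another's current one.
links = [
    ("00000001", "CURRENT", "1 Example Street"),
    ("00000001", "PREVIOUS", "2 Sample  Road"),
    ("00000002", "CURRENT", "2 sample road"),
]
result = shared_addresses(links)
```

The same grouping works for shared directors or beneficial owners by swapping the address for a person identifier. Real registry addresses usually need more careful normalisation than lower-casing and collapsing spaces.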

If you are interested, I can share the sample CSV with you directly, as I am unable to upload it to the forum; only images can be uploaded. I can discuss my project with you in detail and seek your advice on how to make the most of Neo4j in my research. Looking forward to hearing from you.

Can you share the actual queries? And the query plans produced by explain?
You can upload your file and your statements to a secret GitHub Gist and link them here.

Thanks for letting me know. I shall do that then and paste the link here.

Sorry for getting back to you so late. This is the gist for my data on over 200 private limited UK companies identified in wrongdoing. My research involves developing a model to identify such illicit shell companies. For this purpose, I intend to use Neo4j to extract information for building variables that could be used in the model.
With this dataset, I want to examine the relationships between:

  1. A Company belonging to a particular case and its executives with their nationalities.
  2. Companies linked by addresses (both current and previous)
  3. Companies linked by executives (both current and previous)
  4. Relationship between company and ultimate owners and their nationality
  5. Relationship between Cases, Companies and Types of Companies used (SIC code)

And so forth.

MilFile CSV https://gist.github.com/milind92/6239d6f75f1a45f0a9332eaf290a3c0f

I have this database schema in my mind:

Node: Entity {Properties: Company Number, Company Name, Case, SIC Code, Company Status, Incorporation Date, Dissolution Date, Changes in registered address, Number of previous addresses, Previous Name (Yes or No), Number of previous names, Previous Names, Tenure of previous names, Total Number of Execs, Availability of PSC Info, Number of Beneficial Owners}.

Node: Address {Properties: Current Registered Address, Previous Addresses, Tenure of Previous Addresses, Corresponding Address of Executives}.

Node: Person {Properties: Name of Executives, Natural Person(Yes or No), Nationality of Executive, Residency of Executive, Date of Birth of Executive, Date of Appointment of Executive, Date of Resignation, Number of Appointments, Name of Beneficial Owner, Nationality of Beneficial Owner}.

Relationships: [Is Currently registered at, Was previously registered at, is linked to (executive), Is the Ultimate Owner].
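
As a rough illustration of how a schema like this could be loaded, the sketch below collects a few `LOAD CSV` statements as Python strings. It assumes the data has first been split into three long-format files (`companies.csv`, `address_links.csv`, `officer_links.csv`); those file names, every column name, the labels and the relationship types are hypothetical and would need to match the real data.

```python
# All file names, labels, relationship types and column names below are
# hypothetical; adjust them to the real CSV headers.
COMPANIES = """
LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row WHERE row.company_number IS NOT NULL
MERGE (c:Company {number: row.company_number})
SET c.name = row.company_name,
    c.case_name = row.case_name,
    c.sic_code = row.sic_code,
    c.status = row.company_status
"""


def address_statement(kind, rel_type):
    """Build the statement linking companies to addresses of one kind."""
    return f"""
LOAD CSV WITH HEADERS FROM 'file:///address_links.csv' AS row
WITH row WHERE row.kind = '{kind}' AND row.address IS NOT NULL
MATCH (c:Company {{number: row.company_number}})
MERGE (a:Address {{text: row.address}})
MERGE (c)-[:{rel_type}]->(a)
"""


# Merging people on name alone can conflate namesakes; a date of birth or
# registry officer id would be a safer key if the data has one.
OFFICERS = """
LOAD CSV WITH HEADERS FROM 'file:///officer_links.csv' AS row
WITH row WHERE row.name IS NOT NULL
MATCH (c:Company {number: row.company_number})
MERGE (p:Person {name: row.name})
MERGE (p)-[:OFFICER_OF]->(c)
"""

# Companies come first so that the later MATCH clauses find them.
STATEMENTS = [
    COMPANIES,
    address_statement("CURRENT", "REGISTERED_AT"),
    address_statement("PREVIOUS", "PREVIOUSLY_AT"),
    OFFICERS,
]

for statement in STATEMENTS:
    print(statement.strip(), end="\n\n")
```

Each printed statement could be pasted into Neo4j Browser one at a time or sent through a driver. The `IS NOT NULL` filters are there because `MERGE` cannot match on a null property value, which matters when many cells are empty. With a default configuration, `file:///` URLs are resolved against the Neo4j import directory.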

I have been trying to improve my understanding of how to design a data schema for Neo4j, though I don't know how successful I have been. However, I did come across one example that uses a similar, if not identical, schema: the Paradise Papers. Maybe it could be of some help. In our case, however, we also use previous addresses and some properties associated with executives.

This is what I had in mind, though I am not sure how correct it is. I need all the help I can get to explore the dataset and make some progress. Looking forward to hearing from you all.

I'm no expert and I don't know what kind of queries you intend to do, but your example shows a lot of properties in the nodes. Might it be a better graph database if some of those properties were nodes? For example, your "Nationality" property could be a ":Nationality" relationship to a ":Country" node.... same for most of the other properties.
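
To make that suggestion concrete, here is a small sketch of pulling a nationality property out of person records into separate country nodes plus relationships. The field names, the `NATIONALITY` relationship type and the sample people are invented for illustration.

```python
def split_out_countries(people):
    """Turn a nationality property into country nodes plus relationships."""
    persons, countries, edges = [], set(), []
    for person in people:
        nationality = (person.get("nationality") or "").strip()
        # The person keeps every property except the one that becomes a node.
        persons.append({k: v for k, v in person.items() if k != "nationality"})
        if nationality:  # empty cells produce no country and no relationship
            countries.add(nationality)
            edges.append((person["name"], "NATIONALITY", nationality))
    return persons, sorted(countries), edges


# Invented sample records.
people = [
    {"name": "A Person", "nationality": "British"},
    {"name": "B Person", "nationality": "British"},
    {"name": "C Person", "nationality": ""},
]
persons, countries, edges = split_out_countries(people)
```

Because both people now point at the same country node, a question such as "which executives share a nationality" becomes a traversal through that node instead of a comparison of string properties.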

And BTW, do you actually have any CSV import issues? It seems you are still doing database design.


Yes, I am no expert either. Your advice does make sense. In fact, I am still in the initial stages of importing the data. With so much material out there, I really don't know how to proceed. I am keen on using Neo4j for the analysis, as it is new to me and could give me an edge in my research.

Not coming from a data science background makes things a bit difficult to implement.

What would you suggest the database schema should look like? Your insight would be valuable, as I agree there are a lot of properties in the nodes.