This repository was archived by the owner on Sep 18, 2024. It is now read-only.

senzing-garage/connector-neo4j

connector-neo4j

⛔ Deprecated

No Maintenance Intended

If you are beginning your journey with Senzing, please start with Senzing Quick Start guides.

You are in the Senzing Garage where projects are "tinkered" on. Although this GitHub repository may help you understand an approach to using Senzing, it's not considered to be "production ready" and is not considered to be part of the Senzing product. Heck, it may not even be appropriate for your application of Senzing!

Overview

The Neo4j connector is a Java application that gathers information from Senzing and maps it into a Neo4j graph database. The connector reads messages containing Senzing information from a message queue (RabbitMQ or AWS SQS), determines from that data which entities in the Senzing repository are affected, retrieves the entity data using the Senzing API, finds how those entities relate to other entities, and inserts that data into a Neo4j database.

Note that this connector does not load source records into the Neo4j database. It loads the Senzing entity information, and each entity can be constructed from multiple source records. If the source record data, and how it relates to the Senzing entities, is desired, it must be loaded into the database before the Senzing entities are loaded. In that case the records need to contain DATA_SOURCE and RECORD_ID fields matching those used in the Senzing repository, so that the Senzing entities can be linked back to the source system records.

The messages read from the message queue are in JSON format. An example looks like this:

{"DATA_SOURCE":"TEST","RECORD_ID":"RECORD3","AFFECTED_ENTITIES":[{"ENTITY_ID":1,"LENS_CODE":"DEFAULT"}]}
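As a rough illustration of the message shape, the following Python sketch parses one queue message and pulls out the affected entity IDs. The function name `affected_entity_ids` is hypothetical and is not part of the connector (which is written in Java); only the field names come from the example above. Treating AFFECTED_ENTITIES as optional is an assumption.

```python
import json


def affected_entity_ids(message):
    """Return the ENTITY_ID values listed under AFFECTED_ENTITIES.

    Hypothetical helper for illustration only; not the connector's code.
    """
    payload = json.loads(message)
    # Assumption: a message without AFFECTED_ENTITIES affects no entities.
    return [entity["ENTITY_ID"] for entity in payload.get("AFFECTED_ENTITIES", [])]


message = (
    '{"DATA_SOURCE":"TEST","RECORD_ID":"RECORD3",'
    '"AFFECTED_ENTITIES":[{"ENTITY_ID":1,"LENS_CODE":"DEFAULT"}]}'
)
print(affected_entity_ids(message))
```

Each returned ID would then be the starting point for fetching entity data through the Senzing API.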

This project provides a framework for mapping Senzing data to a Neo4j database and can be modified to fit the user's specific needs.

Contents

  1. Demonstrate using Command Line
    1. Dependencies
    2. Building
    3. Preparation for running
    4. Running
  2. Demonstrate using Docker
    1. Expectations for docker
    2. Configuration
    3. Develop
    4. Run docker container

Legend

  1. 🤔 - A "thinker" icon means that a little extra thinking may be required. Perhaps you'll need to make some choices. Perhaps it's an optional step.
  2. ✏️ - A "pencil" icon means that the instructions may need modification before performing.
  3. ⚠️ - A "warning" icon means that something tricky is happening, so pay attention.

Demonstrate using Command Line

Dependencies

To build the Neo4j Connector you will need Apache Maven (version 3.6.1 or later recommended) as well as OpenJDK version 11.0.x (version 11.0.6+10 or later recommended).

This application interacts with the Senzing API, which must be installed beforehand. Installation instructions can be found here: Senzing API quick start

  1. Set up your environment. The Connector relies on native libraries, and the environment must be properly set up to find those libraries:

    1. Linux

      export SENZING_G2_DIR=/opt/senzing/g2
      
      export LD_LIBRARY_PATH=${SENZING_G2_DIR}/lib:${SENZING_G2_DIR}/lib/debian:$LD_LIBRARY_PATH
    2. Windows

      set SENZING_G2_DIR="C:\Program Files\Senzing\g2"
      
      set Path=%SENZING_G2_DIR%\lib;%Path%
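Before building or running, it can help to confirm that the library directory is actually on the search path. The sketch below is one possible check, not something shipped with the connector: the function name `missing_library_dirs` is hypothetical, and it only looks for the base `lib` directory (the `lib/debian` entry in the Linux example is platform-specific). On Windows, pass `path_var="Path"`.

```python
import os


def missing_library_dirs(env, path_var="LD_LIBRARY_PATH"):
    """Return the expected Senzing lib directories absent from path_var.

    Hypothetical helper; env is a mapping such as dict(os.environ).
    """
    g2_dir = env.get("SENZING_G2_DIR", "")
    # Only the base lib directory is checked in this sketch.
    expected = [os.path.join(g2_dir, "lib")]
    present = env.get(path_var, "").split(os.pathsep)
    return [directory for directory in expected if directory not in present]


# An empty list means the base lib directory was found on the path.
print(missing_library_dirs(dict(os.environ)))
```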

Building

To build connector-neo4j:

git clone git@github.com:Senzing/connector-neo4j.git
cd connector-neo4j
mvn install

The JAR file will be contained in the target directory under the name neo4j-connector-[version].jar.

Where [version] is the version number from the pom.xml file.
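To make the naming rule concrete, here is a small Python sketch that derives the JAR file name from a pom.xml. It assumes the pom declares `artifactId` and `version` directly under `project` (rather than inheriting the version from a parent) and that Maven's default `artifactId-version.jar` naming applies. The sample pom, its `groupId`, and the `0.0.0` version are placeholders, not values from this repository.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def jar_name(pom_xml):
    """Build '<artifactId>-<version>.jar' from the text of a pom.xml."""
    root = ET.fromstring(pom_xml)
    artifact = root.findtext("m:artifactId", namespaces=POM_NS)
    version = root.findtext("m:version", namespaces=POM_NS)
    return f"{artifact}-{version}.jar"


# Placeholder pom for illustration; the real file has more elements.
sample_pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>neo4j-connector</artifactId>
  <version>0.0.0</version>
</project>"""
print(jar_name(sample_pom))
```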

In addition, target/libs will contain all the dependency JAR files needed by the application, and target/conf/neo4jconnector.properties holds the application's configuration. The configuration file must be modified to match the installation of G2 and the other applications the Connector depends on; see the Edit configuration step under Preparation for running below.

Preparation for running

The Connector requires installations of the Senzing API (see above), RabbitMQ, and Neo4j for its operation.

Note: if docker containers are used, it is best to use a docker network to facilitate communication between the containers. An example of setting up a network:

sudo docker network create -d bridge ncn

This network "ncn" will be used when dealing with containers in this write-up.

  1. Installing G2

    If not done already, install G2. See Dependencies above.

  2. Install Neo4j

    An easy way to install and run Neo4j is to run it as a docker container:

        sudo docker run --detach \
            --publish=7474:7474 \
            --publish=7687:7687 \
            --volume=$HOME/neo4j/data:/data \
            --volume=$HOME/neo4j/logs:/logs \
            --network ncn \
            neo4j:latest

    Other ways to install and run Neo4j can be found here: Neo4j Installation.

    Once the installation is done, go to http://<server name>:7474 using a browser. If the installation is local, that would be http://localhost:7474. Log in using the default user name and password, which are neo4j/neo4j. You will be asked to change your password. Do so and remember the new password, since you will need it in the Edit configuration step below.

  3. Install RabbitMQ

    Again, running it as a docker container is a simple option:

       sudo docker run -it --rm --name rabbitmq \
           --publish 5672:5672 \
           --publish 15672:15672 \
           --network ncn \
           rabbitmq:3-management

    If using an installer is preferred, please see Downloading and Installing RabbitMQ.

  4. 🤔 Optional: Create a queue in RabbitMQ

    The Connector will create the queue specified in configuration if it doesn't exist already. If having a queue created beforehand is desired, here are the steps:

    1. Open a browser and enter http://<host name>:15672 into the address bar. If RabbitMQ is installed locally, this will be http://localhost:15672.
    2. Log in. Default is guest/guest on a fresh install.
    3. Select the Queues tab at the top.
    4. Click Add a new queue below the grid.
    5. Enter senzing in the Name box.
    6. For the Durability option, click the pull-down and select Transient.
    7. Click the Add Queue button at the bottom.
  5. ✏️ Edit configuration

    There are two ways to pass configuration to the connector: through a configuration file and with command line parameters.

    Let's first look at the configuration file, which is found at target/conf/neo4jconnector.properties. The steps to set it up follow.

    1. Locate the G2 ini file. It can generally be found in the project path as /home/<user>/senzing/etc/G2Module.ini where user is the user account. See the Quick Start Guide for further information.
    2. Open target/conf/neo4jconnector.properties in an editor.
    3. Change the value of neo4jconnector.g2.inifile to the path found in step 1 above.
    4. In the value of neo4jconnector.neo4j.uri, replace neo4jPassword with the password you created in the Install Neo4j step above.
    5. Make any other changes needed. For example, if RabbitMQ was set up with user security, then the user name and password need to be set in the file.

    The command line takes the following options:

    -iniFile
        path to the G2 ini file
    -neo4jConnection
        connection string for neo4j, the format is `bolt://<user>:<password>@<hostname>:<port>`
    -mqHost
        host name or ip address for RabbitMQ server
    -mqUser
        RabbitMQ user name
    -mqPassword
        Password for RabbitMQ
    -mqQueue
        The name of the RabbitMQ queue used for receiving messages

If both the configuration file and command line options are used, the command line options take precedence.
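The precedence rule can be pictured with a short Python sketch: load the properties file first, then let any command line option overwrite the matching property. This is an illustration only, not the connector's Java implementation. The pairing of `-iniFile` with `neo4jconnector.g2.inifile` and of `-neo4jConnection` with `neo4jconnector.neo4j.uri` is an assumption based on their descriptions; the property names for the RabbitMQ options are not shown in this README and are left out.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Assumed pairing of command line options to property keys.
OPTION_TO_PROPERTY = {
    "-iniFile": "neo4jconnector.g2.inifile",
    "-neo4jConnection": "neo4jconnector.neo4j.uri",
}


def effective_config(file_text, cli_options):
    """Merge file settings with command line options; the command line wins."""
    config = parse_properties(file_text)
    for option, value in cli_options.items():
        key = OPTION_TO_PROPERTY.get(option)
        if key is not None:
            config[key] = value
    return config


file_text = """
# sample values for illustration
neo4jconnector.g2.inifile=/etc/opt/senzing/G2Module.ini
neo4jconnector.neo4j.uri=bolt://neo4j:neo4jPassword@localhost:7687
"""
cli_options = {"-iniFile": "/home/user/senzing/etc/G2Module.ini"}
print(effective_config(file_text, cli_options))
```

In this example the ini file path comes from the command line, while the Neo4j URI, which was not overridden, still comes from the file.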

Running

To run the connector, use java -jar. It is assumed that your environment is properly configured as described in the "Dependencies" and "Preparation for running" sections above.

Type

java -jar neo4j-connector-[version].jar

Where [version] is the version number from the pom.xml file.

If command line options are used, it could look like this:

java -jar neo4j-connector-[version].jar \
    -iniFile /home/user/senzing/etc/G2Module.ini \
    -neo4jConnection bolt://neo4j:neo4jPassword@localhost:7687 \
    -mqHost localhost \
    -mqQueue senzing
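The value passed to -neo4jConnection packs several settings into one string. As a quick sanity check before launching, a Python sketch like the one below can split it into its parts using the standard library. The helper name `describe_connection` is hypothetical, and whether the connector accepts percent-encoded special characters in the password is not stated in this README.

```python
from urllib.parse import urlsplit


def describe_connection(uri):
    """Split a bolt://<user>:<password>@<hostname>:<port> string into parts."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


print(describe_connection("bolt://neo4j:neo4jPassword@localhost:7687"))
```

A missing port or user shows up as `None` in the result, which makes a malformed connection string easy to spot.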

Demonstrate using Docker

Expectations for docker

Space for docker

This repository and demonstration require 6 GB free disk space.

Time for docker

Budget 40 minutes to get the demonstration up-and-running, depending on CPU and network speeds.

Background knowledge for docker

This repository assumes a working knowledge of:

  1. Docker

Configuration

Configuration values are specified by environment variable or command line parameter.

Develop

Prerequisite software

The following software programs need to be installed:

  1. git
  2. make
  3. jq
  4. docker

Clone repository

For more information on environment variables, see Environment Variables.

  1. Set these environment variable values:

    export GIT_ACCOUNT=senzing
    export GIT_REPOSITORY=connector-neo4j
    export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
    export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"
  2. Follow steps in clone-repository to install the Git repository.

Build docker image for development

  1. Build docker image.

    cd ${GIT_REPOSITORY_DIR}
    
    sudo make docker-build

    Note: sudo make docker-build-development-cache can be used to create cached docker layers.

Run docker container

  1. Prepare for running.

    Ensure the steps in Preparation for running have been executed before running the docker container.

  2. Run docker container.

    1. ⚠️ macOS - File sharing must be enabled for the volumes.
    2. ⚠️ Windows - File sharing must be enabled for the volumes.

    When running the docker container, the command line options need to be used.

    Example:

    sudo docker run \
      --network ncn \
      senzing/connector-neo4j \
          -iniFile /home/user/senzing/etc/G2Module.ini \
          -neo4jConnection bolt://neo4j:neo4jPassword@localhost:7687 \
          -mqHost localhost \
          -mqQueue senzing