Logo

dev-resources.site

for different kinds of informations.

GraphDB for CMDB

Published at
1/13/2025
Categories
graphdb
postgressql
python
cmdb
Author
sone
Categories
4 categories in total
graphdb
open
postgressql
open
python
open
cmdb
open
Author
4 person written this
sone
open
GraphDB for CMDB

1. Purpose

Comparing search speed between GraphDB and RDB in some scenarios.

2. Overview

I compared search speed of GraphDB and RDB for data which expresses the spine-leaf architecture.
I found that dataset which has a lot of nodes and depth above a certain level is suitable for GraphDB.

3. Introduction

Recently I became interest in GraphDB.
In paticular, I'm interested in applying GraphDB to managing configuration management database(CMDB) of IT systems.
In this post, I confirmed the gap between GraphDB and RDB from the perspective of performance.

4. Assumption

Environment

I selected Neo4j as GraphDB, and PostgreSQL as RDB.
I constructed the test environment with docker containers.
The definition of containers is as follows.

version: '3'
services:
  postgres:
    image: postgres:15
    ports:
      - 5433:5432
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
  neo4j:
    image: neo4j:5.26.0
    ports:
      - 7474:7474
      - 7687:7687
  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
Enter fullscreen mode Exit fullscreen mode

Target Architecture

I prepared three scenarios for comparison.
These scenarios are based on the architecture which consists of spine-leaf network architecture and virtualization environments.
Diagrams of the architectures are as follows.

Scenario1

Scenario1 is so simple.
The total number of nodes is 19, and the depth is 4.

Image description

Scenario2

Scenario2 is more complex than scenario1.
The number of servers is six times as much as scenario1.
And also, connections between leaf switches and servers are full mesh.
Each server has two virtual machines, and each service is running on three virtual machines which belong to each different leaf switches.
The total number of nodes is 273, and the depth is 4.

Image description

Scenario3

Scenario3 is deeper than scenario2.
Each virtual machine has a pod, and each service is running on three pods which belong to each different leaf pairs.
The total number of nodes is 417, and the depth is 5.

Image description

Data Schema Definition

Neo4j(GraphDB)

I defined each node which has just its node name.
Relations between nodes are defined as edges, and I defined two types of edge.
One is has_parent which is the edge from the child node to its parent node, another is has_child which is the edge from the parent node to its child node.
In the case of scenario1, I defined data about spine and leaf swiches as follows.

CREATE (ssw1: SpineSwitch {name: "ssw1"})
CREATE (ssw2: SpineSwitch {name: "ssw2"})
CREATE (ssw3: SpineSwitch {name: "ssw3"})
CREATE (lsw1: LeafSwitch {name: "lsw1"})
CREATE (lsw2: LeafSwitch {name: "lsw2"})
CREATE (lsw3: LeafSwitch {name: "lsw3"})
CREATE (lsw4: LeafSwitch {name: "lsw4"})
CREATE (ssw1)-[:has_child]->(lsw1)
CREATE (ssw1)-[:has_child]->(lsw2)
CREATE (ssw1)-[:has_child]->(lsw3)
CREATE (ssw1)-[:has_child]->(lsw4)
CREATE (ssw2)-[:has_child]->(lsw1)
CREATE (ssw2)-[:has_child]->(lsw2)
CREATE (ssw2)-[:has_child]->(lsw3)
CREATE (ssw2)-[:has_child]->(lsw4)
CREATE (ssw3)-[:has_child]->(lsw1)
CREATE (ssw3)-[:has_child]->(lsw2)
CREATE (ssw3)-[:has_child]->(lsw3)
CREATE (ssw3)-[:has_child]->(lsw4)
CREATE (lsw1)-[:has_child]->(sv1)
CREATE (lsw2)-[:has_child]->(sv2)
CREATE (lsw3)-[:has_child]->(sv3)
CREATE (lsw4)-[:has_child]->(sv4);
Enter fullscreen mode Exit fullscreen mode

PostgreSQL(RDB)

I defined two tables.
One is nodes table which store each node data, another is relationships table which store relation data between nodes.

CREATE TABLE nodes (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    type VARCHAR(50) NOT NULL
);

CREATE TABLE relationships (
    id SERIAL PRIMARY KEY,
    parent_id INT NOT NULL,
    child_id INT NOT NULL,
    relationship_type VARCHAR(50) NOT NULL,
    FOREIGN KEY (parent_id) REFERENCES nodes (id),
    FOREIGN KEY (child_id) REFERENCES nodes (id)
);
Enter fullscreen mode Exit fullscreen mode

Search Query

I used the query for searching paths from service1 to spine switches both of which exist in the end of architecture in order to compare search speed.
I measured each of search speed by using following scripts.

Script for Neo4j(GraphDB)

from neo4j import GraphDatabase
import time

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))

def query_execution_time():
    with driver.session() as session:
        start_time = time.time()

        result = session.run("""
            MATCH path = (srv:Service {name: "srv1"})-[:has_parent*]->(ssw:SpineSwitch)
            RETURN path, nodes(path) AS allNodes;
        """)

        end_time = time.time()

        # display results
        for ind, record in enumerate(result):
            nodes = record["allNodes"]
            print(f"Path nodes: {ind+1}")
            for node in nodes:
                print(f"Node: {node['name']}, Type: {list(node.labels)}")
    # Search speed
    print(f"Execution time: {end_time - start_time:.10f} seconds")

query_execution_time()
driver.close()
Enter fullscreen mode Exit fullscreen mode

Script for PostgreSQL(RDB)

import psycopg2
import time

# Connect to PostgreSQL
connection = psycopg2.connect(
    host="localhost",
    port=5433,
    dbname="postgres",
    user="postgres",
    password="postgres"
)

# search for path from srv1 -> spine(Scenario1)
def get_path_from_srv_to_spine_scenario1(srv):
    paths = []
    start_time = time.time()

    with connection.cursor() as cursor:
        # search for vm
        cursor.execute("""
            SELECT p.name
            FROM nodes p
            JOIN relationships r ON p.id = r.parent_id
            JOIN nodes c ON r.child_id = c.id
            WHERE c.name = %s AND p.type = 'VirtualMachine';
        """, (srv,))
        vms = cursor.fetchall()

        print("Virtual Machines connected to srv1:")
        for vm in vms:
            print(f"  {vm[0]}")

        # search for server
        for vm in vms:
            cursor.execute("""
                SELECT p.name
                FROM nodes p
                JOIN relationships r ON p.id = r.parent_id
                JOIN nodes c ON r.child_id = c.id
                WHERE c.name = %s AND p.type = 'Server';
            """, (vm[0],))
            servers = cursor.fetchall()

            print(f"\nServers connected to {vm[0]}:")
            for server in servers:
                print(f"  {server[0]}")

            # search for leaf
            for server in servers:
                cursor.execute("""
                    SELECT p.name
                    FROM nodes p
                    JOIN relationships r ON p.id = r.parent_id
                    JOIN nodes c ON r.child_id = c.id
                    WHERE c.name = %s AND p.type = 'LeafSwitch';
                """, (server[0],))
                leaf_switches = cursor.fetchall()

                print(f"\nLeafSwitches connected to {server[0]}:")
                for lsw in leaf_switches:
                    print(f"  {lsw[0]}")

                # search for spine
                for lsw in leaf_switches:
                    cursor.execute("""
                        SELECT p.name
                        FROM nodes p
                        JOIN relationships r ON p.id = r.parent_id
                        JOIN nodes c ON r.child_id = c.id
                        WHERE c.name = %s AND p.type = 'SpineSwitch';
                    """, (lsw[0],))
                    spine_switches = cursor.fetchall()
                    print(f"\nSpineSwitches connected to {lsw[0]}:")
                    for ssw in spine_switches:
                        print(f"  {ssw[0]}")
                        paths.append(f"{srv}->{vm[0]}->{server[0]}->{lsw[0]}->{ssw[0]}")
    end_time = time.time()
    print(paths)
    print(f"Execution time: {end_time - start_time:.10f} seconds")

# search for path from srv1 -> spine(Scenario2)
def get_path_from_srv_to_spine_scenario2(srv):
    paths = []
    start_time = time.time()

    with connection.cursor() as cursor:
        # search for vm
        cursor.execute("""
            SELECT p.name
            FROM nodes p
            JOIN relationships r ON p.id = r.parent_id
            JOIN nodes c ON r.child_id = c.id
            WHERE c.name = %s AND p.type = 'VirtualMachine';
        """, (srv,))
        vms = cursor.fetchall()

        print("Virtual Machines connected to srv1:")
        for vm in vms:
            print(f"  {vm[0]}")

        # search for server
        for vm in vms:
            cursor.execute("""
                SELECT p.name
                FROM nodes p
                JOIN relationships r ON p.id = r.parent_id
                JOIN nodes c ON r.child_id = c.id
                WHERE c.name = %s AND p.type = 'Server';
            """, (vm[0],))
            servers = cursor.fetchall()

            print(f"\nServers connected to {vm[0]}:")
            for server in servers:
                print(f"  {server[0]}")

            # search for leaf
            for server in servers:
                cursor.execute("""
                    SELECT p.name
                    FROM nodes p
                    JOIN relationships r ON p.id = r.parent_id
                    JOIN nodes c ON r.child_id = c.id
                    WHERE c.name = %s AND p.type = 'LeafSwitch';
                """, (server[0],))
                leaf_switches = cursor.fetchall()

                print(f"\nLeafSwitches connected to {server[0]}:")
                for lsw in leaf_switches:
                    print(f"  {lsw[0]}")

                # search for spine
                for lsw in leaf_switches:
                    cursor.execute("""
                        SELECT p.name
                        FROM nodes p
                        JOIN relationships r ON p.id = r.parent_id
                        JOIN nodes c ON r.child_id = c.id
                        WHERE c.name = %s AND p.type = 'SpineSwitch';
                    """, (lsw[0],))
                    spine_switches = cursor.fetchall()
                    print(f"\nSpineSwitches connected to {lsw[0]}:")
                    for ssw in spine_switches:
                        print(f"  {ssw[0]}")
                        paths.append(f"{srv}->{vm[0]}->{server[0]}->{lsw[0]}->{ssw[0]}")
    end_time = time.time()
    print(paths)
    print(f"Execution time: {end_time - start_time:.10f} seconds")

# search for path from srv1 -> spine(Scenario3)
def get_path_from_srv_to_spine_scenario3(srv):
    paths = []
    start_time = time.time()

    with connection.cursor() as cursor:
        # search for pod
        cursor.execute("""
            SELECT p.name
            FROM nodes p
            JOIN relationships r ON p.id = r.parent_id
            JOIN nodes c ON r.child_id = c.id
            WHERE c.name = %s AND p.type = 'Pod';
        """, (srv,))
        pods = cursor.fetchall()

        print("Pods connected to srv1:")
        for pod in pods:
            print(f"  {pod[0]}")
            # search for vm
            cursor.execute("""
                SELECT p.name
                FROM nodes p
                JOIN relationships r ON p.id = r.parent_id
                JOIN nodes c ON r.child_id = c.id
                WHERE c.name = %s AND p.type = 'VirtualMachine';
            """, (pod[0],))
            vms = cursor.fetchall()

            print("Virtual Machines connected to srv1:")
            for vm in vms:
                print(f"  {vm[0]}")

            # search for server
            for vm in vms:
                cursor.execute("""
                    SELECT p.name
                    FROM nodes p
                    JOIN relationships r ON p.id = r.parent_id
                    JOIN nodes c ON r.child_id = c.id
                    WHERE c.name = %s AND p.type = 'Server';
                """, (vm[0],))
                servers = cursor.fetchall()

                print(f"\nServers connected to {vm[0]}:")
                for server in servers:
                    print(f"  {server[0]}")

                # search for leaf
                for server in servers:
                    cursor.execute("""
                        SELECT p.name
                        FROM nodes p
                        JOIN relationships r ON p.id = r.parent_id
                        JOIN nodes c ON r.child_id = c.id
                        WHERE c.name = %s AND p.type = 'LeafSwitch';
                    """, (server[0],))
                    leaf_switches = cursor.fetchall()

                    print(f"\nLeafSwitches connected to {server[0]}:")
                    for lsw in leaf_switches:
                        print(f"  {lsw[0]}")

                    # search for spine
                    for lsw in leaf_switches:
                        cursor.execute("""
                            SELECT p.name
                            FROM nodes p
                            JOIN relationships r ON p.id = r.parent_id
                            JOIN nodes c ON r.child_id = c.id
                            WHERE c.name = %s AND p.type = 'SpineSwitch';
                        """, (lsw[0],))
                        spine_switches = cursor.fetchall()
                        print(f"\nSpineSwitches connected to {lsw[0]}:")
                        for ssw in spine_switches:
                            print(f"  {ssw[0]}")
                            paths.append(f"{srv}->{pod[0]}->{vm[0]}->{server[0]}->{lsw[0]}->{ssw[0]}")
    end_time = time.time()
    print(paths)
    print(f"Execution time: {end_time - start_time:.10f} seconds")

get_path_from_srv_to_spine_scenario1("srv1")
# get_path_from_srv_to_spine_scenario2("srv1")
# get_path_from_srv_to_spine_scenario3("srv1")

connection.close()
Enter fullscreen mode Exit fullscreen mode

5. Result

Search Speed Comparison in each scenario

Image description

6. Consideration

I found GraphDB suitable for expressing entity which has dozens of nodes and depth above a certain level after receiving the result.
On the contrary, the gap between GraphDB and RDB is insignificant in the case of the small dataset even if its concept maches Graph expression well.

And also, I found the gap of query plainness too big to ignore.
The query of Cypher which is query used in Neo4j is very simple, but SQL is so complex.
Therefore, apart from the gap of performance, this may become a big reason we select GraphDB in order to express information which maches the format of Graph well.

postgressql Article's
30 articles in total
Favicon
GraphDB for CMDB
Favicon
Not able to connect to PostgreSQL server on Fedora
Favicon
Master Test Data Generation With dbForge Studio for PostgreSQL
Favicon
Exploring the Power of Full-Stack Development with Next.js and Prisma
Favicon
Bringing PostgreSQL Query Issues to Light with Insightful Visuals
Favicon
POSTGRESQL - ÍNDICE GIN NA PRÁTICA
Favicon
Reading PostgreSQL Query Plans Brought to a New Level
Favicon
Understanding PostgreSQL Isolation Levels
Favicon
How to Activate and Enable the PostgreSQL Service on Your Kali Linux System
Favicon
Mastering charts and database visualization with ChartDB
Favicon
The Best Ways to Connect to a PostgreSQL Database
Favicon
Hey, welcome to my blog
Favicon
How to Create a Database and Always Connect to It in PostgreSQL Without Needing Superuser Access
Favicon
03. ரிலேஷனல் டேட்டாபேஸ் மாடல் என்றால் என்ன? What is Relational Database Model ? (RDBMS)
Favicon
04. தரவு ஒருங்கிணைவு (Data Integrity)
Favicon
02. DBMS என்றால் என்ன? What is a DBMS?
Favicon
How To Use Materialized Views
Favicon
PostgreSQL Secrets You Wish You Knew Earlier
Favicon
Reading Parallel Plans Correctly
Favicon
New PostgreSQL ORM for Golang: Enterprise
Favicon
Migrate 🪳Coackroach DB into Postgres🐘
Favicon
💡 Database Development: It’s Not Just About Querying!
Favicon
Building Real-Time Data Pipelines with Debezium and Kafka: A Practical Guide
Favicon
01. தரவுத்தளம் எவ்வாறு உருவானது, அதன் தேவை என்ன? How did the database come about, What is its need?
Favicon
Data inconsistency in AWS Amazon Aurora Postgres solved with Local Write Forwarding?
Favicon
PostgreSQL vs. MySQL
Favicon
How To Handle Custom S/DQL Queries On Different Database Engine with DoctrineExpression
Favicon
Deploying PostgreSQL on Kubernetes: 2024 Guide
Favicon
Step-by-Step Guide to Installing PostgreSQL on Arch Linux
Favicon
Remedy for Poor-Performing SQL Queries

Featured ones: