Logo

dev-resources.site

for different kinds of informations.

YugabyteDB as a Graph database with PuppyGraph

Published at
10/7/2024
Categories
yugabytedb
graph
postgres
database
Author
franckpachot
Author
12 person written this
franckpachot
open
YugabyteDB as a Graph database with PuppyGraph

PostgreSQL can handle any workload without streaming data to a purpose-built database. This includes graph databases using PuppyGraph, which maps relational tables to a graph model.
When resilience, scalability, and elasticity are necessary, YugabyteDB provides a distributed alternative to PostgreSQL.

This article details the step-by-step tutorial to using PuppyGraph to query data in PostgreSQL, but using YugabyteDB instead of PostgreSQL.


Deployment

I Create a file docker-compose.yaml with the following content:

services:
  puppygraph:
    image: puppygraph/puppygraph:stable
    pull_policy: always
    container_name: puppygraph
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    networks:
      yugabytedb_net:
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
  yugabytedb:
    image: yugabytedb/yugabyte:2024.1.2.0-b77
    command: yugabyted start --enable_pg_parity_tech_preview --background=false
    environment:
      - YSQL_USER=puppygraph
      - YSQL_PASSWORD=puppygraph123
    networks:
      yugabytedb_net:
    ports:
      - "5433:5433"
    volumes:
      - ./yugabyte-data:/root/var/data
networks:
  yugabytedb_net:
    name: puppy-yugabytedb
Enter fullscreen mode Exit fullscreen mode

I run the following command to start YugabyteDB and PuppyGraph:

docker compose up -d
Enter fullscreen mode Exit fullscreen mode

I can connect on http://localhost:8080/

Image description

Image description


Data Preparation

I run the following command to start ysqlsh, the PostgreSQL shell, to access the database:


docker compose run -it -e PGPASSWORD=puppygraph123 -e PGUSER=puppygraph -e PGHOST=yugabytedb -e PGPORT=5433 -e PGDATABASE=yugabyte yugabytedb ysqlsh 

Enter fullscreen mode Exit fullscreen mode

I execute the following SQL statements in the shell to create tables and insert data:

create schema modern;
create table modern.person (id text, name text, age integer);
insert into modern.person values
                              ('v1', 'marko', 29),
                              ('v2', 'vadas', 27),
                              ('v4', 'josh', 32),
                              ('v6', 'peter', 35);

create table modern.software (id text, name text, lang text);
insert into modern.software values
                                ('v3', 'lop', 'java'),
                                ('v5', 'ripple', 'java');

create table modern.created (id text, from_id text, to_id text, weight double precision);
insert into modern.created values
                               ('e9', 'v1', 'v3', 0.4),
                               ('e10', 'v4', 'v5', 1.0),
                               ('e11', 'v4', 'v3', 0.4),
                               ('e12', 'v6', 'v3', 0.2);

create table modern.knows (id text, from_id text, to_id text, weight double precision);
insert into modern.knows values
                             ('e7', 'v1', 'v2', 0.5),
                             ('e8', 'v1', 'v4', 1.0);

analyze;

Enter fullscreen mode Exit fullscreen mode

Modeling a Graph

I use the PuppyGraph schema file schema.json with the following content:

{
  "catalogs": [
    {
      "name": "postgres_data",
      "type": "postgresql",
      "jdbc": {
        "username": "puppygraph",
        "password": "puppygraph123",
        "jdbcUri": "jdbc:postgresql://yugabytedb:5433/yugabyte",
        "driverClass": "org.postgresql.Driver"
      }
    }
  ],
  "vertices": [
    {
      "label": "person",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "person",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "type": "Int",
          "name": "age"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    },
    {
      "label": "software",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "software",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "type": "String",
          "name": "lang"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    }
  ],
  "edges": [
    {
      "label": "knows",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "knows",
        "metaFields": {
          "id": "id",
          "from": "from_id",
          "to": "to_id"
        }
      },
      "from": "person",
      "to": "person",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    },
    {
      "label": "created",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "modern",
        "table": "created",
        "metaFields": {
          "id": "id",
          "from": "from_id",
          "to": "to_id"
        }
      },
      "from": "person",
      "to": "software",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

I upload the schema by selecting the file schema.json in the

Image description

Image description

Querying the graph via CLI

I execute the following command to access the PuppyGraph Gremlin Console

docker compose exec -it puppygraph ./bin/console
Enter fullscreen mode Exit fullscreen mode

I run the following queries in the console to query the Graph:

puppy-gremlin> g.V().has("name", "marko").valueMap()
Done! Elapsed time: 0.359s, rows: 1
==>map[age:29 name:marko]
puppy-gremlin> g.V().has("name", "marko").out("knows").out("created").valueMap()
Done! Elapsed time: 0.108s, rows: 2
==>map[lang:java name:lop]
==>map[lang:java name:ripple]
puppy-gremlin>
Enter fullscreen mode Exit fullscreen mode

SQL Queries

I refresh the local cache to check from pg_stat_statements that the queries are fast:

Image description

yugabyte=# select total_time/calls as time, calls, substr(query,1,80)
 from pg_stat_statements order by 1
;
Enter fullscreen mode Exit fullscreen mode

Image description

Scale-Out YugabyteDB

To add more nodes to the YugabyteDB cluster, you can add a service that starts YugabyteDB and join the first node. Add the following to docker-compose.yaml:

  more-nodes:
    image: yugabytedb/yugabyte:2024.1.2.0-b77
    command: yugabyted start --join yugabytedb --enable_pg_parity_tech_preview --background=false
    networks:
      yugabytedb_net:
    deploy:
     replicas: 0
Enter fullscreen mode Exit fullscreen mode

Start a second node with:

docker compose up -d --scale more-nodes=1
Enter fullscreen mode Exit fullscreen mode

Start a third node with:

docker compose up -d --scale more-nodes=2
Enter fullscreen mode Exit fullscreen mode

With three nodes, the cluster is automatically set to Replication Factor 3 and becomes resilient to one node failure

Conclusion

This tutorial demonstrates using PuppyGraph to query YugabyteDB as a graph database, leveraging the PostgreSQL compatibility of YugabyteDB's distributed architecture. It follows the simple deployment steps from PuppyGraph documentation.

With PuppyGraph, you can directly query your existing data stores without needing ETL pipelines. PuppyGraph and YugabyteDB together provide a powerful solution for modern graph-based analytics.


YugabyteDB. The Distributed SQL Database for Mission-Critical Apps

YugabyteDB is the 100% open source cloud native database for mission critical applications. YugabyteDB runs in any public or hybrid cloud.

favicon yugabyte.com
yugabytedb Article's
30 articles in total
Favicon
PostgreSQL plan_cache_mode
Favicon
Index Filtering in PostgreSQL and YugabyteDB (Index Scan instead of Index Only Scan)
Favicon
Unique Index on NULL Values in SQL & NoSQL
Favicon
More details in pg_locks for YugabyteDB
Favicon
Large IntentsDB MemTable with Many Small SST Files
Favicon
Aurora DSQL: How it Compares to YugabyteDB
Favicon
Document data modeling to avoid write skew anomalies
Favicon
When to replace IN() with EXISTS() - correlated and uncorrelated subqueries
Favicon
2024.2: Faster with Shared Memory Between PostgreSQL and TServer Layers
Favicon
Aurora DSQL - Simple Inserts Workload from an AWS CloudShell
Favicon
Amazon Aurora DSQL: Which PostgreSQL Service Should I Use on AWS ?
Favicon
YugabyteDB MVCC and Updates: columns vs. JSON
Favicon
No Gap Ordered Numbering in SQL: A Unique Index to Serialize In Read Committed
Favicon
Starting a YugabyteDB lab cluster with AWS CLI
Favicon
Speeding Up Foreign Key Constraints During Migrations
Favicon
Indexing for a Scalable Serialization Isolation Level
Favicon
The Doctor's On-Call Shift example and a Normalized Relational Schema to Avoid Write Skew
Favicon
You Probably Don't Need Serializable Isolation
Favicon
A brief example of an SQL serializable transaction
Favicon
YugabyteDB as a Graph database with PuppyGraph
Favicon
Native GLIBC instead of Linuxbrew since 2.21
Favicon
pgSphere and Q3C on Distributed SQL
Favicon
IN() Index Scan in PostgreSQL 17 and YugabyteDB LSM Tree
Favicon
Frequent Re-Connections improved by Connection Manager
Favicon
Maintaining Throughput With Less Physical Connections
Favicon
YugabyteDB Connection Manager: a Database Resident Connection Pool with Shared Processes
Favicon
ERROR: index row size 3056 exceeds btree version 4 maximum 2704 for index
Favicon
Write Buffering to Reduce Raft Consensus Latency in YugabyteDB
Favicon
Asynch replication for Disaster Recovery, Read Replicas, and Change Data Capture
Favicon
Fast PITR and MVCC reads with Key-Value LSM Tree

Featured ones: