Wednesday, April 10, 2024

PGQL Property Graphs and Virtual Private Database (VPD)

Securing your data and controlling its access is and should always be a significant concern. The Oracle Database has plenty of built-in security features that help reduce the risk of data breaches or provide granular access control. Virtual Private Database (VPD) is one such feature for the latter.

Oracle Graph enables you to focus on exploring and analysing connections in your data using a Property Graph Query Language such as PGQL, graph algorithms, or graph machine learning.

The question is how you can combine these two things: Using your Oracle Database as a Graph Database and still securing access to your data used in graphs.

The example I will discuss in my post uses an Oracle Autonomous Database Serverless 19c. If you don´t have access to the Oracle Cloud Infrastructure (OCI) and Autonomous Database yet, don´t mind. Everything I describe works nicely on the Oracle Database wherever you have it installed and running.

The data


Let us start with providing data, we want to control access to and which we also explore later using Oracle Graph. One representing a typical financial services use case, where we have bank accounts money being transferred between the accounts, is provided if you run Lab 1 of the LiveLabs tutorial "Find Circular Payment Chains with Graph Queries in Autonomous Database". The tutorial uses resources on OCI.

I use the LiveLabs tutorial, since it sets up everything I need by applying a Terraform stack. You will find the following resources ready-to-use:

  • An Autonomous Database (ADW or ATP) with a randomized password for the default database user ADMIN (check the end of the "Apply" log displaying the output).
  • A database user named GRAPHUSER with a randomized password (check the end of the "Apply" log displaying the output).
  • Two database tables, BANK_ACCOUNTS and BANK_TXNS with data imported, primary and foreign key constraints properly in place.

Make yourself familiar with the data set. It is stripped to basic information only to demonstrate the use case. See that table BANK_TXNS has a column AMOUNT.

select amount, count(*)
from bank_txns
group by amount
order by 1 desc;

Secure access to your data


There is a regulatory requirement that amounts higher than 5000 are only revealed to a specific user. We will use VPD to ensure that the requirement is met.

VPD is based on two things, a policy function and a policy. We define both as follows using the ADMIN user:

-- Policy function
CREATE OR REPLACE FUNCTION hide_big_txns (
  p_schema IN VARCHAR2,
  p_object IN VARCHAR2
)
RETURN VARCHAR2 AS
  l_predicate VARCHAR2 (200);
  l_user VARCHAR2(100);
BEGIN
  select user into l_user from dual;
  if l_user != 'GRAPHUSER' then
    l_predicate:='amount <= 5000';
  end if;
  RETURN (l_predicate);
END hide_big_txns;
/

-- Policy
BEGIN
  DBMS_RLS.ADD_POLICY (
    object_schema            => 'graphuser',
    object_name              => 'bank_txns',
    policy_name              => 'hide_big_txns_policy',
    function_schema          => 'admin',
    policy_function          => 'hide_big_txns',
    statement_types          => 'select',
    sec_relevant_cols        => 'amount',
    sec_relevant_cols_opt    => DBMS_RLS.ALL_ROWS
  );
END;
/

You can test, if your security requirement defined by VPD is met. Check it first for user GRAPHUSER, who should still be able to see transactions with amounts larger than 5000.

As GRAPHUSER run:

SELECT *
FROM bank_txns
WHERE amount > 5000
ORDER BY amount DESC
FETCH FIRST 5 ROWS ONLY;

Then as ADMIN user we set up a new user granting SELECT privileges to the tables owned by GRAPHUSER.

CREATE USER testuser IDENTIFIED BY <PWD>;
GRANT RESOURCE, CONNECT, CREATE SESSION, CREATE TABLE TO testuser;
ALTER USER testuser QUOTA UNLIMITED ON data;
GRANT SELECT ON graphuser.bank_txns TO testuser;
GRANT SELECT ON graphuser.bank_accounts TO testuser;

Now log in using TESTUSER and run the query again. No transactions should come back as result.

VPD and Graph combined


The next step is to verify that access to the data is also restricted if we use transaction information in a graph and query that graph using PGQL.

For the next steps we use Graph Studio, one of the tools available for Autonomous Databases. Log in to Graph Studio with the user GRAPHUSER.

Define your graph

Log into Graph Studio using GRAPHUSER. Create the following paragraphs and run them once the environment is attached to the Graph Studio session.

%pgql-rdbms
/* Create the graph. Rows in table BANK_ACCOUNTS become vertices, rows in table BANK_TXNS become edges of a graph named BANK_GRAPH_VPD. */
CREATE PROPERTY GRAPH bank_graph_vpd
    VERTEX TABLES (
        graphuser.bank_accounts
        KEY (acct_id)
        LABEL account
        PROPERTIES ( acct_id, name )
    )
    EDGE TABLES (
        graphuser.bank_txns
        KEY (txn_id)
        SOURCE KEY ( src_acct_id ) REFERENCES bank_accounts
        DESTINATION KEY ( dst_acct_id ) REFERENCES bank_accounts
        LABEL transfers
        PROPERTIES ( txn_id, amount, src_acct_id, dst_acct_id, description )
  ) OPTIONS (PG_PGQL)

PGQL Property Graphs and Virtual Private Database (VPD)
Create a graph using PGQL

Load the graph into memory

%python-pgx
# Load the graph into memory
GRAPH_NAME="BANK_GRAPH_VPD"
# try getting the graph from the in-memory graph server
graph = session.get_graph(GRAPH_NAME);
# if it does not exist read it into memory
if (graph == None) :
    session.read_graph_by_name(GRAPH_NAME, "pg_view")
    print("Graph "+ GRAPH_NAME + " successfully loaded")
    graph = session.get_graph(GRAPH_NAME)
else :
    print("Graph '"+ GRAPH_NAME + "' already loaded")

PGQL Property Graphs and Virtual Private Database (VPD)
Load the graph in memory using the Python API

Query the graph

%pgql-pgx
/* Show transactions querying the in-memory graph using PGQL */
SELECT *
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd

PGQL Property Graphs and Virtual Private Database (VPD)
Query the in-memory representation of the graph using PGQL

%pgql-pgx
/* Show transactions with amount >= 5000 querying the in-memory graph */
SELECT *
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd
WHERE t.amount >= 5000

PGQL Property Graphs and Virtual Private Database (VPD)
Query the graph using PGQL: Find transactions with amount >= 5000

You can see that as GRAPHUSER you are allowed to see all transaction amounts including the larger one.

How does it look like running the same PGQL queries as TESTUSER? According the the VPD policy, the amount > 5000 should be returned as NULL. Let us verify it.

Log out from Graph Studio and log in again, this time as TESTUSER. Create a notebook using with the first two paragraphs that GRAPHUSER used to create and load the graph.

%pgql-pgx
/*
 * Show transactions querying the in-memory graph using PGQL.
 * Order the result by the transaction amount, the highest first.
 * According to the VPD policy, no amount should be higher than 5000.
 */
SELECT *
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd
ORDER BY t.amount DESC
FETCH FIRST 10 ROWS ONLY

PGQL Property Graphs and Virtual Private Database (VPD)
Query the graph with access restricted by VPD

You probably wonder, why the transactions with amounts > 5000 do not appear at all in the graph. The NULL values for the transaction amount are converted to 0.0 by loading the graph into memory. Hence you can reveal them by running the following PGQL query:

%pgql-pgx

/*
 * Show transactions querying the in-memory graph using PGQL.
 * Order the result by the transaction amount, the highest first.
 * According to the VPD policy, no amount should be higher than 5000.
 */
SELECT s.acct_id as src_acct_id, t.amount, d.acct_id AS dst_acct_id
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd
ORDER BY t.amount ASC
FETCH FIRST 10 ROWS ONLY

PGQL Property Graphs and Virtual Private Database (VPD)
Query the graph using PGQL against the in-memory graph with access restricted by VPD

If you would rather see the NULL displayed, you can also execute a PGQL query directly against the database, bypassing the graph loaded into memory. 

%pgql-rdbms
/*
 * Show transactions querying the database directly graph using PGQL.
 * Order the result by the transaction amount, the highest first.
 * According to the VPD policy, no amounts higher than 5000 should be displayed .
 */
SELECT s.acct_id as src_acct_id, t.amount, d.acct_id AS dst_acct_id
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd
ORDER BY t.amount DESC
FETCH FIRST 10 ROWS ONLY

PGQL Property Graphs and Virtual Private Database (VPD)
Query the graph using PGQL against the database graph with access restricted by VPD

How can you distinguish when the PGQL is executed against the in-memory graph and when against the database directly? Look at the interpreter specified in the first line of each paragraph.

  • %pgql-pgx executes a PGQL query against the in-memory graph
  • %pgql-rdbms executes a PGQL query directly against the database

Quod erat demonstrandum


Virtual Private Database policies restrict access to your data. They are also effective when accessing graphs built from the data. Setting up policies once and using them everywhere, including with graphs, is an excellent way to ensure that everybody can see and use what they intend to see and use, regardless of the way, they access the data.

Source: oracle.com

Related Posts

0 comments:

Post a Comment