Wednesday, April 24, 2024

Performance Tuning in the Oracle Autonomous Database: Creating Indexes for JSON Collections

The Oracle Database and in particular the Oracle Autonomous Database offers multiple technologies to accelerate queries and improve database performance over JSON data, including indexes, materialized views, in-memory column storage, and Exadata storage-cell pushdown.

Simple Oracle Document Access (SODA) is a set of NoSQL-style APIs for creating, storing, retrieving, and deleting JSON documents. A SODA collection is a set of documents (in particular JSON documents) that is backed by an Oracle Database table or view. By default, creating a SODA document collection creates the following in the Oracle Database:

◉ Persistent default collection metadata.
◉ A table for storing the collection, in the database schema to which your SODA client is connected.

If the init.ora parameter compatible is at least 20, then SODA uses the JSON data type by default for JSON content, and the version method is UUID. If compatible is less than 20, then SODA by default stores JSON content as textual data in a BLOB column.
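As a quick illustration, you can check the compatible setting and create a collection from SQL using the DBMS_SODA PL/SQL package; a minimal sketch (the collection name is just an example):

-- Storage defaults depend on the compatible initialization parameter.
SELECT name, value FROM v$parameter WHERE name = 'compatible';

-- Create a collection named MOVIES with the default collection metadata.
DECLARE
  col SODA_COLLECTION_T;
BEGIN
  col := DBMS_SODA.create_collection('MOVIES');
END;
/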

Collection metadata is composed of multiple components. The kind of database you use determines the collection metadata that is used by default, and which of its field values you can modify for custom metadata.

Let us consider a collection of movies within the ADMIN schema in a 19c Oracle Autonomous JSON Database (think of one JSON document per movie). After opening JSON from Database Actions, we will create an index on the MOVIES collection:

Creating an index for a JSON collection requires a name and a type. The available types are Functional, Spatial, and Search.

Before creating a functional index on the runtime field (we will be running order-by queries against it), let us review the fields to enter:

  • For a functional type index, the fields to enter are:
    • Unique: make all indexed values unique
    • Index Nulls: use the index in order-by queries
    • Path Required: the path must select a scalar value, even a JSON null value
    • Properties: select the property that you want to index on, or simply type * to display all available document properties in the collection
    • Composite Index: use more than one property
    • Advanced: change the storage properties of the indexed property
  • For search index, the options are:
    • Dataguide off-on: create JSON data guide for collection
    • Text Search off-on: index all properties in documents to support full-text search based on string equality
    • Range Search off-on: choose on to support range search when string-range search or temporal search (equality or range) is required
  • Spatial index is used to index GeoJSON geographic data. The selected property should be of GeoJSON type.
    • Path Required: the path must select a value, even if it is a JSON null value
    • Lax: the targeted field does not need to be present or to have a GeoJSON geometry object as its value

The properties of the selected index appear in JSON format below the listed indexes:

We can also index a singleton scalar field using SQL; here is how:

CREATE INDEX YEAR_NDX ON movies (json_value(data, '$.year.number()' ERROR ON ERROR));

Item method numberOnly() is normally used in the path expression that identifies the field to index, to ensure that the field value is numeric; unlike number(), numberOnly() does not convert non-numeric fields to numbers. However, because the year field is stored as a string in one of my JSON documents, using numberOnly() raises ORA-01722: invalid number, which is why the statement above uses number() instead. And since some movies were produced in the same year, I cannot make the index unique: ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found.

Creating a relational view over the JSON documents of a collection is simple: select the columns we need, say title, year, and runtime:
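A minimal sketch of such a view using JSON_TABLE, assuming the collection is backed by a table named MOVIES whose JSON content is stored in a column named DATA (adjust names to your collection metadata):

CREATE OR REPLACE VIEW movies_v AS
SELECT jt.title, jt.year, jt.runtime
FROM movies m,
     JSON_TABLE(m.data, '$'
       COLUMNS (title   VARCHAR2(200) PATH '$.title',
                year    NUMBER        PATH '$.year',
                runtime NUMBER        PATH '$.runtime')) jt;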

In SQL, we can use the view and order the movies by runtime:
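Using the hypothetical view from the sketch above, that query could look like this:

SELECT title, year, runtime
FROM movies_v
ORDER BY runtime;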

Regardless of your database release you can create whatever Oracle Database indexes you need directly, using:

(1) the JSON page of Database Actions (see Creating Indexes for JSON Collections in Using Oracle Database Actions)

(2) Simple Oracle Document Access (SODA)

(3) SQL — see Indexes for JSON Data in Oracle Database JSON Developer’s Guide.

Using the JSON page in ADB is perhaps the easiest approach to indexing JSON data.

The static data dictionary view USER_SODA_COLLECTIONS lists the basic features of all of your SODA collections. Correspondingly, there are also DBA_SODA_COLLECTIONS and ALL_SODA_COLLECTIONS. Collection metadata, expressed in JavaScript Object Notation (JSON), can be obtained from the last column, called JSON_DESCRIPTOR.
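For example, to list your collections together with their metadata (column names as described above):

SELECT uri_name, json_descriptor
FROM user_soda_collections;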

Users will typically work with JSON collections using native language drivers, for example, SODA for Java or SODA for Python. SODA native language drivers generally provide more throughput (operations per second) than the REST driver (SODA for REST). It is recommended to configure the SODA drivers as follows:

  • Enable SODA Metadata Cache: The SODA driver needs to know the metadata of each JSON collection (the column names, types, etc.). By enabling the metadata cache, roundtrips to the database can be saved, improving latency and throughput.
  • Enable Statement Cache: Statement caching improves performance by caching executable statements that are used repeatedly, such as in a loop or in a method that is called repeatedly. For Java, the statement cache is enabled using JDBC; a configuration sketch follows this list.
  • For load-balanced systems, turn off DNS caching: Load balancing distributes SODA operations across different nodes. If DNS caching is turned on, then all connections are likely to use the same node, nullifying the load balancing. For Java, the following system property should be set: inet.addr.ttl=0
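A minimal Java sketch of the first two settings follows. The SODA metadata-cache property name and the cache sizes are taken as assumptions from the SODA for Java and JDBC documentation; verify them against your driver version:

import java.sql.Connection;
import java.util.Properties;
import oracle.jdbc.OracleConnection;
import oracle.soda.OracleDatabase;
import oracle.soda.OracleException;
import oracle.soda.rdbms.OracleRDBMSClient;

public class SodaDriverConfig {

    // Returns a SODA database handle with the collection metadata cache enabled
    // and JDBC implicit statement caching configured on the given connection.
    public static OracleDatabase openSodaDatabase(Connection conn)
            throws OracleException, java.sql.SQLException {

        // Enable the shared SODA collection metadata cache.
        Properties sodaProps = new Properties();
        sodaProps.put("oracle.soda.sharedMetadataCache", "true");
        OracleRDBMSClient client = new OracleRDBMSClient(sodaProps);

        // Enable JDBC implicit statement caching on the underlying connection.
        OracleConnection oraConn = (OracleConnection) conn;
        oraConn.setImplicitCachingEnabled(true);
        oraConn.setStatementCacheSize(50);

        return client.getDatabase(conn);
    }
}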

The database performance tuning techniques also apply to SODA: for example, SODA collections can be partitioned or sharded, and queries can be accelerated using indexes and/or materialized views.

How do you monitor all that? You can turn on performance monitoring of the SQL operations that underlie a SODA read or write operation by adding a SQL hint to the SODA operation.

Use only the hints MONITOR (turn on monitoring) or NO_MONITOR (turn off monitoring). The mechanism can be used to pass any SQL hint, but MONITOR and NO_MONITOR are the useful ones for SODA, and an inappropriate hint can cause the optimizer to produce a suboptimal query plan.
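As an illustration, SODA for Java exposes a hint() method on its operation builder; a sketch, assuming a collection handle col and a driver version that supports hint():

// Pass the MONITOR hint so the underlying SQL of this SODA read shows up in SQL Monitor.
// Assumes an oracle.soda.OracleCollection "col" obtained from the OracleDatabase handle.
OracleCursor cursor = col.find()
                         .filter("{\"year\" : 1984}")
                         .hint("MONITOR")
                         .getCursor();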

Note in addition that when using SODA with Autonomous Database the following restrictions apply:

  • Automatic indexing is not supported for SQL and PL/SQL code that uses the SQL/JSON function json_exists.
  • Automatic indexing is not supported for SODA query-by-example (QBE).

Final note: since Auto Indexing is disabled by default in the Autonomous Database, you can enable it by running: EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');

Source: juliandontcheff.wordpress.com

Monday, April 22, 2024

Oracle Database API for MongoDB - Best Practices

The Oracle Database API for MongoDB (hereafter MongoDB API) allows you to access Oracle Databases from MongoDB-compatible programs and drivers simply by modifying the access URI of your program or application.

There are two ways to use the Oracle Database API for MongoDB (or MongoDB API):

1. As part of a fully managed Autonomous Database Serverless.
2. Using a standalone ORDS server connected to an Autonomous Database Dedicated/CC or an unmanaged Oracle Database, on-premises or in the Cloud.

As a fully managed service, the first option requires no configuration (beyond turning the feature on) and no maintenance from a customer perspective. This document will therefore concentrate on the second option, where the customer manages the deployment and support of the MongoDB API.

MongoDB API architecture


MongoDB programs and drivers communicate with their database(s) over a socket connection, usually using port 27017. A URI contains all the information needed to define the server connection and typically takes the form

mongodb://[user:pass@]servername:27017/databasename?parameters

In the case of a MongoDB database, the database server itself listens on port 27017 and acts on incoming requests. Oracle MongoDB API provides the same transparent connection through Oracle REST Data Services (ORDS), so all you need to do is to change your connection string. No additional port openings or firewall changes are required; all you need is network connectivity to your ORDS installation, on-premises, or in the Cloud. 
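For illustration, a connection string for the Oracle MongoDB API typically looks similar to the following; host, user, and password are placeholders, and the parameter set shown (as commonly documented for Autonomous Database) may differ for your deployment:

mongodb://[user:password@]<ords-host>:27017/<user>?authMechanism=PLAIN&authSource=$external&ssl=true&retryWrites=false&loadBalanced=true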

The MongoDB API is an integral part of ORDS. It rewrites MongoDB requests into appropriate Oracle SQL or SODA operations and passes them to the Oracle database server over SQL*Net. Storing your documents centrally in an Oracle Database enables you to provide MongoDB compatibility with your data stored and managed in the most mature, mission-critical Oracle Database and enhance your capabilities to work with the data beyond pure document-store APIs, such as full SQL/JSON access for analytics.

You need to deploy ORDS as a standalone application, meaning that the ORDS process spins up its own Jetty Web Server. Such a deployment has the advantage of requiring only Java and a minimal resource footprint.

The following defines some best practices for deploying the MongoDB API for your applications.

I have an Autonomous Database Serverless. Is there any reason to use a standalone ORDS server?


No.

In most cases, the built-in MongoDB API support is sufficient, and you don’t want to deploy and manage anything yourself. However, you may get more tunability and performance under certain challenging conditions by utilizing a standalone ORDS server on an OCI VM or a machine outside of OCI. If you decide to deploy your own ORDS, you can still use the MongoDB API interface provided by the Autonomous Database Service in parallel, for example, for testing or development purposes.

Can I use WebLogic Server or Apache Tomcat?


No.

Although ORDS itself can run standalone or as part of WebLogic Server (WLS) or Tomcat, the MongoDB API is currently only supported through standalone ORDS deployments.

Where should I run ORDS?


Where you deploy ORDS is primarily a matter of architectural preference and of how you leverage ORDS in your enterprise. It also depends on whether ORDS is just a means of providing the MongoDB API or is also leveraged for other ORDS functionality, such as serving as a REST server or as a backend for Oracle APEX. Depending on your use case, you can either:

◉ Deploy ORDS centrally as a separate component.
◉ Consider ORDS part of your application that requires MongoDB API compatibility.
 
When the sole purpose of ORDS is to provide the MongoDB API, it is usually best to run the ORDS server on the application node next to your application. When you have multiple application servers on individual nodes, you can run an ORDS instance with each application server. That way:

◉ The MongoDB API scales with the number of application server nodes.
◉ You only need to size the resources for each ORDS installation based on the requirements of a single application node.
◉ There’s no need to keep other ports open between your application node and the ORDS node. The only connection port required to communicate with the Oracle Database is the port for your SQL*Net connection.
◉ You don’t need to maintain separate hardware or containers for the ORDS nodes.
◉ Assuming you have a high-availability and/or load-balancing solution for your application nodes, that will also automatically work for ORDS.

However, there can be advantages to running ORDS on a separate cluster, mainly in the case of using ORDS for more than the MongoDB API. The benefits would be:

◉ A single ORDS instance can serve multiple applications and/or databases.
◉ ORDS can be scaled independently of the application.
◉ You can maintain a shared connection pool.

When deploying ORDS as a separate central component in a multi-tenant fashion, you need to ensure that:

◉ Your HA requirements are met with multiple deployments and a mechanism to work transparently with any of your deployed ORDS instances, for example, through load balancers in a Cloud environment.
◉ The resources required for ORDS are sized based on the aggregated requirements of all ORDS-supported services.

Are there other recommendations for installing and using Standalone ORDS?


Yes. Please see Oracle REST Data Services (ORDS) best practices, but when reading it, remember that any references to WLS or Tomcat should be ignored; any recommendation regarding high availability also applies to standalone ORDS installations. Most of these recommendations are more relevant for separate, centrally deployed ORDS environments.

Anything I need to do to achieve high throughput?


Yes. Out of the box, ORDS is configured with a relatively basic set of performance settings. Those might be fine for testing or for small applications with lightweight utilization, but if you want high throughput, you must configure settings such as the connection pool size. Please see Section 9.7, Achieving High Performance, in the ORDS Installation and Configuration Guide.
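As a sketch, recent ORDS releases let you adjust the relevant pool and MongoDB API settings with the ords config CLI; the setting names below come from the ORDS documentation, and the values are illustrative, not recommendations:

# Size the JDBC connection pool for the expected concurrency.
ords config set jdbc.InitialLimit 10
ords config set jdbc.MaxLimit 100

# Enable the MongoDB API listener and its port in standalone ORDS.
ords config set mongo.enabled true
ords config set mongo.port 27017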

Source: oracle.com

Friday, April 19, 2024

Unleashing the Power of Oracle Enterprise Manager Cloud Control for Oracle Exadata Cloud

Introduction: Understanding Oracle Enterprise Manager Cloud Control


Oracle Enterprise Manager Cloud Control is a comprehensive management solution that offers a unified and centralized approach to managing Oracle environments. Specifically designed for Oracle Exadata Cloud, this powerful tool empowers organizations to streamline operations, enhance performance, and optimize resources effectively.

Streamlined Management with Oracle Enterprise Manager Cloud Control


Centralized Monitoring and Administration

Oracle Enterprise Manager Cloud Control provides a centralized platform for monitoring and administering Oracle Exadata Cloud environments. With real-time insights into system performance, resource utilization, and application health, administrators gain full visibility and control over their infrastructure.

Proactive Performance Management

By leveraging advanced monitoring capabilities, Oracle Enterprise Manager Cloud Control enables proactive performance management. Administrators can identify potential issues, analyze trends, and take preemptive actions to ensure optimal performance and prevent downtime.

Automated Provisioning and Scaling

One of the key benefits of Oracle Enterprise Manager Cloud Control is its ability to automate provisioning and scaling tasks. With predefined templates and policies, administrators can effortlessly deploy new resources, scale capacity up or down as needed, and optimize resource utilization based on workload demands.

Enhanced Security and Compliance


Comprehensive Security Controls

Oracle Enterprise Manager Cloud Control offers a wide range of security controls to safeguard Oracle Exadata Cloud environments. From user authentication and access control to encryption and audit logging, administrators can enforce stringent security measures to protect sensitive data and ensure compliance with regulatory requirements.

Continuous Compliance Monitoring

With built-in compliance management features, Oracle Enterprise Manager Cloud Control enables continuous monitoring of regulatory compliance. Administrators can define and enforce compliance policies, conduct regular audits, and generate compliance reports to demonstrate adherence to industry standards and regulations.

Optimized Resource Management


Efficient Resource Allocation

Oracle Enterprise Manager Cloud Control optimizes resource management by providing tools for efficient allocation and utilization of resources. By analyzing workload patterns and performance metrics, administrators can allocate resources dynamically, optimize resource utilization, and eliminate resource bottlenecks.

Cost Optimization

By optimizing resource usage and improving operational efficiency, Oracle Enterprise Manager Cloud Control helps organizations minimize costs associated with Oracle Exadata Cloud deployment. Administrators can identify opportunities for cost savings, optimize licensing agreements, and streamline resource provisioning to achieve maximum ROI.

Conclusion: Empowering Organizations with Oracle Enterprise Manager Cloud Control

Oracle Enterprise Manager Cloud Control is a game-changing solution for organizations leveraging Oracle Exadata Cloud. From streamlined management and enhanced security to optimized resource utilization and cost savings, this powerful tool offers a comprehensive suite of features to empower organizations and drive business success.

Monday, April 15, 2024

Where is the Complexity?

One of the common arguments I hear about avoiding XA distributed transactions is due to their complexity. In this series of blog posts I’ll examine that claim by looking at three versions of the same application. The first version of the application ignores data consistency issues and operates as though failures essentially don’t occur. This unfortunately is a pretty common practice due to the perceived complexity of introducing distributed transactions into an application. The second version adopts the saga pattern that is all the rage. It uses Eclipse MicroProfile Long Running Actions to implement the saga. Unlike the toy examples used to illustrate how sagas work, this version will include the necessary logic to actually be able to complete or compensate a transaction. Finally, the third version will use the XA pattern to ensure data consistency.

Basic Problem to Solve


The application I’ll use to illustrate the issues associated with ensuring data consistency transfers money from one account to another, where each account is serviced by a different microservice. Fundamentally it is a very simple application if you ignore failures. The flow of the Transfer microservice is basically:

1. Accepts a request to transfer an amount of money from one account to another account
2. Makes a request to withdraw the amount from the first account
3. Makes a request to deposit the amount in the second account
4. Returns success

A very simple application, what could possibly go wrong?

Simplistic Application Without Considering Failures


Let’s look first at a possible simple Transfer microservice. It offers a single service named “transfer”, which transfers money from an account in one microservice (department1) to an account in a different microservice (department2). Here is a simple Spring Boot based teller service that handles transferring the money:

@RestController
@RequestMapping("/transfers")
@RequestScope
public class TransferResource {

    private static final Logger LOG = LoggerFactory.getLogger(TransferResource.class);

    @Autowired
    RestTemplate restTemplate;

    @Value("${departmentOneEndpoint}")
    String departmentOneEndpoint;

    @Value("${departmentTwoEndpoint}")
    String departmentTwoEndpoint;

    @RequestMapping(value = "transfer", method = RequestMethod.POST)
    public ResponseEntity<?> transfer(@RequestBody Transfer transferDetails) throws TransferFailedException {
        ResponseEntity<String> withdrawResponse = null;
        ResponseEntity<String> depositResponse = null;

        LOG.info("Transfer initiated: {}", transferDetails);
        try {
            withdrawResponse = withdraw(transferDetails.getFrom(), transferDetails.getAmount());
            if (!withdrawResponse.getStatusCode().is2xxSuccessful()) {
                LOG.error("Withdraw failed: {} Reason: {}", transferDetails, withdrawResponse.getBody());
                throw new TransferFailedException(String.format("Withdraw failed: %s Reason: %s", transferDetails, withdrawResponse.getBody()));
            }
        } catch (Exception e) {
            LOG.error("Transfer failed as withdraw failed with exception {}", e.getLocalizedMessage());
            throw new TransferFailedException(String.format("Withdraw failed: %s Reason: %s", transferDetails, Objects.nonNull(withdrawResponse) ? withdrawResponse.getBody() : withdrawResponse));
        }

        try {
            depositResponse = deposit(transferDetails.getTo(), transferDetails.getAmount());
            if (!depositResponse.getStatusCode().is2xxSuccessful()) {
                LOG.error("Deposit failed: {} Reason: {} ", transferDetails, depositResponse.getBody());
                LOG.error("Reverting withdrawn amount from account {}, as deposit failed.", transferDetails.getFrom());
                redepositWithdrawnAmount(transferDetails.getFrom(), transferDetails.getAmount());
                throw new TransferFailedException(String.format("Deposit failed: %s Reason: %s ", transferDetails, depositResponse.getBody()));
            }
        } catch (Exception e) {
            LOG.error("Transfer failed as deposit failed with exception {}", e.getLocalizedMessage());
            LOG.error("Reverting withdrawn amount from account {}, as deposit failed.", transferDetails.getFrom());
            redepositWithdrawnAmount(transferDetails.getFrom(), transferDetails.getAmount());
            throw new TransferFailedException(String.format("Deposit failed: %s Reason: %s ", transferDetails, Objects.nonNull(depositResponse) ? depositResponse.getBody() : depositResponse));
        }
        LOG.info("Transfer successful: {}", transferDetails);
        return ResponseEntity
                .ok(new TransferResponse("Transfer completed successfully"));
    }
    
    /**
     * Send an HTTP request to the service to re-deposit the previously withdrawn
     * amount back into the provided account
     *
     * @param accountId The account Identity
     * @param amount    The amount to be re-deposited
     */
    private void redepositWithdrawnAmount(String accountId, double amount) {
        URI departmentUri = UriComponentsBuilder.fromUri(URI.create(departmentOneEndpoint))
                .path("/accounts")
                .path("/" + accountId)
                .path("/deposit")
                .queryParam("amount", amount)
                .build()
                .toUri();

        ResponseEntity<String> responseEntity = restTemplate.postForEntity(departmentUri, null, String.class);
        LOG.info("Re-Deposit Response: \n" + responseEntity.getBody());
    }

    /**
     * Send an HTTP request to the service to withdraw amount from the provided account identity
     *
     * @param accountId The account Identity
     * @param amount    The amount to be withdrawn
     * @return HTTP Response from the service
     */
    private ResponseEntity<String> withdraw(String accountId, double amount) {
        URI departmentUri = UriComponentsBuilder.fromUri(URI.create(departmentOneEndpoint))
                .path("/accounts")
                .path("/" + accountId)
                .path("/withdraw")
                .queryParam("amount", amount)
                .build()
                .toUri();

        ResponseEntity<String> responseEntity = restTemplate.postForEntity(departmentUri, null, String.class);
        LOG.info("Withdraw Response: \n" + responseEntity.getBody());
        return responseEntity;
    }

    /**
     * Send an HTTP request to the service to deposit amount into the provided account identity
     *
     * @param accountId The account Identity
     * @param amount    The amount to be deposited
     * @return HTTP Response from the service
     */
    private ResponseEntity<String> deposit(String accountId, double amount) {
        URI departmentUri = UriComponentsBuilder.fromUri(URI.create(departmentTwoEndpoint))
                .path("/accounts")
                .path("/" + accountId)
                .path("/deposit")
                .queryParam("amount", amount)
                .build()
                .toUri();

        ResponseEntity<String> responseEntity = restTemplate.postForEntity(departmentUri, null, String.class);
        LOG.info("Deposit Response: \n" + responseEntity.getBody());
        return responseEntity;
    }

}

Note that this simplistic implementation of the transfer service at least considers the possibility that the deposit service fails and, if so, attempts to redeposit the withdrawn amount back into the source account. However, as should be obvious, it is possible that the redeposit fails, thus leaving the funds in limbo.

Here is the code providing the REST service interface for withdraw for Department1:

@RequestMapping(value = "/{accountId}/withdraw", method = RequestMethod.POST)
    public ResponseEntity<?> withdraw(@PathVariable("accountId") String accountId, @RequestParam("amount") double amount) {
        try {
            this.accountOperationService.withdraw(accountId, amount);
            return ResponseEntity.ok("Amount withdrawn from the account");
        } catch (NotFoundException e) {
            return ResponseEntity.status(HttpStatus.NOT_FOUND).body(e.getMessage());
        } catch (UnprocessableEntityException e) {
            LOG.error(e.getLocalizedMessage());
            return ResponseEntity.status(HttpStatus.UNPROCESSABLE_ENTITY).body(e.getMessage());
        } catch (Exception e) {
            LOG.error(e.getLocalizedMessage());
            return ResponseEntity.internalServerError().body(e.getLocalizedMessage());
        }
    }

The withdraw service calls the withdraw method on the accountOperationService. Here is the code providing the withdraw method used by the above REST service. It uses an injected EntityManager for JPA:

/**
 * Service that connects to the accounts database and provides methods to interact with the account
 */
@Component
@RequestScope
@Transactional
public class AccountOperationService implements IAccountOperationService {

    private static final Logger LOG = LoggerFactory.getLogger(AccountOperationService.class);

    @Autowired
    EntityManager entityManager;

    @Autowired
    IAccountQueryService accountQueryService;

    @Override
    public void withdraw(String accountId, double amount) throws UnprocessableEntityException, NotFoundException {
        Account account = accountQueryService.getAccountDetails(accountId);
        if (account.getAmount() < amount) {
            throw new UnprocessableEntityException("Insufficient balance in the account");
        }
        LOG.info("Current Balance: " + account.getAmount());
        account.setAmount(account.getAmount() - amount);
        account = entityManager.merge(account);
        entityManager.flush();
        LOG.info("New Balance: " + account.getAmount());
        LOG.info(amount + " withdrawn from account: " + accountId);
    }

    @Override
    public void deposit(String accountId, double amount) throws NotFoundException {
        Account account = accountQueryService.getAccountDetails(accountId);
        LOG.info("Current Balance: " + account.getAmount());
        account.setAmount(account.getAmount() + amount);
        account = entityManager.merge(account);
        entityManager.flush();
        LOG.info("New Balance: " + account.getAmount());
        LOG.info(amount + " deposited to account: " + accountId);
    }
}

The withdraw method gets the current account balance, checks for sufficient funds, and throws an exception if the balance is insufficient. Otherwise it updates the account balance and saves the account. We can also see the deposit method, which gets the current account balance, adds the amount to deposit, and saves the updated account information.

What About Failures?


The developer of this teller service realizes there might be some failure scenarios to handle. For example, what happens if the deposit request fails? The developer solved that by having the teller service take the corrective measure of redepositing the money back into the first account. Problem solved. But what happens if, between the time of the withdrawal request and the request to redeposit the funds, the teller microservice dies? What happens to the funds that were withdrawn? As it stands, they’re lost!

The developer could solve this problem by creating a table of pending operations that could be examined when the teller microservice starts up. But that would also mean that the deposit service must be idempotent, as the only thing the teller service can do is retry the deposit request until it succeeds, at which point it would remove the entry from its pending operations table. Until the deposit succeeds, the funds are basically in limbo and inaccessible to the account owner or anyone else.
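A minimal sketch of what such a pending-operations table might look like (table and column names are hypothetical, not part of the sample application):

-- Consulted by the teller service on startup to retry unfinished transfers.
CREATE TABLE pending_transfers (
    transfer_id   VARCHAR2(36)  PRIMARY KEY,   -- correlation id for the transfer
    from_account  VARCHAR2(36)  NOT NULL,
    to_account    VARCHAR2(36)  NOT NULL,
    amount        NUMBER(19,4)  NOT NULL,
    state         VARCHAR2(20)  NOT NULL,      -- e.g. WITHDRAWN, DEPOSIT_PENDING
    created_at    TIMESTAMP     DEFAULT SYSTIMESTAMP NOT NULL
);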

So far, the developer has only handled some of the possible failures by adding error recovery logic into their microservice. And this is only for a trivial microservice. As more state information is updated, more complex recovery mechanisms may need to be added to the microservice. In the next post, we’ll look at how we can apply the saga pattern to solve this data consistency problem using Eclipse MicroProfile Long Running Actions, coordinated by MicroTx.

Source: oracle.com

Friday, April 12, 2024

Proper SQL comes to MongoDB applications .. with the Oracle Database!

As some of you might know, an exciting part of my job is working with the Oracle Database API for MongoDB (MongoDB API for short), a part of Oracle's converged database that brings MongoDB's document store API to the world's best database (personal opinion, no need to rebuff here, but happy to discuss offline). No, it's not about Oracle wanting to be a MongoDB - they're probably better at chasing their niche - but about offering the same simple and widely used document store APIs and framework integration that Mongo does, together with all of Oracle's powerful support for all workloads and datatypes. And the integration continues.

MongoDB added a new operator, $sql, to their aggregation pipeline framework not too long ago (as of the end of February 2024 it is still in beta), so we at Oracle figured, hey, we have SQL, too ;-). But unlike them, we've been doing SQL for quite some time, so why not support that operator and offer our customers the world of Oracle's powerful SQL within the realms of the MongoDB API? That's precisely what we did.

Use Oracle SQL with the Oracle MongoDB API and instantaneously benefit from Oracle's converged database.

What can I use it for?


  • Have data in classical relational tables that you want to share as a collection in your MongoDB app? We got you covered.
  • Have data in classical relational tables that you want to combine and process together with collections in your MongoDB application? We got you covered.
  • Want to leverage some advanced SQL functionality that is hard or impossible to do in MongoDB? We got you covered.
  • Have some procedural logic you want to integrate in an aggregation pipeline? We got you covered.
  • Need a pragmatic and straightforward way to deal with the ever-decreasing little things we have not gotten to implement without leaving the Mongo ecosystem? We got you covered with that, too.

Yes, you use Oracle SQL without leaving the world of MongoDB API and integrate and work jointly with relational data and MongoDB collections side-by-side.

A Quick Walk-Through


Let me give you some simple examples to provide a glimpse of what's doable here. The following are simple mongosh examples for illustration, but needless to say, any integration - like using bind variables - can be fully embedded in your application.

Expose relational data

Suppose you simply want to expose data coming from your relational core system as a read-only collection, without persisting it and periodically refreshing it as a Mongo collection. No need to copy or transfer data: you just select the information dynamically with SQL from the pure relational structures and use it in your application.

db.aggregate([{$sql: `
             select fiscal_year, fiscal_quarter_number, sum(amount_sold) as sum_amount
             from sh.times t join sh.sales s on (t.time_id = s.time_id)
             group by fiscal_year, fiscal_quarter_number order by 1,2`}
])

In the real world, this is often embedded in the context of an application and of previous filters and values, so let's use bind variables to limit the result set in your application. Using binds, just like in JDBC, helps improve performance and prevents any sort of SQL injection:

db.aggregate([ {$sql: 
                  { statement: ` 
                      select 
                         fiscal_year, fiscal_quarter_number, sum(amount_sold) as sum_amount 
                      from sh.times t join sh.sales s on (t.time_id = s.time_id) 
                      where fiscal_year = :1 
                      group by fiscal_year, fiscal_quarter_number 
                      order by 1,2`, 
                    binds: [
                        { index: 1, value: 2019}
                        ]
                    }}
])

That just works fine in any Oracle Database 19c and above, on-premises and with Autonomous Database.

Simple lookup with relational data

Let's make things a bit more interesting: you're running your Mongo application for a specific business unit that now wants to augment its collections with data from common corporate entities, stored centrally in your enterprise database. You can do so with a simple "$sql lookup" - a join in relational lingo - and add as many additional common attributes as you like. Within an aggregation pipeline, you use $sql just like any other stage that consumes the input documents from the previous stage and produces documents for subsequent stages.

Since I have been an Oracle person for a long time and there are many of us out there, I assume - in fact, hope - that some of you are reading this blog. I also assume that some of you have heard about EMP and DEPT, one of the oldest relational examples out there. I figured I would just use it to illustrate the functionality I am talking about. Yes, you can use this schema everywhere.

Let's first JSON-ize our EMP table and consider this our document collection (we name it empJSON), with a referencing model linking to our purely relational table DEPT. We want to expose our employee information (excluding salary) in a simple web application. Let's use mongosh for that:

jason> db.aggregate([{$sql:`
                         select json{empno, ename, job, mgr, hiredate, deptno} from emp`
                      },
...                   {$out: "empJSON"}])

jason> db.empJSON.findOne()
{
  _id: ObjectId('65ea34720af5351d8f7bf901'),
  empno: 7839,
  ename: 'KING',
  job: 'PRESIDENT',
  mgr: null,
  hiredate: ISODate('1981-11-17T00:00:00.000Z'),
  deptno: 10
}

That wasn't too hard, was it?

However, we do not want to show the department number, but the name of the department in our app. And for some reason, we don't want to persist the department name in our stored collection. So let's just look it up in real time using a $sql stage whenever we need the result, joining our collection with our purely relational table.

jason> db.empJSON.aggregate([{$sql: `
                                select json_transform(e.data, set '$.dname' = d.dname, remove '$.deptno') 
                                from input e, dept d 
                                where e.data.deptno.number() = d.deptno`
                               },
...                            {$limit: 1}])
[
  {
    _id: ObjectId('65ea34720af5351d8f7bf901'),
    empno: 7839,
    ename: 'KING',
    job: 'PRESIDENT',
    mgr: null,
    hiredate: ISODate('1981-11-17T00:00:00.000Z'),
    dname: 'ACCOUNTING'
  }
]

What you see in this little example is how we integrated the $sql stage transparently into Mongo's aggregation pipeline framework: collections produced by previous stages are represented as a JSON collection table INPUT with a single column DATA, containing your documents. We simply joined our collection empJSON with the relational table DEPT, added the field 'dname' and removed the unnecessary field 'deptno'. Mission accomplished, EMP and DEPT are now officially a part of MongoDB demos.

Using the aggregation pipeline in such a manner requires Oracle Database 23c.

Leverage analytics and encapsulated business logic

So you are as savvy in SQL as you are in MongoDB lingo? Choose what you do best and fastest. The following is a rather simple example that aggregates and ranks your yearly gross revenue with your movies using SQL, sorts the data and gives us the key attributes for the top ten.

db.movies.aggregate([
          {$match: {year:2019, gross : {$ne: null}}},
          {$sql:`
              select json_mergepatch(i.data, json {'rank': rank() over (order by i.data."gross" desc)})
              from input i`},
          {$project: { rank: 1, year: 1, title: 1, gross: 1, "_id": 0 }},
          {$match: {rank : {$lte : 10}}},
          {$sort: {rank: 1}}
])

In this simple example, we are doing the ranking in SQL, but the sorting and limiting to the top ten in the aggregation pipeline, just to show the interchangeability. We could have done everything in SQL (or the Mongo aggregation pipeline for that matter), but decided to only do the ranking (and implicitly required sorting) in Oracle, letting Oracle's enterprise performance features loose to munge through the data. 

But wait. 

Our finance department had worked hard on our magical global financial gross adjustment that is applied everywhere and encapsulated in SQL. What now? Well, just add a pipeline stage and apply the magic to your Mongo collection, calculate the right number, and add the adjusted gross revenue to your collection:

db.movies.aggregate([{$sql: `
                      select json_mergepatch(i.data,
                             json{'adjGross':adjust_gross(i.data.gross.number())})
                      from input i, dual`}
])

Bridge the gaps

Last but not least, you can use SQL for everything that Oracle MongoDB API does not support. As briefly mentioned before, our vision and aim is not to be a Me-Too Mongo. Our vision is the enterprise and the completeness of a converged database, supporting any datatype with any workload. There will most likely always be gaps in functionality as long as MongoDB and Oracle exist.

One of the most prominent gaps as of today is the lack of index creation through the MongoDB API in Oracle Database 19c. Prior to the $sql operator, you had to leave the MongoDB ecosystem, connect with a SQL tool, and create any index from there. With the introduction of the $sql operator, you still use SQL, but there is no need to leave the MongoDB ecosystem. Just use the $sql operator in Oracle Database 19c and bridge this gap pragmatically for now, until Oracle Database 23c is on your radar. Oracle Database 23c supports index creation through the MongoDB API, but prior to that - like in Autonomous Database - our $sql stage comes to the rescue.

db.aggregate([{ $sql: `
                create index i_movies_sku
                on movies(json_value(data, '$.sku.stringOnly()' ERROR ON ERROR))`}
])

You will see the successful index creation right afterwards:

jason> db.movies.getIndexes()
[
  {
    name: 'I_MOVIES_SKU',
    indexNulls: false,
    unique: false,
    v: 2,
    key: { 'sku.stringOnly()': 1 },
    ns: 'jason.movies'
  },
  { v: 2, key: { _id: 1 }, ns: 'jason.movies', name: '_id_' }
]

See the index at work:

jason> db.movies.find({"sku":"NTV55017"}).explain()
{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'jason.movies',
    indexFilterSet: false,
    parsedQuery: { sku: { '$stringOnly': 'NTV55017' } },
    rewrittenQuery: { sku: { '$stringOnly': 'NTV55017' } },
    winningPlan: {
      stage: 'SELECT STATEMENT',
      inputStage: {
        stage: 'TABLE ACCESS',
        options: 'BY INDEX ROWID BATCHED',
        source: 'MOVIES',
        columns: '"MOVIES"."ID"[RAW,4000], "CREATED_ON"[TIMESTAMP,11], "LAST_MODIFIED"[TIMESTAMP,11], "VERSION"[VARCHAR2,255], "DATA" /*+ LOB_BY_VALUE */ [JSON,8200]',
        inputStage: {
          stage: 'INDEX',
          options: 'RANGE SCAN',
          source: 'I_MOVIES_SKU',
          columns: `"MOVIES".ROWID[ROWID,10], JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.sku.stringOnly()' RETURNING VARCHAR2(4000) ERROR ON ERROR)[VARCHAR2,4000]`,
          filterType: 'access',
          filter: `JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.sku.stringOnly()' RETURNING VARCHAR2(4000) ERROR ON ERROR)=:1`,
          path: "$.sku.stringOnly()'"
        }
      }
    },
    rejectPlans: []
  },
  serverInfo: { host: 'localhost', port: 27017, version: '4.2.14' },
  ok: 1
}

That's quite a list of cool things you can do now, isn't it?

Source: oracle.com

Wednesday, April 10, 2024

PGQL Property Graphs and Virtual Private Database (VPD)

Securing your data and controlling its access is and should always be a significant concern. The Oracle Database has plenty of built-in security features that help reduce the risk of data breaches or provide granular access control. Virtual Private Database (VPD) is one such feature for the latter.

Oracle Graph enables you to focus on exploring and analysing connections in your data using a Property Graph Query Language such as PGQL, graph algorithms, or graph machine learning.

The question is how you can combine these two things: Using your Oracle Database as a Graph Database and still securing access to your data used in graphs.

The example I will discuss in my post uses an Oracle Autonomous Database Serverless 19c. If you don't have access to Oracle Cloud Infrastructure (OCI) and Autonomous Database yet, don't worry: everything I describe works nicely on the Oracle Database wherever you have it installed and running.

The data


Let us start by providing the data we want to control access to and that we will also explore later using Oracle Graph. A data set representing a typical financial services use case, with bank accounts and money being transferred between the accounts, is provided if you run Lab 1 of the LiveLabs tutorial "Find Circular Payment Chains with Graph Queries in Autonomous Database". The tutorial uses resources on OCI.

I use the LiveLabs tutorial, since it sets up everything I need by applying a Terraform stack. You will find the following resources ready-to-use:

  • An Autonomous Database (ADW or ATP) with a randomized password for the default database user ADMIN (check the end of the "Apply" log displaying the output).
  • A database user named GRAPHUSER with a randomized password (check the end of the "Apply" log displaying the output).
  • Two database tables, BANK_ACCOUNTS and BANK_TXNS with data imported, primary and foreign key constraints properly in place.

Make yourself familiar with the data set. It is stripped down to basic information only, to demonstrate the use case. Note that table BANK_TXNS has a column AMOUNT.

select amount, count(*)
from bank_txns
group by amount
order by 1 desc;

Secure access to your data


There is a regulatory requirement that amounts higher than 5000 are only revealed to a specific user. We will use VPD to ensure that the requirement is met.

VPD is based on two things, a policy function and a policy. We define both as follows using the ADMIN user:

-- Policy function
CREATE OR REPLACE FUNCTION hide_big_txns (
  p_schema IN VARCHAR2,
  p_object IN VARCHAR2
)
RETURN VARCHAR2 AS
  l_predicate VARCHAR2 (200);
  l_user VARCHAR2(100);
BEGIN
  select user into l_user from dual;
  if l_user != 'GRAPHUSER' then
    l_predicate:='amount <= 5000';
  end if;
  RETURN (l_predicate);
END hide_big_txns;
/

-- Policy
BEGIN
  DBMS_RLS.ADD_POLICY (
    object_schema            => 'graphuser',
    object_name              => 'bank_txns',
    policy_name              => 'hide_big_txns_policy',
    function_schema          => 'admin',
    policy_function          => 'hide_big_txns',
    statement_types          => 'select',
    sec_relevant_cols        => 'amount',
    sec_relevant_cols_opt    => DBMS_RLS.ALL_ROWS
  );
END;
/

You can test whether the security requirement defined by VPD is met. Check it first for user GRAPHUSER, who should still be able to see transactions with amounts larger than 5000.

As GRAPHUSER run:

SELECT *
FROM bank_txns
WHERE amount > 5000
ORDER BY amount DESC
FETCH FIRST 5 ROWS ONLY;

Then, as the ADMIN user, we set up a new user and grant it SELECT privileges on the tables owned by GRAPHUSER.

CREATE USER testuser IDENTIFIED BY <PWD>;
GRANT RESOURCE, CONNECT, CREATE SESSION, CREATE TABLE TO testuser;
ALTER USER testuser QUOTA UNLIMITED ON data;
GRANT SELECT ON graphuser.bank_txns TO testuser;
GRANT SELECT ON graphuser.bank_accounts TO testuser;

Now log in as TESTUSER and run the query again. No transactions should come back as a result.
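Because the policy uses the column-masking option DBMS_RLS.ALL_ROWS, a query that references AMOUNT but has no predicate on it still returns all rows for TESTUSER, only with the amount shown as NULL wherever it exceeds 5000. A quick check as TESTUSER might look like this:

SELECT txn_id, src_acct_id, dst_acct_id, amount
FROM graphuser.bank_txns
ORDER BY txn_id
FETCH FIRST 5 ROWS ONLY;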

VPD and Graph combined


The next step is to verify that access to the data is also restricted if we use transaction information in a graph and query that graph using PGQL.

For the next steps we use Graph Studio, one of the tools available for Autonomous Databases. Log in to Graph Studio with the user GRAPHUSER.

Define your graph

Create the following notebook paragraphs and run them once the environment is attached to the Graph Studio session.

%pgql-rdbms
/* Create the graph. Rows in table BANK_ACCOUNTS become vertices, rows in table BANK_TXNS become edges of a graph named BANK_GRAPH_VPD. */
CREATE PROPERTY GRAPH bank_graph_vpd
    VERTEX TABLES (
        graphuser.bank_accounts
        KEY (acct_id)
        LABEL account
        PROPERTIES ( acct_id, name )
    )
    EDGE TABLES (
        graphuser.bank_txns
        KEY (txn_id)
        SOURCE KEY ( src_acct_id ) REFERENCES bank_accounts
        DESTINATION KEY ( dst_acct_id ) REFERENCES bank_accounts
        LABEL transfers
        PROPERTIES ( txn_id, amount, src_acct_id, dst_acct_id, description )
  ) OPTIONS (PG_PGQL)

Create a graph using PGQL

Load the graph into memory

%python-pgx
# Load the graph into memory
GRAPH_NAME="BANK_GRAPH_VPD"
# try getting the graph from the in-memory graph server
graph = session.get_graph(GRAPH_NAME);
# if it does not exist read it into memory
if (graph == None) :
    session.read_graph_by_name(GRAPH_NAME, "pg_view")
    print("Graph "+ GRAPH_NAME + " successfully loaded")
    graph = session.get_graph(GRAPH_NAME)
else :
    print("Graph '"+ GRAPH_NAME + "' already loaded")

Load the graph in memory using the Python API

Query the graph

%pgql-pgx
/* Show transactions querying the in-memory graph using PGQL */
SELECT *
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd

Query the in-memory representation of the graph using PGQL

%pgql-pgx
/* Show transactions with amount >= 5000 querying the in-memory graph */
SELECT *
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd
WHERE t.amount >= 5000

Query the graph using PGQL: Find transactions with amount >= 5000

You can see that as GRAPHUSER you are allowed to see all transaction amounts, including the larger ones.

What does it look like when running the same PGQL queries as TESTUSER? According to the VPD policy, amounts greater than 5000 should be returned as NULL. Let us verify it.

Log out from Graph Studio and log in again, this time as TESTUSER. Create a notebook with the first two paragraphs that GRAPHUSER used to create and load the graph.

%pgql-pgx
/*
 * Show transactions querying the in-memory graph using PGQL.
 * Order the result by the transaction amount, the highest first.
 * According to the VPD policy, no amount should be higher than 5000.
 */
SELECT *
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd
ORDER BY t.amount DESC
FETCH FIRST 10 ROWS ONLY

Query the graph with access restricted by VPD

You probably wonder why the transactions with amounts > 5000 do not appear at all in the graph. Their NULL amounts are converted to 0.0 when the graph is loaded into memory, so you can reveal them by running the following PGQL query:

%pgql-pgx

/*
 * Show transactions querying the in-memory graph using PGQL.
 * Order the result by the transaction amount, the lowest first, to reveal
 * the masked transactions, whose NULL amounts were loaded into memory as 0.0.
 */
SELECT s.acct_id as src_acct_id, t.amount, d.acct_id AS dst_acct_id
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd
ORDER BY t.amount ASC
FETCH FIRST 10 ROWS ONLY

Query the graph using PGQL against the in-memory graph with access restricted by VPD

If you would rather see the NULLs displayed, you can also execute a PGQL query directly against the database, bypassing the graph loaded into memory.

%pgql-rdbms
/*
 * Show transactions querying the database directly using PGQL.
 * Order the result by the transaction amount, the highest first.
 * According to the VPD policy, no amounts higher than 5000 should be displayed.
 */
SELECT s.acct_id as src_acct_id, t.amount, d.acct_id AS dst_acct_id
FROM MATCH (s IS ACCOUNT)-[t IS TRANSFERS]->(d IS ACCOUNT) ON bank_graph_vpd
ORDER BY t.amount DESC
FETCH FIRST 10 ROWS ONLY

Query the graph using PGQL against the database graph with access restricted by VPD

How can you distinguish whether a PGQL query is executed against the in-memory graph or directly against the database? Look at the interpreter specified in the first line of each paragraph.

  • %pgql-pgx executes a PGQL query against the in-memory graph
  • %pgql-rdbms executes a PGQL query directly against the database

Quod erat demonstrandum


Virtual Private Database policies restrict access to your data. They are also effective when accessing graphs built from that data. Setting up policies once and using them everywhere, including with graphs, is an excellent way to ensure that everybody can see and use only what they are meant to see and use, regardless of how they access the data.

Source: oracle.com