Friday, April 12, 2024

Proper SQL comes to MongoDB applications ... with the Oracle Database!

As some of you might know, an exciting part of my job is working with the Oracle Database API for MongoDB (MongoDB API for short), the part of Oracle's converged database that brings MongoDB's document store API to the world's best database (a personal opinion, no need to rebut it here - happy to discuss offline). No, this is not about Oracle wanting to be a MongoDB - they are probably better at their niche than anybody else - but about offering the same simple and widely used document store API and framework integration that Mongo does, together with Oracle's powerful support for all workloads and datatypes. And the integration continues.

MongoDB added a new $sql operator to its aggregation pipeline framework not too long ago (as of the end of February 2024 it is still in beta), so we at Oracle figured: hey, we have SQL, too ;-). Unlike them, though, we've been doing SQL for quite some time, so why not support that operator and offer our customers the world of Oracle's powerful SQL within the realm of the MongoDB API? That's precisely what we did.

Use Oracle SQL with the Oracle MongoDB API and instantaneously benefit from Oracle's converged database.
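
If you have never seen the $sql stage in action, here is about the smallest thing you can run. This is a minimal sketch in mongosh, assuming you are connected through the Oracle Database API for MongoDB; it simply sends a one-row query against Oracle's built-in DUAL table:

db.aggregate([{$sql: `select 'hello from Oracle SQL' as greeting from dual`}])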

What can I use it for?


  • Have data in classical relational tables that you want to share as a collection in your MongoDB app? We got you covered.
  • Have data in classical relational tables that you want to combine and process together with collections in your MongoDB application? We got you covered.
  • Want to leverage some advanced SQL functionality that is hard or impossible to do in MongoDB? We got you covered.
  • Have some procedural logic you want to integrate in an aggregation pipeline? We got you covered.
  • Need a pragmatic and straightforward way to deal with the ever-shrinking list of little things we have not yet gotten around to implementing, without leaving the Mongo ecosystem? We got you covered there, too.

Yes, you can use Oracle SQL without leaving the world of the MongoDB API, working with relational data and MongoDB collections jointly, side by side.

A Quick Walk-Through


Let me give you a few simple examples to provide a glimpse of what's doable here. The following are plain mongosh examples for illustration, but needless to say, everything shown here - including the use of bind variables - can be fully embedded in your application.

Expose relational data

Suppose you simply want to expose data from your relational core system as a read-only collection, without persisting and periodically refreshing it as a Mongo collection. There is no need to copy or transfer data: you select the information dynamically with SQL from the purely relational structures and use it in your application.

db.aggregate([{$sql: `
             select fiscal_year, fiscal_quarter_number, sum(amount_sold) as sum_amount
             from sh.times t join sh.sales s on (t.time_id = s.time_id)
             group by fiscal_year, fiscal_quarter_number order by 1,2`}
])

In the real world this is often embedded in the context of an application, with filters and values coming from earlier steps, so let's use bind variables to limit the result set. Using binds, just like in JDBC, improves performance and prevents any sort of SQL injection:

db.aggregate([ {$sql: 
                  { statement: ` 
                      select 
                         fiscal_year, fiscal_quarter_number, sum(amount_sold) as sum_amount 
                      from sh.times t join sh.sales s on (t.time_id = s.time_id) 
                      where fiscal_year = :1 
                      group by fiscal_year, fiscal_quarter_number 
                      order by 1,2`, 
                    binds: [
                        { index: 1, value: 2019}
                        ]
                    }}
])

This just works in any Oracle Database 19c and above, both on-premises and with Autonomous Database.
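
As mentioned at the start of the walk-through, the same stage can be embedded straight into application code rather than typed into mongosh. The following is a minimal sketch using the MongoDB Node.js driver; the connection string, credentials, and the grants on the SH sample schema are placeholders and assumptions you would adapt to your environment:

const { MongoClient } = require('mongodb');

async function quarterlyRevenue(fiscalYear) {
  // placeholder connection string for an Oracle Database API for MongoDB endpoint
  const client = new MongoClient('mongodb://user:password@host:27017/admin');
  try {
    await client.connect();
    const db = client.db('jason');   // schema/user from the examples in this post
    // same $sql stage as above, with the fiscal year passed in as a bind variable
    const cursor = db.aggregate([{ $sql: {
      statement: `
          select fiscal_year, fiscal_quarter_number, sum(amount_sold) as sum_amount
          from sh.times t join sh.sales s on (t.time_id = s.time_id)
          where fiscal_year = :1
          group by fiscal_year, fiscal_quarter_number
          order by 1, 2`,
      binds: [ { index: 1, value: fiscalYear } ]
    } }]);
    return await cursor.toArray();
  } finally {
    await client.close();
  }
}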

Simple lookup with relational data

Let's make things a bit more interesting: you're running your Mongo application for a specific business unit that now wants to augment its collections with data from common corporate entities, stored centrally in your enterprise database. You can do so with a simple "$sql lookup" - a join, in relational lingo - and add as many additional common attributes as you like. Within an aggregation pipeline, $sql behaves just like any other stage: it consumes the input documents from the previous stage and produces documents for subsequent stages.

Having been an Oracle person for a long time - and there are many of us out there - I assume, in fact hope, that some of you are reading this blog. I also assume that some of you have heard of EMP and DEPT, one of the oldest relational example schemas out there. I figured I would just use it to illustrate the functionality I am talking about. Yes, you can use this schema everywhere.

Let's first JSON-ize our EMP table and treat it as our document collection (we'll name it empJSON), with a referencing model linking to our purely relational table DEPT. We want to expose our employee information (excluding salary) in a simple web application. Let's use mongosh for that:

jason> db.aggregate([{$sql:`
                         select json{empno, ename, job, mgr, hiredate, deptno} from emp`
                      },
...                   {$out: "empJSON"}])

jason> db.empJSON.findOne()
{
  _id: ObjectId('65ea34720af5351d8f7bf901'),
  empno: 7839,
  ename: 'KING',
  job: 'PRESIDENT',
  mgr: null,
  hiredate: ISODate('1981-11-17T00:00:00.000Z'),
  deptno: 10
}

That wasn't too hard, was it?

However, in our app we want to show the department name rather than the department number. And for some reason, we don't want to persist the department name in our stored collection. So let's look it up in real time with a $sql stage whenever we need the result, joining our collection with our purely relational table.

jason> db.empJSON.aggregate([{$sql: `
                                select json_transform(e.data, set '$.dname' = d.dname, remove '$.deptno') 
                                from input e, dept d 
                                where e.data.deptno.number() = d.deptno`
                               },
...                            {$limit: 1}])
[
  {
    _id: ObjectId('65ea34720af5351d8f7bf901'),
    empno: 7839,
    ename: 'KING',
    job: 'PRESIDENT',
    mgr: null,
    hiredate: ISODate('1981-11-17T00:00:00.000Z'),
    dname: 'ACCOUNTING'
  }
]

What you see in this little example is how the $sql stage integrates transparently into Mongo's aggregation pipeline framework: the documents produced by previous stages are exposed as a JSON collection table INPUT with a single column DATA that contains your documents. We simply joined our collection empJSON with the relational table DEPT, added the field 'dname', and removed the now-unnecessary field 'deptno'. Mission accomplished: EMP and DEPT are now officially part of a MongoDB demo.
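
Because $sql behaves like any other stage, regular MongoDB stages can also run before it and feed it. The following sketch (the filter value 20 is arbitrary) lets a $match stage reduce the documents first, so only employees of department 20 ever reach the INPUT table inside the SQL statement:

db.empJSON.aggregate([
          {$match: {deptno: 20}},
          {$sql: `
              select json_transform(e.data, set '$.dname' = d.dname, remove '$.deptno')
              from input e, dept d
              where e.data.deptno.number() = d.deptno`}
])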

Using the aggregation pipeline in such a manner requires Oracle Database 23c.

Leverage analytics and encapsulated business logic

So you are as savvy in SQL as you are in MongoDB lingo? Choose whatever you do best and fastest. The following is a rather simple example that ranks your movies by gross revenue for a given year using SQL, then sorts the data and returns the key attributes for the top ten.

db.movies.aggregate([
          {$match: {year:2019, gross : {$ne: null}}},
          {$sql:`
              select json_mergepatch(i.data, json {'rank': rank() over (order by i.data."gross" desc)})
              from input i`},
          {$project: { rank: 1, year: 1, title: 1, gross: 1, "_id": 0 }},
          {$match: {rank : {$lte : 10}}},
          {$sort: {rank: 1}}
])

In this simple example we do the ranking in SQL but the sorting and limiting to the top ten in the aggregation pipeline, just to show the interchangeability. We could have done everything in SQL (or everything in the Mongo aggregation pipeline, for that matter), but decided to do only the ranking (and the sorting it implicitly requires) in Oracle, letting Oracle's enterprise performance features loose to crunch through the data.
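
Here is, for comparison, a sketch of the all-SQL variant, with the ranking, ordering, and top-ten limit pushed entirely into the $sql stage (the field names year, title, and gross follow the movies documents used above):

db.movies.aggregate([
          {$match: {year: 2019, gross: {$ne: null}}},
          {$sql: `
              select json{'rank': rank() over (order by i.data."gross" desc),
                          'year': i.data."year",
                          'title': i.data."title",
                          'gross': i.data."gross"}
              from input i
              order by i.data."gross" desc
              fetch first 10 rows only`}
])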

But wait. 

Our finance department has worked hard on a magical global gross adjustment that is applied everywhere and is encapsulated in SQL. What now? Well, just add a pipeline stage, apply the magic to your Mongo collection, calculate the right number, and add the adjusted gross revenue to your documents:

db.movies.aggregate([{$sql: `
                      select json_mergepatch(i.data,
                             json{'adjGross':adjust_gross(i.data.gross.number())})
                      from input i, dual`}
])
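
And if you would rather persist the adjusted numbers than recompute them on every request, the same pattern as in the empJSON example applies: add an $out stage after the $sql stage and write the result to a collection (the target name moviesAdjusted is just an example):

db.movies.aggregate([{$sql: `
                      select json_mergepatch(i.data,
                             json{'adjGross':adjust_gross(i.data.gross.number())})
                      from input i, dual`},
                     {$out: "moviesAdjusted"}
])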

Bridge the gaps

Last but not least, you can use SQL for everything the Oracle MongoDB API does not support. As briefly mentioned before, our vision and aim is not to be a me-too Mongo. Our vision is the enterprise and the completeness of a converged database, supporting any datatype with any workload. There will most likely always be functionality gaps as long as MongoDB and Oracle both exist.

One of the most prominent gaps as of today is the lack of index creation through the MongoDB API in Oracle Database 19c. Prior to the $sql operator, you had to leave the MongoDB ecosystem, connect with a SQL tool, and create any index from there. With the introduction of the $sql operator you still use SQL, but there is no need to leave the MongoDB ecosystem: just use the $sql operator in Oracle Database 19c and bridge this gap pragmatically for now, until Oracle Database 23c is on your radar. Oracle Database 23c supports index creation through the MongoDB API, but prior to that - like in Autonomous Database - our $sql stage comes to the rescue.

db.aggregate([{ $sql: `
                create index i_movies_sku
                on movies(json_value(data, '$.sku.stringOnly()' ERROR ON ERROR))`}
])

You can see the newly created index right afterwards:

jason> db.movies.getIndexes()
[
  {
    name: 'I_MOVIES_SKU',
    indexNulls: false,
    unique: false,
    v: 2,
    key: { 'sku.stringOnly()': 1 },
    ns: 'jason.movies'
  },
  { v: 2, key: { _id: 1 }, ns: 'jason.movies', name: '_id_' }
]

See the index at work:

jason> db.movies.find({"sku":"NTV55017"}).explain()
{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'jason.movies',
    indexFilterSet: false,
    parsedQuery: { sku: { '$stringOnly': 'NTV55017' } },
    rewrittenQuery: { sku: { '$stringOnly': 'NTV55017' } },
    winningPlan: {
      stage: 'SELECT STATEMENT',
      inputStage: {
        stage: 'TABLE ACCESS',
        options: 'BY INDEX ROWID BATCHED',
        source: 'MOVIES',
        columns: '"MOVIES"."ID"[RAW,4000], "CREATED_ON"[TIMESTAMP,11], "LAST_MODIFIED"[TIMESTAMP,11], "VERSION"[VARCHAR2,255], "DATA" /*+ LOB_BY_VALUE */ [JSON,8200]',
        inputStage: {
          stage: 'INDEX',
          options: 'RANGE SCAN',
          source: 'I_MOVIES_SKU',
          columns: `"MOVIES".ROWID[ROWID,10], JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.sku.stringOnly()' RETURNING VARCHAR2(4000) ERROR ON ERROR)[VARCHAR2,4000]`,
          filterType: 'access',
          filter: `JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.sku.stringOnly()' RETURNING VARCHAR2(4000) ERROR ON ERROR)=:1`,
          path: "$.sku.stringOnly()'"
        }
      }
    },
    rejectPlans: []
  },
  serverInfo: { host: 'localhost', port: 27017, version: '4.2.14' },
  ok: 1
}

That's quite a list of cool things you can do now, isn't it?

Source: oracle.com
