Tuesday, May 7, 2024

Transportable Binary XML – modern XML document storage in Oracle Database 23ai

Traditional XML storage in Oracle Database


In 2010, Oracle Database introduced native XML storage capabilities to optimize the storage and usage of XML documents with Oracle Database 11g (11.2.0.2). Oracle's approach for native XMLType storage then was purely document-centric, called Compact Schema-Aware XML (CSX) or Non-transportable Binary XML. This binary format was built on a central token table set approach where every XML document is encoded in binary format, and the keys to encode and decode the binary data are stored in a single, central dictionary. Such a design is excellent for the storage and performance within a single database. There is no duplication of work for encoding and decoding tokens, and there is always just one reference. Since hashing of tokens is one of the most expensive tasks, this approach saves a lot of time. However, as a natural consequence, every operation involving the decoding of binary XML data needs to access the central dictionary.

What is Transportable Binary XML


Although CSX succinctly compresses XMLType data, its use of a central dictionary (token table) and schema registries prevent an easy adoption for distributed and remote architectures, such as sharding, cross containers, remote database links, or even robust import/export capabilities. Any scenario where the central dictionary is not directly accessible imposes a challenge. To overcome these challenges, Oracle Database 23ai introduced Transportable Binary XML (TBX) as a storage option, a variant built on top of CSX but without the dependency of a central dictionary.

TBX is far more flexible and scalable without this dependency, enabling binary stored XML documents to be transportable and usable in distributed environments. Transportable Binary XML is the recommended method for storing XML documents natively in the Oracle Database beginning with Oracle Database 23ai. The following will walk you through how to use Transportable Binary XML and how to migrate earlier legacy binary XML storage formats.

Working with Transportable Binary XML


Beginning with Oracle Database 23ai, Transportable Binary XML is the default XML storage type. You do not need to explicitly specify the storage type of your XML columns. However, if you want to specify TBX explicitly, the DDL looks as follows:

CREATE TABLE tx( ID NUMBER, XMLDOC XMLTYPE) 
XMLTYPE COLUMN XMLDOC STORE AS TRANSPORTABLE BINARY XML;

To validate that a table has an XMLType column stored as TBX, you can just describe the table:

DESCRIBE TX 
Name                         Null?    Type 
------------------          -------- ----------------------------------------- 
ID                                         NUMBER 
XMLDOC                             SYS.XMLTYPE STORAGE TRANSPORTABLE BINARY

Transportable Binary XML provides full data processing compatibility with non-transportable binary XML storage. All XML operators and SQL generally continue to work like before without any behavior changes. Transportable Binary XML only changes how your XML documents are stored in the database and how XML documents are handled internally.

The only change of Transportable Binary XML is lifting existing constraints, enabling binary XML documents in modern architectures like Oracle Sharding.

Transportable Binary XML and Oracle Sharding


Sharding is a database architecture technique used to scale databases by partitioning data across multiple independent servers or instances, each holding a portion of the total data (shards). The shards are all coordinated by one instance (coordinator), where all the information is gathered and presented as a single database.

Transportable Binary XML – modern XML document storage in Oracle Database 23ai

This architecture is based on a shared-nothing architecture, meaning that each shard knows nothing about the other shards. Each shard could also be used as an independent, single database. Transportable binary XML as self-encapsulated, self-decoding/encoding binary storage on table level allows using a native XMLType with Sharding: it enables sending XML documents over the wire to any other database without limits. Transportable Binary XML is the only supported XMLType storage format in Oracle Sharding environments; specifying other XMLType storage types will throw an error.

To create a sharded table with an XMLType column using Transportable Binary XML, the DDL looks as follows:

CREATE SHARDED TABLE SHARDEDTAB
   ( SHDKEY NUMBER NOT NULL, 
     CTRY_CODE VARCHAR2(3) NOT NULL, 
     XMLDOC XMLTYPE, 
     CONSTRAINT SHARD_PK PRIMARY KEY (SHDKEY, CTRY_CODE)
    ) 
PARTITION BY LIST (CTRY_CODE) 
    ( PARTITION P_REG_1 VALUES (‘RG1’) TABLESPACE TBS1, 
      PARTITION P_REG_2 VALUES (‘RG2’) TABLESPACE TBS2, 
      PARTITION P_REG_3 VALUES (‘RG3’) TABLESPACE TBS3 
     ) 
XMLTYPE COLUMN XMLDOC STORE AS SECUREFILE TRANSPORTABLE BINARY XML;

Your table will be sharded across all participating databases (shards), using the new Transportable Binary XML as storage format for your XML documents without the need of a central repository. Needless to say that TBX is also supported in system or composite sharding environments.

Transportable Binary XML and Search Indexes


Hand in hand with TBX, Oracle Database 23ai introduced new, simplified SQL syntax for XML Search Indexes specifically for TBX. Whenever you require full-text and/or range-search capabilities over large, unstructured XML documents and you adopted the new Transportable Binary XML storage format, Oracle recommends that you use XML Search Index to index your XML data.

To create an XML search index for both full-text and range-search, you can use syntax similar to the following DDL:

CREATE SEARCH INDEX myXMLDocumentsSearchIndex 
ON myXMLDocumentsTable(myXMLTypeColumn)
FOR XML PARAMETERS ('SEARCH_ON TEXT_VALUE (BINARY_DOUBLE, VARCHAR2, TIMESTAMP)');

This example enables both the full-text and range-search components for the specified data types BINARY_DOUBLE, VARCHAR2, and TIMESTAMP.

If you are unsure about your workload and business requirements it is recommended to build the index for search and range scans for common data types in your XML documents.

The use cases of a search index for full text search and range-search using relational operators (>, <, <=, >=, or =) in an XMLExists expression within a SQL where clause are slightly different in nature of internal processing, so you can optimize your XML search index structure for your specific requirements.

If you are aiming solely for text search, you can create your XML search index by specifying the FOR XML clause in the CREATE SEARCH INDEX statement. As mentioned before, you can create such an XML search index only on XML documents stored as TBX and you must be granted the CTXAPP database role.

The syntax to create a search index to solely index documents for text search is as simple as the following example.

CREATE SEARCH INDEX myXMLDocumentsSearchIndex 
ON myXMLDocumentsTable(myXMLTypeColumn) 
FOR XML;

OR, using the more verbose syntax. You would only do this for clarity reasons; this is the default, as seen in the previous example:

CREATE SEARCH INDEX myXMLDocumentsSearchIndex 
ON myXMLDocumentsTtable(myXMLTypeColumn) 
FOR XML PARAMETERS (‘SEARCH_ON TEXT’);

If you are aiming to build your index solely for range scans within your XML documents, you simply specify the list of datatypes that you intend to search on:

CREATE SEARCH INDEX myXMLDocumentsSearchIndex 
ON MyXMLDocumentsTable(myXMLTypeColumn) 
FOR XML PARAMETERS ('SEARCH_ON VALUE (BINARY_DOUBLE, VARCHAR2, TIMESTAMP)');

Conversion to Transportable Binary XML


While it’s easy to see how to adopt TBX for future applications and systems, many existing applications built using XML storage in the Oracle Database will be upgraded to Oracle Database 23ai. It is recommended to migrate all your tables using a legacy XML storage format to TBX.

There are different options to do this migration:

◉ You can do a simple CTAS (Create Table As Select), casting the legacy XML datatype to TBX. This is the most pragmatic approach. However, it introduces downtime for your application and requires you to manually rebuild all dependent structures, such as indexes.

◉ Online Redefinition is similar to the CTAS approach, but enables your migration to happen in an ONLINE mode.

◉ DataPump has specific import parameters to do the storage migration at data import time, namely TRANSFORM = XMLTYPE_STORAGE_CLAUSE: '" TRANSPORTABLE BINARY XML"'.

◉ GoldenGate, Oracle’s flagship product for real-time data integration and replication in heterogeneous IT environments. This method is only applicable if you are moving your whole database. 

Source: oracle.com

Related Posts

0 comments:

Post a Comment