Wednesday, July 29, 2020

Multimedia Database


A multimedia database is a collection of interrelated multimedia data that includes text, graphics (sketches, drawings), images, animations, video and audio, often comprising vast amounts of multisource data. The framework that manages these different types of multimedia data, which can be stored, delivered and utilized in different ways, is known as a multimedia database management system. Multimedia databases fall into three classes: static media, dynamic media and dimensional media.

Content of Multimedia Database management system :

1. Media data – The actual data representing an object.

2. Media format data – Information such as sampling rate, resolution, encoding scheme etc. about the format of the media data after it goes through the acquisition, processing and encoding phase.

3. Media keyword data – Keyword descriptions relating to the generation of the data. It is also known as content descriptive data. Example: date, time and place of recording.

4. Media feature data – Content dependent data such as the distribution of colors, kinds of texture and different shapes present in data.

Types of multimedia applications based on data management characteristic are :

1. Repository applications – A large amount of multimedia data, as well as metadata (media format data, media keyword data, media feature data), is stored for retrieval purposes, e.g., repositories of satellite images, engineering drawings or radiology scanned pictures.

2. Presentation applications – They involve delivery of multimedia data subject to temporal constraints. Optimal viewing or listening requires the DBMS to deliver data at a certain rate, offering a quality of service above a certain threshold. Here data is processed as it is delivered. Example: annotation of video and audio data, real-time editing analysis.

3. Collaborative work using multimedia information – It involves executing a complex task by merging drawings and exchanging change notifications. Example: an intelligent healthcare network.

There are still many challenges to multimedia databases, some of which are :

1. Modelling – Work in this area can draw on both database and information retrieval techniques, but multimedia documents constitute a specialized area and deserve special consideration.

2. Design – The conceptual, logical and physical design of multimedia databases has not yet been addressed fully, as performance and tuning issues at each level are far more complex: the data comes in a variety of formats such as JPEG, GIF, PNG and MPEG, which are not easy to convert from one form to another.

3. Storage – Storing a multimedia database on any standard disk presents problems of representation, compression, mapping to device hierarchies, archiving and buffering during input-output operations. In a DBMS, a "BLOB" (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved (a sketch of a table using such a column follows this list).

4. Performance – For an application involving video playback or audio-video synchronization, physical limitations dominate. The use of parallel processing may alleviate some problems, but such techniques are not yet fully developed. Apart from this, multimedia databases consume a lot of processing time as well as bandwidth.

5. Queries and retrieval – For multimedia data such as images, video and audio, accessing data through queries opens up many issues, such as efficient query formulation, query execution and optimization, which still need to be worked on.
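As an illustrative sketch of the BLOB approach mentioned above (the table and column names here are hypothetical, not taken from any particular product), a relational table holding multimedia content together with its keyword data could be declared as follows:

CREATE TABLE media_assets (
  asset_id     NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  title        VARCHAR2(200),
  mime_type    VARCHAR2(100),
  captured_at  DATE,   -- media keyword data: when the asset was recorded
  content      BLOB    -- the media data itself, stored as an untyped bitmap
);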

Areas where multimedia databases are applied include:

◉ Documents and record management: Industries and businesses that keep detailed records and a variety of documents. Example: insurance claim records.

◉ Knowledge dissemination: A multimedia database is a very effective tool for knowledge dissemination in terms of providing several resources. Example: electronic books.

◉ Education and training: Computer-aided learning materials can be designed using multimedia sources which are nowadays very popular sources of learning. Example: Digital libraries.

◉ Marketing, advertising, retailing, entertainment and travel. Example: a virtual tour of cities.

◉ Real-time control and monitoring: Coupled with active database technology, multimedia presentation of information can be a very effective means of monitoring and controlling complex tasks. Example: manufacturing operation control.

Monday, July 27, 2020

Oracle Database Clusters

Overview of Clusters


Clusters are an optional method of storing table data. A cluster is a group of tables that share the same data blocks because they share common columns and are often used together. For example, the employees and departments tables share the department_id column. When you cluster the employees and departments tables, Oracle physically stores all rows for each department from both the employees and departments tables in the same data blocks.

Figure 1 shows what happens when you cluster the employees and departments tables:

Figure 1 Clustered Table Data


Because clusters store related rows of different tables together in the same data blocks, properly used clusters offer these benefits:

- Disk I/O is reduced for joins of clustered tables.

- Access time improves for joins of clustered tables.

- In a cluster, a cluster key value is the value of the cluster key columns for a particular row. Each cluster key value is stored only once in the cluster and the cluster index, no matter how many rows of different tables contain the value. Therefore, less storage is required to store related table and index data in a cluster than is necessary in nonclustered table format. For example, in Figure 1, notice how each cluster key (each department_id) is stored just once for many rows that contain the same value in both the employees and departments tables. A sketch of how such a cluster is created follows this list.
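A minimal sketch of creating an index cluster for these two tables (the object names and the SIZE value are illustrative only and would need tuning for real data):

CREATE CLUSTER emp_dept_cluster (department_id NUMBER(4))
   SIZE 512;

-- The cluster index must exist before rows can be inserted into the clustered tables.
CREATE INDEX idx_emp_dept_cluster ON CLUSTER emp_dept_cluster;

CREATE TABLE departments_clu (
   department_id   NUMBER(4) PRIMARY KEY,
   department_name VARCHAR2(30))
   CLUSTER emp_dept_cluster (department_id);

CREATE TABLE employees_clu (
   employee_id   NUMBER(6) PRIMARY KEY,
   last_name     VARCHAR2(25),
   department_id NUMBER(4))
   CLUSTER emp_dept_cluster (department_id);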

Overview of Hash Clusters


Hash clusters group table data in a manner similar to regular index clusters (clusters keyed with an index rather than a hash function). However, a row is stored in a hash cluster based on the result of applying a hash function to the row's cluster key value. All rows with the same key value are stored together on disk.

Hash clusters are a better choice than using an indexed table or index cluster when a table is queried frequently with equality queries (for example, return all rows for department 10). For such queries, the specified cluster key value is hashed. The resulting hash key value points directly to the area on disk that stores the rows.

Hashing is an optional way of storing table data to improve the performance of data retrieval. To use hashing, create a hash cluster and load tables into the cluster. Oracle physically stores the rows of a table in a hash cluster and retrieves them according to the results of a hash function.
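A minimal sketch of creating a hash cluster and one table in it (the names, SIZE and HASHKEYS values are illustrative only):

CREATE CLUSTER dept_hash_cluster (department_id NUMBER(4))
   SIZE 8192 HASHKEYS 100;

CREATE TABLE departments_hashed (
   department_id   NUMBER(4) PRIMARY KEY,
   department_name VARCHAR2(30))
   CLUSTER dept_hash_cluster (department_id);

-- An equality query on the cluster key is resolved by hashing rather than an index lookup:
SELECT * FROM departments_hashed WHERE department_id = 10;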

Sorted hash clusters allow faster retrieval of data for applications where data is consumed in the order in which it was inserted.

Oracle uses a hash function to generate a distribution of numeric values, called hash values, which are based on specific cluster key values. The key of a hash cluster, like the key of an index cluster, can be a single column or composite key (multiple column key). To find or store a row in a hash cluster, Oracle applies the hash function to the row's cluster key value. The resulting hash value corresponds to a data block in the cluster, which Oracle then reads or writes on behalf of the issued statement.

A hash cluster is an alternative to a nonclustered table with an index or an index cluster. With an indexed table or index cluster, Oracle locates the rows in a table using key values that Oracle stores in a separate index. To find or store a row in an indexed table or cluster, at least two I/Os must be performed:

- One or more I/Os to find or store the key value in the index

- Another I/O to read or write the row in the table or cluster

Thursday, July 23, 2020

How to insert an image into an Oracle database using a Java program?


To hold an image in an Oracle database, the BLOB type is generally used. Therefore, make sure that you have a table created with a BLOB column as:

Name Null? Type
----------------------------------------- -------- ----------------------------
NAME VARCHAR2(255)
IMAGE BLOB
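If the table does not already exist, it can be created with a statement along these lines (a sketch matching the describe output above):

CREATE TABLE MyTable (
   name  VARCHAR2(255),
   image BLOB
);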
To insert an image into the Oracle database, follow the steps given below:

Step 1: Connect to the database


You can connect to a database using the getConnection() method of the DriverManager class.

Connect to the Oracle database by passing the Oracle URL which is jdbc:oracle:thin:@localhost:1521/xe (for express edition), username and password as parameters to the getConnection() method.

String oracleUrl = "jdbc:oracle:thin:@localhost:1521/xe";
Connection con = DriverManager.getConnection(oracleUrl, "user_name", "password");

Step 2: Create a Prepared statement


Create a PreparedStatement object using the prepareStatement() method of the Connection interface. To this method pass the insert query (with placeholders) as a parameter.

PreparedStatement pstmt = con.prepareStatement("INSERT INTO MyTable VALUES(?, ?)");

Step 3: Set values to the placeholders


Set the values to the placeholders using the setter methods of the PreparedStatement interface. Choose the methods according to the datatype of the column. For example, if the column is of VARCHAR type, use the setString() method, and if it is of INT type, you can use the setInt() method.

And if it is of Blob type you can set value to it using the setBinaryStream() or setBlob() methods. To these methods pass an integer variable representing the parameter index and an object of InputStream class as parameters.

pstmt.setString(1, "sample image");
//Inserting Blob type
InputStream in = new FileInputStream("E:\\images\\cat.jpg");
pstmt.setBlob(2, in);

Step 4: Execute the statement


Execute the above created PreparedStatement object using the execute() method of the PreparedStatement interface.

Example

import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
public class InsertImageToOracleDB {
   public static void main(String args[]) throws Exception{
      //Registering the Driver
      DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver ());
      //Getting the connection
      String oracleUrl = "jdbc:oracle:thin:@localhost:1521/xe";
      Connection con = DriverManager.getConnection(oracleUrl, "system", "password");
      System.out.println("Connected to Oracle database.....");
      PreparedStatement pstmt = con.prepareStatement("INSERT INTO MyTable VALUES(?,?)");
      pstmt.setString(1, "sample image");
      //Inserting Blob type
      InputStream in = new FileInputStream("E:\\images\\cat.jpg");
      pstmt.setBlob(2, in);
      //Executing the statement
      pstmt.execute();
      System.out.println("Record inserted");
   }
}

Output


Connected to Oracle database.....
Record inserted

Wednesday, July 22, 2020

Oracle Database client libraries for Java now on Maven Central


Oracle has published its Oracle Database JDBC client libraries on Maven Central. From now on you can find Oracle Database related jar files under the com.oracle.database group id. You will find all libraries from version 11.2.0.4 (e.g. ojdbc6) to 19.3.0 (e.g. ojdbc10).

Going forward, Oracle will use Maven Central as one of the primary distribution mechanisms for Oracle Database Java client libraries, meaning that you will also be able to find new versions of these libraries on Maven Central in the future.

To get the latest Oracle Database JDBC driver, use the following dependency GAV in your Maven POM file:

<dependency>
  <groupId>com.oracle.database.jdbc</groupId>
  <artifactId>ojdbc10</artifactId>
  <version>19.3.0.0</version>
</dependency>

The com.oracle.database group id has the following subgroups:

◉ oracle.database.jdbc: this group contains all JDBC libraries

     - ojdbc[N].jar:

The Oracle Database JDBC driver compiled with Java [N]

     - ucp.jar:

The Universal Connection Pool for JDBC

    - ojdbc[N]dms.jar:

The Oracle Database JDBC driver compiled with Java [N] including the Dynamic Monitoring System (DMS)

Note: ojdbc8dms.jar and ojdbc10dms.jar contain the instrumentation to support the Dynamic Monitoring System (DMS) and limited support for java.util.logging.

◉ oracle.database.jdbc.debug: this group contains all JDBC debug libraries

◉ oracle.database.security: this group contains all Security libraries for Oracle wallet and more

     - oraclepki.jar:

The Oracle PKI provider used for Oracle wallets

     - osdt_cert.jar:

Certificate management components used for Oracle wallets

     - osdt_core.jar:

Core components between oraclepki.jar and osdt_cert.jar

◉ oracle.database.ha: this group contains all High Availability libraries

     - ons.jar:

Oracle Notification System library

     - simplefan.jar:

Simple Fast Application Notification library

◉ oracle.database.nls: this group contains the Internationalization library

     - orai18n.jar:

The Oracle internationalization library: the name orai18n.jar expands to "orainternationalization.jar" ("i18n" stands for "internationalization" – an i, 18 letters, then an n)

◉ oracle.database.xml: this group contains all XML and XML DB related libraries

     - xdb.jar, xdb6.jar

Support for the JDBC 4.x standard java.sql.SQLXML interface

     - xmlparserv2.jar

The Oracle Database XML Parser library, including APIs for:

        - DOM and Simple API for XML (SAX) parsers
        - XML Schema processor
        - Extensible Stylesheet Language Transformation (XSLT) processor
        - XML compression
        - Java API for XML Processing (JAXP)
        - Utility functionality such as XMLSAXSerializer and asynchronous DOM Builder

Note: xdb6.jar is a legacy name, xdb.jar is the new name.

◉ oracle.database.observability: this group contains the Observability library

     - dms.jar:

The Oracle Database Dynamic Monitoring System (DMS)

◉ oracle.database.soda (coming soon!): this group contains the Simple Oracle Document Access driver for Oracle Database

◉ oracle.database.messaging (coming soon!): this group contains the Advanced Queuing Java Messaging Service driver for Oracle Database

With this change, Oracle made it easier than ever before for developers and users alike to consume the Oracle Database Java client libraries.

Monday, July 20, 2020

Persistent Memory Primer

The introduction of Persistent Memory (PMEM) marks the beginning of a revolution in the computing industry. There has always been a separation between system memory where the contents are ephemeral and byte addressable, and storage where the data is persistent and block oriented. Persistent Memory (such as Intel Optane DC Persistent Memory) blurs the line between storage and memory by being both byte addressable as well as persistent.

This new class of Non-Volatile Memory is fast enough to operate alongside conventional (volatile) DRAM in a DIMM (Dual In-Line Memory Module) form factor. Integrating into systems in DIMM slots means that Persistent Memory is able to play a vastly different role than conventional block-oriented storage such as Hard Disk Drives (HDD) or Solid State Disk (SSD).

In this first of a multi-part series on Persistent Memory, we will cover the fundamentals of Persistent Memory and how this technology works under the covers.

3D XPoint Non-Volatile Memory


3D XPoint is the underlying silicon of Intel Optane products and is a new form of non-volatile memory that is different from the Flash-based solutions that have become prevalent in storage products.  NAND Flash or simply Flash is named after the NAND logic gate and is primarily used in memory cards, in USB Flash drives, and in Solid State Drives (SSD) for general purpose storage. 3D XPoint is also available as memory DIMMs as well as in an SSD format branded as Intel Optane SSD.


3D XPoint is faster than NAND Flash, but slower than DRAM. 3D XPoint is also more expensive (by capacity) than Flash but less expensive than DRAM. These 2 factors of cost and speed place 3D XPoint between Flash and DRAM. There is also a set of APIs and programming standards for Persistent Memory that will support other Non-Volatile (NV) Memory technologies beyond 3D XPoint. Persistent Memory documentation is published at http://pmem.io.

This blog post is focused on Intel Optane DC Persistent Memory, but as previously mentioned, Intel Optane is also available in an SSD form. Optane SSD is faster than NAND Flash SSD and can be used in the conventional manner as storage.

Speed!


Put plainly, Persistent Memory is fast! It's dramatically faster than spinning disk or Flash SSD, but still a bit slower than volatile memory (DRAM). To put this in perspective, consider the following time measures:

◉ Millisecond = 1/1,000 second
◉ Microsecond = 1/1,000,000 second
◉ Nanosecond = 1/1,000,000,000 second

Data access times (often referred to as latency) for current generation DRAM are in the range of 80-100 nanoseconds, while spinning disk, when configured properly, should perform in the range of single-digit (1-9) milliseconds. Persistent Memory access time for 64 bytes is around 300 nanoseconds. While this is about 3 times slower than DRAM, it's more than 3,000 times faster than spinning disk.


Performance of Flash SSD falls between these two extremes, and varies based on the configuration and workload. For example, local Flash SSD attached directly to a server will deliver lower latency (faster access) than a server accessing Flash SSD contained in a storage array, although it isn't sharable by multiple servers. Performance also depends on the amount of system load (such as number of users) as well as the type of application (such as OLTP vs. Data Analytics), so performance results will vary.

In a Flash SSD shared storage configuration delivering 200 microsecond I/O latency (0.2 milliseconds), data access is at least 5X faster than spinning disk (assuming 1 millisecond disk access time). However, Persistent Memory data access (64 bytes) at 300 nanoseconds, is more than 600 times faster than Flash SSD (such as an all-flash array).

PMEM vs. Disk Performance


Using traditional disk (HDD) as a baseline, we can clearly see the dramatic performance increases that have come with new technologies, as well as the difference between shared and direct attached storage. The following table shows typical I/O response (or latency) with different technologies. The specific performance on any given system will vary, but this table illustrates the general performance characteristics.


Notice that memory performance is measured in nanoseconds (one billionth of a second), while hard disk drive performance has typically been measured in milliseconds (one thousandth of a second); at roughly 100 nanoseconds versus 1 millisecond, that is a 10,000X difference in performance. Notice also that memory is accessed at the byte level, whereas storage (HDD and SSD) is typically accessed in terms of blocks (8K in this example).

Persistent Memory might also be used with a block-oriented application, but it's quite easy to see this doesn't take full advantage of the performance benefits. It's only when the application is specifically designed to work at the byte level that we see the full performance advantages of PMEM. The main challenge in developing applications (such as databases!) to work with PMEM is the granularity of atomicity, which refers to how PMEM persists data, as we will see later. Of course the order in which writes occur is also critical to ensuring data integrity. Before we delve into the finer details, it is important to understand the multiple configuration options that are available.

PMEM Configuration Options


Persistent Memory can be used in multiple configurations, each of which suits a particular need.  The two major configuration options for configuring PMEM are known as "AppDirect Mode" and "Memory Mode", and there are multiple options within AppDirect.


Memory Mode


Persistent Memory can be used in what is known as Memory Mode, where the system uses a combination of PMEM and DRAM and the O/S automatically migrates "hot" blocks to the faster DRAM and "colder" blocks to larger but slower PMEM. The PMEM is essentially used for its larger capacity rather than for purposes of data persistence. Memory Mode operates transparently to the application, and is supported for running Oracle Databases. The majority of this blog is focused on AppDirect, which enables applications to use the Persistent Memory directly rather than simply using it as a larger system memory.

AppDirect Mode


PMEM configured in AppDirect mode begins with a Namespace configuration, which is similar to a logical volume manager for Persistent Memory.  Namespaces are configured on Linux systems using the "ndctl" utility, with 4 options of fsdax, sector, devdax, and raw. File systems can then be mounted on fsdax or sector namespaces. There are 2 layers to understand, which are the Namespace or device configuration, as well as the filesystem layer that is mounted on top of the device.

Namespace Configuration


The ndctl utility is used to configure non-volatile memory devices on the Linux Platform. The command "ndctl list -N" will display a listing of configured namespaces. The following graphic shows fsdax and sector namespaces and their configuration details.


Direct Access (or DAX) refers to the ability to directly access Persistent Memory contents without copying data into buffers in DRAM. The fsdax configuration uses an 8-byte atomicity, whereas sector can use 512 byte or 4,096 byte atomicity. Once the device level is configured, a file system can be mounted on the device.

File System Configuration


On the Linux platform, XFS and ext4 file systems can be mounted for DAX operation on fsdax or sector Namespaces using the -o dax option of the mount command.  There are 3 configuration options for filesystems on top of Namespaces as follows:

◉ fsdax namespace with DAX filesystem (8-byte atomicity)
◉ fsdax namespace with non-DAX filesystem (8-byte atomicity)
◉ sector namespace with non-DAX filesystem (512-byte or 4,096-byte sector atomicity)

The Direct Access (DAX) operation is important for performance, and the application must tolerate the level of atomicity provided by the Persistent Memory device.

8-Byte Atomicity


Persistent Memory (such as Intel Optane DC Persistent Memory) natively operates byte-by-byte rather than in blocks of data like conventional storage. Data is persisted in chunks of 8-bytes at most in Persistent Memory (again, using the default behavior). For applications (like databases) that are based on a BLOCK construct, the 8-byte atomicity of Persistent Memory can be an issue. Writing 8,192 bytes of data (an 8K block) will get persisted in 1,024 chunks of 8-bytes each. Power failure or other abnormal condition can leave blocks of data "fractured" or "torn" into pieces, with portions of the block containing old data and other portions with new data. Applications (such as databases) need to be changed to tolerate this sort of fracturing or tearing of blocks.  Otherwise, these are effectively corrupted blocks of data.

Sector Atomicity


Persistent Memory can be configured for sector level atomicity by defining Namespaces with the "sector" option, which is also known as BTT (Block Translation Table) mode. Tearing of blocks can still occur if the application (i.e. database) block size does not match the block size defined in the Namespace or if there is a misalignment between database block and Persistent Memory sector boundaries. While conventional Disk and Flash SSD devices might have a different block size than the application, there is often a battery-backed controller cache that eliminates the block fracturing problem.

It's interesting to note that applications often use the word "block", conventional disk devices use what are known as "sectors" (sectors of spinning disk), while in an O/S context we often use the word "page" to refer to a unit of storage or memory. These 3 terms essentially refer to the same concept, and they converge when we are talking about Persistent Memory. Blocks of data in an application (or database) map to O/S pages, which map to sectors on storage.

In cases with a mismatch of application block size and Namespace sector size, we would still expect some degree of block fracturing (or tearing in the PMEM terminology) because of the lack of controller cache. Ideally, the application (such as a database) should be made tolerant of 8-byte atomicity because it would be very simple to mis-configure the application block size and Namespace sector size or have a misalignment of blocks to sectors that causes tearing or corruption. Data integrity is paramount for a company such as Oracle, so please continue reading this article for more information.

Application Layer: Block-Based Conventional Storage I/O


Persistent storage such as Disk and Flash SSD has always been block-oriented, and applications (such as databases) have been developed to work in this manner. Applications have traditionally been developed to read and write blocks of data in storage. Applications (or system software) handle buffering of data into memory where it is used and manipulated, then saved (or persisted) back to the storage as blocks once the application is done using it as shown in the diagram below.


Applications that are block-oriented (such as databases) will typically use C/C++ read() and write() system calls to access data in block-oriented storage. Block oriented applications expect to read complete blocks from storage and they are designed to write entire blocks to storage. The storage normally guarantees atomicity (success or failure) of writing the entire block, and this is normally backed by a non-volatile cache in the disk controller of modern systems.

Application Layer: Byte-Oriented Applications


Memory is fundamentally byte oriented, and Persistent Memory (such as Intel Optane DC Persistent Memory) is designed to be used by applications in a byte-oriented manner as well.  Applications can access data directly (in-place) with Persistent Memory just like data that has been placed by the application into DRAM.  The C/C++ function mmap() on Linux maps files in Persistent Memory into the application's address space. If the filesystem is mounted using the DAX option (mount -o dax), system buffers are eliminated and the application operates directly on the data residing in Persistent Memory as shown in the following diagram:


Mapping file contents into the application address space and accessing the data directly on Persistent Memory eliminates the copying of data using C/C++ functions such as memcpy(), which eliminates code-path in the processing of data and therefore improves performance. For example, databases will typically copy data into a shared memory segment (like the Oracle SGA) for access by multiple users on a system. Eliminating copying of data obviously has implications for speed of data access, but also for the handling of atomicity, as we shall see later.

DAX: Direct Access to PMEM


The highest level of performance with Persistent Memory comes when an application is designed to work directly against the data without copying it as shown in the diagram below. Accessing 64 bytes of data takes approximately 300 nanoseconds, which is dramatically faster than reading from Solid State Disk (SSD).


Block Oriented Processing


Applications that are designed for block oriented data access will copy data into memory as shown in the diagram below. Users of Persistent Memory should typically see about 6,000 nanosecond access time for 8K of data. While this is much faster than even locally attached Flash SSD, it's not nearly as fast as DRAM or byte-oriented data access on PMEM.


In this diagram, the entire block is being read from PMEM into DRAM, which is functionally the same as Flash SSD or spinning disk from a READ perspective.  However, writing of data brings some big data integrity implications for a block-oriented application.  Of course databases have traditionally been block-oriented, and this certainly applies to the Oracle database.

Implications for Block Oriented Applications


Persistent Memory is fundamentally different from HDD and SSD because of the way data gets written (or persisted).  An application might write a block of data, but this write() operation gets executed in pieces.  Any failure during the write (power failure, system kernel panic, abnormal shutdown, etc.) will result in fracturing or tearing (i.e. corruption) of blocks shown in the following diagram:


A block of data that's partially written will typically mean data corruption for an application that is block-oriented. The application (such as a database) expects the storage to either succeed or fail the ENTIRE write operation rather than leaving it partially completed. This behavior of the storage is called "atomicity" and the granularity of atomicity is critical. Any mismatch in the size of write vs. granularity of atomicity will result in corruption. Conventional disk storage has typically gotten around this issue using a disk cache, which ensures block-level atomicity.

Alert: File Systems and Devices on PMEM


All current versions of the Oracle Database as of this writing are susceptible to corruption in these configurations as outlined in this My Oracle Support Note: File Systems and Devices on Persistent Memory (PMEM) in Database Servers May Cause Database Corruption (Doc ID 2608116.1) viewable here: https://support.oracle.com/epmos/faces/DocumentDisplay?id=2608116.1  The issues underlying this document do not apply to Persistent Memory operating in what is known as Memory Mode or in the SSD form-factor.


An upcoming feature of Oracle Database will take full advantage of the performance benefits of PMEM, as well as addressing the data corruption issues discussed above.  In addition, an upcoming backport of changes into Oracle Database 19c will address the data corruption issues that are the subject of this MOS note. There are no known file systems, device drivers, or other solutions to this data corruption problem.

The Software Engineering Challenge


As we have seen above, the potential performance gains of Persistent Memory are absolutely stunning, with more than 3,000X faster data access, but taking advantage of those performance gains poses some unique software engineering challenges. The primary "reason for being" of databases is data integrity, so solving these challenges is absolutely fundamental for the Oracle Development organization. While it's possible to retrofit Persistent Memory into an existing environment, this approach also includes data integrity issues as we have seen above. At Oracle, we believe the best approach is to design the Oracle Database and Exadata software to fully integrate with Persistent Memory to gain the maximum performance benefits AND to achieve our primary goal of protecting customers' data.

Persistent Memory in Exadata X8M


Exadata X8M uses Persistent Memory in the storage tier to accelerate database block I/O, as well as to accelerate commit-related processing in the database transaction log. Persistent Memory is used in AppDirect mode in Exadata, and the Exadata software is designed to address the data integrity issues mentioned above by incorporating software logic that is compatible with the 8-byte atomicity of Persistent Memory. Exadata storage also keeps 2 or 3 redundant copies of all data (depending on redundancy configuration) across storage servers for fault tolerance.

Source: oracle.com

Saturday, July 18, 2020

File Handling From PL/SQL


Using a Java stored procedure it is possible to manipulate operating system files from PL/SQL.

◉ Create the Java Stored Procedure


First we need to create the Java class to perform all file manipulation using the Java File Class.

CREATE OR REPLACE AND COMPILE JAVA SOURCE NAMED "FileHandler" AS
import java.lang.*;
import java.util.*;
import java.io.*;
import java.sql.Timestamp;

public class FileHandler
{
  private static int SUCCESS = 1;
  private static  int FAILURE = 0;
 
  public static int canRead (String path) {
    File myFile = new File (path);
    if (myFile.canRead()) return SUCCESS; else return FAILURE;
  }

  public static int canWrite (String path) {
    File myFile = new File (path);
    if (myFile.canWrite()) return SUCCESS; else return FAILURE;
  }

  public static int createNewFile (String path) throws IOException {
    File myFile = new File (path);
    if (myFile.createNewFile()) return SUCCESS; else return FAILURE;
  }

  public static int delete (String path) {
    File myFile = new File (path);
    if (myFile.delete()) return SUCCESS; else return FAILURE;
  }

  public static int exists (String path) {
    File myFile = new File (path);
    if (myFile.exists()) return SUCCESS; else return FAILURE;
  }

  public static int isDirectory (String path) {
    File myFile = new File (path);
    if (myFile.isDirectory()) return SUCCESS; else return FAILURE;
  }

  public static int isFile (String path) {
    File myFile = new File (path);
    if (myFile.isFile()) return SUCCESS; else return FAILURE;
  }

  public static int isHidden (String path) {
    File myFile = new File (path);
    if (myFile.isHidden()) return SUCCESS; else return FAILURE;
  }

  public static Timestamp lastModified (String path) {
    File myFile = new File (path);
    return new Timestamp(myFile.lastModified());
  }

  public static long length (String path) {
    File myFile = new File (path);
    return myFile.length();
  }
 
  public static String list (String path) {
    String list = "";
    File myFile = new File (path);
    String[] arrayList = myFile.list();

    // list() returns null if the path does not denote a directory.
    if (arrayList == null)
      return list;

    Arrays.sort(arrayList, String.CASE_INSENSITIVE_ORDER);
   
    for (int i=0; i < arrayList.length; i++) {
      // Prevent directory listing expanding if we will blow VARCHAR2 limit.
      if ((list.length() + arrayList[i].length() + 1) > 32767)
        break;
       
      if (!list.equals(""))
        list += "," + arrayList[i];
      else
        list += arrayList[i];
    }
    return list;
  }

  public static int mkdir (String path) {
    File myFile = new File (path);
    if (myFile.mkdir()) return SUCCESS; else return FAILURE;
  }

  public static int mkdirs (String path) {
    File myFile = new File (path);
    if (myFile.mkdirs()) return SUCCESS; else return FAILURE;
  }

  public static int renameTo (String fromPath, String toPath) {
    File myFromFile = new File (fromPath);
    File myToFile   = new File (toPath);
    if (myFromFile.renameTo(myToFile)) return SUCCESS; else return FAILURE;
  }

  public static int setReadOnly (String path) {
    File myFile = new File (path);
    if (myFile.setReadOnly()) return SUCCESS; else return FAILURE;
  }

  public static int copy (String fromPath, String toPath) {
    try {
      File myFromFile = new File (fromPath);
      File myToFile   = new File (toPath);
 
      InputStream  in  = new FileInputStream(myFromFile);
      OutputStream out = new FileOutputStream(myToFile);
     
      byte[] buf = new byte[1024];
      int len;
      while ((len = in.read(buf)) > 0) {
        out.write(buf, 0, len);
      }
      in.close();
      out.close();
      return SUCCESS;
    }
    catch (Exception ex) {
      return FAILURE;
    }
  }
};
/
show errors java source "FileHandler"

◉ Publish the Java Call Specification


Next we publish the call specification using a PL/SQL "wrapper" package. Notice no package body is required since it only contains references to Java stored procedures.

CREATE OR REPLACE PACKAGE file_api AS

FUNCTION canRead (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.canRead (java.lang.String) return java.lang.int';

FUNCTION canWrite (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.canWrite (java.lang.String) return java.lang.int';

FUNCTION createNewFile (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.createNewFile (java.lang.String) return java.lang.int';

FUNCTION delete (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.delete (java.lang.String) return java.lang.int';

FUNCTION exists (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.exists (java.lang.String) return java.lang.int';

FUNCTION isDirectory (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.isDirectory (java.lang.String) return java.lang.int';

FUNCTION isFile (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.isFile (java.lang.String) return java.lang.int';

FUNCTION isHidden (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.isHidden (java.lang.String) return java.lang.int';

FUNCTION lastModified (p_path  IN  VARCHAR2) RETURN DATE
AS LANGUAGE JAVA
NAME 'FileHandler.lastModified (java.lang.String) return java.sql.Timestamp';

FUNCTION length (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.length (java.lang.String) return java.lang.long';

FUNCTION list (p_path  IN  VARCHAR2) RETURN VARCHAR2
AS LANGUAGE JAVA
NAME 'FileHandler.list (java.lang.String) return java.lang.String';

FUNCTION mkdir (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.mkdir (java.lang.String) return java.lang.int';

FUNCTION mkdirs (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.mkdirs (java.lang.String) return java.lang.int';

FUNCTION renameTo (p_from_path  IN  VARCHAR2,
                   p_to_path    IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.renameTo (java.lang.String, java.lang.String) return java.lang.int';

FUNCTION setReadOnly (p_path  IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.setReadOnly (java.lang.String) return java.lang.int';

FUNCTION copy (p_from_path  IN  VARCHAR2,
               p_to_path    IN  VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA
NAME 'FileHandler.copy (java.lang.String, java.lang.String) return java.lang.int';

END file_api;
/
SHOW ERRORS

◉ Grant Privileges to Give JServer Access to the Filesystem


In this example we are granting access to all directories on the server. That is really dangerous. You need to be more specific about these grants and/or be very careful about who you grant access to this functionality.

The relevant permissions must be granted from SYS for JServer to access the file system.

EXEC DBMS_JAVA.grant_permission('SCHEMA-NAME', 'java.io.FilePermission', '<<ALL FILES>>', 'read, write, execute, delete');
EXEC DBMS_JAVA.grant_permission('SCHEMA-NAME', 'SYS:java.lang.RuntimePermission', 'writeFileDescriptor', '');
EXEC DBMS_JAVA.grant_permission('SCHEMA-NAME', 'SYS:java.lang.RuntimePermission', 'readFileDescriptor', '');
GRANT JAVAUSERPRIV TO SCHEMA-NAME;

The effects of the grant will not be noticed until the grantee reconnects. It is up to the individual to decide on the level of access that is required.

◉ Test It


Finally we call the FILE_API packaged functions from PL/SQL. An example of every syntax can be seen below.

SET SERVEROUTPUT ON
BEGIN
  DBMS_OUTPUT.PUT_LINE('canRead      : ' ||  FILE_API.canRead ('C:\temp\test1.txt'));
  DBMS_OUTPUT.PUT_LINE('canWrite     : ' ||  FILE_API.canWrite ('C:\temp\test1.txt'));
  DBMS_OUTPUT.PUT_LINE('createNewFile: ' ||  FILE_API.createNewFile ('C:\temp\test1.txt'));
  DBMS_OUTPUT.PUT_LINE('delete       : ' ||  FILE_API.delete ('C:\temp\test2.txt'));
  DBMS_OUTPUT.PUT_LINE('exists       : ' ||  FILE_API.exists ('C:\temp\test2.txt'));
  DBMS_OUTPUT.PUT_LINE('isDirectory  : ' ||  FILE_API.isDirectory ('C:\temp\test1.txt'));
  DBMS_OUTPUT.PUT_LINE('isFile       : ' ||  FILE_API.isFile ('C:\temp\test1.txt'));
  DBMS_OUTPUT.PUT_LINE('isHidden     : ' ||  FILE_API.isHidden ('C:\temp\test1.txt'));
  DBMS_OUTPUT.PUT_LINE('lastModified : ' ||  TO_CHAR(FILE_API.lastModified ('C:\temp\test1.txt'), 'DD-MON-YYYY HH24:MI:SS'));
  DBMS_OUTPUT.PUT_LINE('length       : ' ||  FILE_API.length ('C:\temp\test1.txt'));
  DBMS_OUTPUT.PUT_LINE('mkdir        : ' ||  FILE_API.mkdir ('C:\temp\dir1'));
  DBMS_OUTPUT.PUT_LINE('mkdirs       : ' ||  FILE_API.mkdirs ('C:\temp\dir2\dir3'));
  DBMS_OUTPUT.PUT_LINE('renameTo     : ' ||  FILE_API.renameTo ('C:\temp\test1.txt','C:\temp\test2.txt'));
  DBMS_OUTPUT.PUT_LINE('setReadOnly  : ' ||  FILE_API.setReadOnly ('C:\temp\test1.txt'));
  DBMS_OUTPUT.PUT_LINE('copy         : ' ||  FILE_API.copy ('C:\temp\test2.txt','C:\temp\test1.txt'));
END;
/

◉ List Files in a Directory


We can use the LIST function in the FILE_API package to list files and sub-directories in a directory. Notice the files are presented as a comma-separated list.

SET SERVEROUTPUT ON
BEGIN
  DBMS_OUTPUT.PUT_LINE('Output : ' ||  File_API.list ('/u01/app/oracle'));
END;
/
Output : admin,audit,cfgtoollogs,checkpoints,diag,product


PL/SQL procedure successfully completed.

SQL>

We can split the list into an array. There are a number of ways to do that, but this method uses the APEX_STRING package.

DECLARE
  l_array APEX_APPLICATION_GLOBAL.vc_arr2;
  l_string varchar2(32767);
BEGIN
  l_array:= APEX_STRING.string_to_table(File_API.list ('/u01/app/oracle'), ',');

  FOR i in 1..l_array.count LOOP
    DBMS_OUTPUT.put_line('Array(' || i || ') : ' || l_array(i));
  END LOOP;
END;
/
Array(1) : admin
Array(2) : audit
Array(3) : cfgtoollogs
Array(4) : checkpoints
Array(5) : diag
Array(6) : product


PL/SQL procedure successfully completed.

SQL>

Friday, July 17, 2020

PDB Point-in-Time Recovery and Flashback in Oracle 20c


The most significant point about the Oracle 20c database architecture is that the non-CDB architecture is desupported: upgrades of a non-CDB Oracle Database to a non-CDB architecture are no longer possible. Meaning you need a container database in 20c and your data will reside within a pluggable database.

But then how about if you need to restore one PDB to any time in the recent past?

In Oracle Database 20c, flashback and PITR (point-in-time recovery) are supported when recovering PDBs to ancestor or orphan PDB incarnations. These operations were not possible in 19c and below. Just as a reminder, in Oracle 12.1 flashback database operations were possible only at root container level and thus affected all PDBs under the root container. Oracle 12.2 started supporting flashback of a PDB.

There is one restriction though in 20c: you cannot perform PDB flashback or PITR operation to a PDB incarnation within an orphan database incarnation. In other words, you can flashback as long as the CDB incarnation does not change. Or restated: you can flashback a PDB to an orphan PDB incarnation that is either within the same CDB incarnation or in an ancestor CDB incarnation. Also, flashback of a PDB to an orphan incarnation is supported only when the database uses local undo.

Now, this might be confusing. Let me first shortly explain what an ancestor incarnation and an orphan incarnation are.

Database incarnations have the following relationships to each other:

– The current incarnation is the one in which the database is currently operating
– The incarnation from which the current incarnation originated after an OPEN RESETLOGS operation is the parent incarnation of the current incarnation
– The parent of the parent incarnation is an ancestor incarnation and any parent of an ancestor incarnation is also an ancestor of the current incarnation
– A noncurrent incarnation that is not a direct ancestor of the current incarnation is called an orphan incarnation; the sketch below shows how incarnations can be listed
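A hedged query sketch for inspecting incarnations (the exact column list of V$PDB_INCARNATION may vary slightly between releases):

-- CDB-level incarnation history
SELECT incarnation#, resetlogs_change#, prior_incarnation#, status
FROM   v$database_incarnation
ORDER BY incarnation#;

-- PDB-level incarnation history
SELECT con_id, pdb_incarnation#, incarnation_scn, status
FROM   v$pdb_incarnation
ORDER BY con_id, pdb_incarnation#;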

During the flashback of the PDB, Oracle modifies only the data files for that PDB. The data in the other PDBs is not impacted. The point in time for the flashback can be one of the following:

– System Change Number
– Specific time in the past
– CDB restore point
– PDB restore point
– PDB clean restore point
– PDB guaranteed restore point

Here is an example of how flashback to any time in the recent past works in Oracle 20c.

We have lost at 2:30pm a table called RDBMS_BRANDS and a materialized zone map RDBMS_ZMAP from a pluggable database called NOVOPDB2. We have a restore point called rp1_novo_pdb2 created before the “disaster” at 8am in the morning. So, let us first flashback and verify we get the 2 objects back:

SQL> select systimestamp from dual;

SYSTIMESTAMP
-------------------------------------------------------------
05-JUL-20 02.35.30.569344 PM +00:00

SQL> SELECT table_name FROM DBA_TABLES where TABLE_NAME like '%RDBMS%';

no rows selected

SQL> ALTER PLUGGABLE DATABASE novopdb2 CLOSE;

Pluggable database altered.

SQL> FLASHBACK PLUGGABLE DATABASE novopdb2 TO RESTORE POINT rp1_novo_pdb2;

Flashback complete.

SQL> ALTER PLUGGABLE DATABASE novopdb2 OPEN RESETLOGS;

Pluggable database altered.

SQL> SELECT table_name FROM DBA_TABLES where TABLE_NAME like '%RDBMS%';

TABLE_NAME
----------------------------------------------------------------------
RDBMS_BRANDS
RDBMS_ZMAP
 
Well, unfortunately, we now notice that slightly before noon data was loaded into a new table called RDBMS_HISTORY, which did not exist in the PDB at 8am. All SCNs between 8am and the current time are now on an orphan PDB incarnation. We will flashback again using another restore point, created at 12 o’clock.

SQL> ALTER PLUGGABLE DATABASE novopdb2 CLOSE;

Pluggable database altered.

SQL> FLASHBACK PLUGGABLE DATABASE novopdb2 TO RESTORE POINT rp2_novo_pdb2;

Flashback complete.

SQL> ALTER PLUGGABLE DATABASE novopdb2 OPEN RESETLOGS;

Pluggable database altered.

SQL> SELECT table_name FROM DBA_TABLES where TABLE_NAME like '%RDBMS%';

TABLE_NAME
--------------------------------------------------------------------------------
RDBMS_BRANDS
RDBMS_ZMAP
RDBMS_HISTORY

SQL> select systimestamp from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
05-JUL-20 02.48.30.569344 PM +00:00

So, we managed to flashback to a point a few hours after we opened with RESETLOGS from the previous flashback, but within the same incarnation of the CDB.

DBAs can follow the process of restore and recovery using the V$SESSION_LONGOPS and V$RECOVERY_PROGRESS views, respectively.

For the restore, the V$SESSION_LONGOPS view’s column OPNAME should be ‘Flashback Database’. Just like this:

SELECT sofar, totalwork, units
FROM v$session_longops
WHERE opname = 'Flashback Database';

The column SOFAR shows the data currently read in megabytes while the column TOTALWORK shows the total number of megabytes of flashback logs that must be read.
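For the recovery phase, a similar check can be run against V$RECOVERY_PROGRESS (a sketch; the view exposes one row per recovery work item):

SELECT item, units, sofar, total
FROM   v$recovery_progress;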

Restore points are created with the following command:

SQL> CREATE RESTORE POINT rp17 FOR PLUGGABLE DATABASE novopdb2;

Restore point created.
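A guaranteed PDB restore point, which is retained until it is explicitly dropped, can be created in a similar way (a sketch based on the generic CREATE RESTORE POINT syntax; clean restore points additionally require the PDB to be closed):

SQL> CREATE RESTORE POINT rp18 FOR PLUGGABLE DATABASE novopdb2 GUARANTEE FLASHBACK DATABASE;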

Wednesday, July 15, 2020

A Look at the Oracle Group-by Bug


Oracle introduced a new feature, group by elimination, for queries where the group by column is also the table's unique key. As with many new features this one still has not had all the kinks resolved. The problem arises when key values are manipulated with function calls. The following example illustrates the issue using a table with a DATE as the primary key, from which the year is extracted using TO_CHAR or EXTRACT.


A table is created as follows:

create table bug_test_calendar(
        cal_name   char(17),
        bus_dt   date,
        updt_timestamp       timestamp (6) default systimestamp,
        constraint pk_bug_test_calendar
                        primary key (bus_dt)
)
/

insert into bug_test_calendar (bus_dt)
select
        sysdate + 10 * rownum
from
        all_objects
where
        rownum <= 40
/

commit;

When the query shown below is executed, it produces the following results:

select
        to_char(bus_dt,'YYYY') bus_dt, count(*) ct
from
       bug_test_calendar
group by
        to_char(bus_dt,'YYYY')
order by
        to_char(bus_dt,'YYYY')
/

BUS_DT   CT
-------  --
2020      1
2020      1
...
2020      1

40 rows returned

Oracle doesn't 'know' that the key values have been manipulated so that they are no longer unique, thus the optimizer applies the unique-key-based group-by elimination with less than stellar results.

EXTRACT fares no better, returning the same results. This behavior is controlled by the "_optimizer_aggr_groupby_elim" parameter, which is set to true by default. As it's a hidden parameter, its setting is not reported by Oracle in either the V$PARAMETER or V$SPPARAMETER views. The work-around is to simply set this parameter to false, as sketched below. However, having it active might help other group-by queries where the unique key values are not manipulated.
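A sketch of the session-level workaround (hidden underscore parameters should normally only be changed after checking with Oracle Support):

alter session set "_optimizer_aggr_groupby_elim" = false;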

Enter Oracle 19c, where this functionality is partially fixed:

select
        to_char(bus_dt,'YYYY') bus_dt, count(*) ct
from
       bug_test_calendar
group by
        to_char(bus_dt,'YYYY')
order by
        to_char(bus_dt,'YYYY')
/

BUS_DT   CT
-------  --
2020     40

Unfortunately EXTRACT is still broken in 19c:

select
        extract(year from bus_dt) bus_dt, count(*) ct
from
       bug_test_calendar
group by
        extract(year from bus_dt)
order by
        extract(year from bus_dt)
/

BUS_DT   CT
-------  --
2020      1
2020      1
...
2020      1

40 rows returned

Obviously given truly unique key values a group-by query would produce a count of 1 for each key. And, just as obvious, Oracle should be able to recognize when values are no longer unique and invoke the proper group-by mechanism. It remains to be seen if versions after 19c will fix the second condition and thus return correct results without having to turn off this feature.

This may not affect every installation of Oracle newer than 12.1, but it is worth knowing about should wrong results start appearing in selected group by queries.

Monday, July 13, 2020

Oracle Database 20c Automatic In-Memory Enhancements

In Oracle Database 20c the Database In-Memory feature Automatic In-Memory (AIM) has been significantly enhanced. I wrote about AIM when it first came out in Oracle Database 18c here. Oracle Database 20c adds a new HIGH option to the INMEMORY_AUTOMATIC_LEVEL initialization parameter. With this setting all objects that do not have a pre-existing INMEMORY setting are automatically set to INMEMORY MEMCOMPRESS AUTO by default. AIM then automatically manages objects populated into the In-Memory (IM) column store using access tracking and column statistics. This is a big change in behavior and addresses one of the most frequent questions that customers have had, which is "How do I determine which objects to populate into the IM column store?"


The previous parameter options of LOW and MEDIUM still exist and function the same as they did when introduced in Oracle Database 18c. However, with the new HIGH option in Oracle Database 20c the database automatically manages the contents of the IM column store. It monitors segment activity using an access tracking and column statistics infrastructure similar to Heat Map data, which was introduced as part of Automatic Data Optimization (ADO). With AIM set to HIGH, segments are automatically evicted and populated based on usage. You do not have to pick and choose which objects to enable for in-memory. In addition, individual columns may be automatically compressed as well by AIM. All of this has been done to make the most optimal use of the IM column store and provide the best performance possible, automatically!

When the INMEMORY_AUTOMATIC_LEVEL initialization parameter has been set to HIGH all objects that do not have a pre-existing INMEMORY setting are automatically set to INMEMORY MEMCOMPRESS AUTO. This is a new option of the INMEMORY MEMCOMPRESS subclause in 20c that is part of the AIM feature. If you do not want specific objects to be populated in the IM column store then you can still manually set them to NO INMEMORY. Also, segments marked with an INMEMORY PRIORITY setting other than NONE are excluded from automatic eviction.

If you decide that you no longer want to use the HIGH setting of the INMEMORY_AUTOMATIC_LEVEL parameter then when you change the parameter value or unset it all segments with MEMCOMPRESS AUTO will be set to NO INMEMORY.

One other interesting feature of AIM is that you can adjust the window that AIM considers for segment usage. In other words, AIM can be adjusted to only consider the object usage statistics that coincide with your active workload window. That way those statistics won't be skewed by periods of inactivity, other application usage or perhaps even maintenance activities.

How do you tell what AIM has done? AIM runs tasks, similar to ADO, and those tasks are exposed in two data dictionary views. The views DBA_INMEMORY_AIMTASKS and DBA_INMEMORY_AIMTASKDETAILS can be queried to see what actions AIM has performed. Of course you can still query the contents of the IM column store using the view v$im_segments and see how much memory has been used using the view v$inmemory_area.
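A brief sketch of enabling the HIGH setting and reviewing what AIM has done (assuming suitable privileges; the column lists are left generic since they can differ between releases):

ALTER SYSTEM SET INMEMORY_AUTOMATIC_LEVEL = HIGH;

-- Review AIM tasks and their actions
SELECT * FROM dba_inmemory_aimtasks ORDER BY task_id;
SELECT * FROM dba_inmemory_aimtaskdetails WHERE task_id = :task_id;

-- Check what is currently populated in the IM column store
SELECT segment_name, inmemory_size, bytes_not_populated FROM v$im_segments;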

Saturday, July 11, 2020

What is a Converged Database?


As a database administrator or manager, you may have had one or more of these conversations with your application teams:

We need to build a new mobile app so customers can submit and retrieve documents, so I’m going to need a specialized database to store the documents, right?

Oh, wait, next week this other project requires that we provide a new payment system with user fraud protection built in to meet compliance, and I am going to store relational data as well. That requires a specialized Blockchain database and a relational database, right?

These conversations can leave you feeling concerned and frustrated, wondering how you are going to allocate your resources among all of these specialized data stores.

But perhaps the solution to your problem is not more resources but a converged database.

A converged database is a database that has native support for all modern data types and the latest development paradigms built into one product.

Converged databases support Spatial data for location awareness, JSON for document stores, IoT for device integration, in-memory technologies for real-time analytics, and of course, traditional relational data. By providing support for all of these data types, a Converged Database can run all sorts of workloads from IoT to Blockchain to Analytics and Machine Learning. It can also handle any development paradigm, including Microservices, Events, REST, SaaS, and CI/CD, to name a few.

Traditionally when new data management technologies first come out, they are implemented as separate products. For example, when Blockchain first came out, it was a separate stand-alone system that required you to use an entirely different, proprietary way to store and access data.

By integrating new data types, workloads, and paradigms as features within a converged database, you can support mixed workloads and data types in a much simpler way. You don't need to manage and maintain multiple systems or worry about having to provide unified security across them.

You also get synergy across these capabilities. For example, by having support for Machine Learning algorithms and Spatial data in the same database, you can easily do predictive analytics on Spatial data, making it dramatically easier and faster to develop data-driven apps.

A good analogy for a converged database is a smartphone. In the past, if you wanted to make phone calls, you would use a phone, and if you wanted to take a picture or video, you would use a camera. If you wanted to navigate somewhere, you would need a map or a navigation system. If you wanted to listen to music, you needed an iPod or other similar device.

But with a smartphone, all of these products have been converted or converged into one. Each of these original products is now a feature of the smartphone. Having all of these features converged into a single product inherently makes your life easier, as you can stream music over the phone's data plan or upload pictures or videos directly to social media sites.

The same ease of use and convenience you get from a smartphone also holds true for a converged database.

Oracle Database is an excellent example of a converged database, as it provides support for Machine Learning, Blockchain, Graph, Spatial, JSON, REST, Events, Editions, and IoT Streaming as part of the core database at no additional cost. It allows you to support many diverse projects using a single platform, significantly reducing complexity and management overhead, while minimizing risk.

Friday, July 10, 2020

Oracle Database 20c Preview

Innovations in Oracle Database


In this blog entry we'll take a look at some of the new features in Oracle Database 20c, which has just been refreshed to version 20.3 on the Oracle Cloud's DBCS platform.

Oracle Database 20c is the next release of Oracle's multi-model database. Like the versions that have preceded it, Oracle Database 20c provides industry leading scalability, availability and security for both OLTP and analytical workloads. It supports relational, JSON, XML, spatial, graph, columnar, OLAP and unstructured data, enabling you to focus on building applications without having to worry about how to persist such data.

Support and Previous Releases


You can find details on the significant features in our previous yearly release in the following posts:

Oracle Database 18c
Oracle Database 19c

Oracle Database 20c is a yearly, short-term support release that allows users to try out new functionality or take advantage of cutting-edge features for applications that could benefit from them. While we are very excited about this latest release, Oracle recommends that most users consider upgrading to Oracle Database 19c, since it provides long-term support all the way through to April 2026.

Converged Database


Since the initial releases of Oracle Database, Oracle has taken the approach that storing and managing data in a single database platform makes more sense than breaking it up and storing it in single-use engines. Using multiple independent engines inevitably results in issues with data integrity, consistency and security. By using a single engine that provides best-of-breed support for all of the major data types and development paradigms, users benefit from all of Oracle Database's key capabilities, such as ACID transactions, read consistency, a centralised security model, parallel scans and DML, online backups and point-in-time recovery, regardless of the approach they take to storing the data.

The decision to centralize your data inside Oracle Database doesn't mean sacrificing the ability to build applications using whatever design approach you think is appropriate. Oracle supports the creation of single-database applications as well as those adopting event-driven or microservice paradigms. Key to this approach is the use of Oracle's Multitenant Architecture to provide each service with its own virtual database (PDB). This still allows you to manage many PDBs as one, and it simplifies federation of the data via inter-PDB SQL operations.
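
As a minimal sketch of that service-per-PDB approach, the following creates a pluggable database for a single service. The PDB name, admin user and password are purely illustrative.

-- Run as a common user in the CDB root; assumes Oracle Managed Files,
-- so no FILE_NAME_CONVERT clause is needed. Names and password are placeholders.
CREATE PLUGGABLE DATABASE orders_pdb
  ADMIN USER orders_admin IDENTIFIED BY "Change_Me_123";

ALTER PLUGGABLE DATABASE orders_pdb OPEN;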

And, if you need to support millions of concurrent users or geographically distribute your database because of regulatory requirements, Oracle Database Sharding makes it simple to do this while still providing a converged data model.

Oracle Database 20c New Features


Oracle Database 20c introduces a large number of new features, far more than can be covered in this short blog post.

Let's go through some of the significant enhancements in Oracle Database 20c Preview release.

Blockchain Tables


Blockchain as a technology has promised much in terms of solving many of the problems associated with the verification of transactions. While considerable progress has been made in bringing this technology to the enterprise, a number of problems exist, with arguably the largest being the complex nature of building applications that can support a distributed ledger.

To simplify the adoption of this exciting technology, Oracle Database 20c introduces Blockchain Tables. These tables operate like any normal heap table, but with a number of important differences. The most notable of these is that rows are cryptographically hashed as they are inserted into the table, ensuring that a row can no longer be changed at a later date.


This essentially creates an insert-only table: Blockchain Tables don't allow users to update or delete rows. Users are also prevented from truncating the data, dropping partitions or dropping the table within certain time limits.

This important capability means that other users can trust that the data held in the blockchain table is an accurate record of events. Oracle Database 20c also enables you to run a process that verifies that all of the records are consistent with their hash signatures.
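
As a minimal sketch based on the documented CREATE BLOCKCHAIN TABLE syntax: the table name, columns and retention windows below are illustrative, and the verification call's parameter names are worth checking against the PL/SQL packages reference for your release.

-- Rows can't be deleted for 16 days after insert; the table can't be dropped until idle for 31 days
CREATE BLOCKCHAIN TABLE bank_ledger (
  bank           VARCHAR2(128),
  deposit_date   DATE,
  deposit_amount NUMBER)
NO DROP UNTIL 31 DAYS IDLE
NO DELETE UNTIL 16 DAYS AFTER INSERT
HASHING USING "SHA2_512" VERSION "v1";

-- Verify that the stored rows are still consistent with their hash signatures
DECLARE
  l_verified NUMBER;
BEGIN
  DBMS_BLOCKCHAIN_TABLE.VERIFY_ROWS(
    schema_name             => USER,
    table_name              => 'BANK_LEDGER',
    number_of_rows_verified => l_verified);
  DBMS_OUTPUT.PUT_LINE('Rows verified: ' || l_verified);
END;
/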

SQL Macros


It is not unusual for a SQL statement to grow in complexity as the number of joins increases, or as the operations performed on the retrieved data become more involved. It is also not uncommon for developers to try to solve this problem by using stored procedures and table functions to simplify these commonly used operations. This works extremely well to simplify code, but it can potentially sacrifice some performance as the SQL engine switches context with the PL/SQL engine. In Oracle Database 20c, SQL Macros solve this problem by allowing SQL expressions and table functions to be replaced by calls to PL/SQL functions that return a string literal to be inserted into the SQL we want to execute. It's an incredibly simple concept and one that C and Rust programmers will be familiar with. The following trivial example shows it in action.

First, let's create a table and insert a few rows.

CREATE TABLE line_items (id integer, name varchar2(30), item_type varchar2(30), price float);
insert into line_items values (1, 'Red Red Wine', 'ALCOHOL', 15.6);
insert into line_items values (2, 'Its Cold Out There Heater', 'RADIATOR', 200.49);
insert into line_items values (3, 'How Sweet It Is Cake', 'FOOD', 4.56);

The SQL below calculates the value-added tax on the rows in our LINE_ITEMS table.

select id,
  case
    when item_type = 'ALCOHOL' then round(1.2 * price, 2)
    when item_type = 'SOLAR PANEL' then round(1.05 * price, 2)
    when item_type = 'RADIATOR' then round(1.05 * price, 2)
    else price end as total_price_with_tax
from line_items; 

However, in Oracle Database 20c we can simplify this by creating a function with the new SQL_MACRO keyword that returns the expression as a string.

create or replace function total_price_with_tax(the_price float, the_item_type varchar)
  return varchar2 SQL_MACRO(SCALAR) is
begin
  return q'[case
    when the_item_type = 'ALCOHOL' then round(1.2 * the_price, 2)
    when the_item_type = 'SOLAR PANEL' then round(1.05 * the_price, 2)
    when the_item_type = 'RADIATOR' then round(1.05 * the_price, 2)
    else the_price end]';
end;
/

We can then simply reference the SQL Macro inside a select statement. The SQL that's executed is exactly the same as the original SQL statement, without the overhead of a context switch each time a row is fetched to execute our function.

SQL> select id, total_price_with_tax(price, item_type) from line_items;

        ID TOTAL_PRICE_WITH_TAX(PRICE,ITEM_TYPE)
---------- -------------------------------------
         1                                 18.72
         2                                210.51
         3                                  4.56

The same approach can be used in parametrised views and Polymorphic tables.
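
For instance, a table macro can act as a parameterised view. The following is a minimal sketch, assuming the LINE_ITEMS table created above; the function name is purely illustrative.

-- A table macro: the returned text is expanded in the FROM clause at parse time
create or replace function items_of_type(p_type varchar2)
  return varchar2 SQL_MACRO(TABLE) is
begin
  return q'[select id, name, price
              from line_items
             where item_type = p_type]';
end;
/

-- The macro is then referenced directly in the FROM clause
select * from items_of_type('FOOD');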

Oracle Database In-Memory Enhancements


Analysing data using a columnar model can result in massive performance improvements when compared to doing the same operations using a row-based model. However, updating data is significantly faster when using data held in rows. Oracle Database In-Memory is unique in that it allows you to benefit from both approaches. With this capability you can run your relational or JSON application unchanged and Oracle Database will maintain a columnar store supporting blazingly fast real-time analytical queries.

Oracle Database 20c introduces three major improvements to enhance performance and ease of use when using Oracle Database In-Memory functionality.

◉ Database In-Memory Vector Joins : Through the use of its newly enhanced Deep Vectorization SIMD Framework, Oracle Database In-Memory can accelerate operations like hash joins on columns held inside of the In-Memory column store. In the case of a hash join, the join is broken down into smaller operations that can be passed to the vector processor. The key-value table used is SIMD optimized and used to match rows on the left and right-hand sides of the join. This approach can result in join performance improvements of up to 10 times over traditional methods.

◉ Self Managing In-Memory Column Store : When Oracle Database In-Memory was first released, you had to explicitly declare which columns were to be populated into the In-Memory Column Store. This gave you a high degree of control if memory was tight. In Oracle Database 18c we introduced functionality that would automatically place objects in the Column Store if they were actively used and remove them if they weren't, but you still had to indicate which objects you wanted considered. In Oracle Database 20c, when you set INMEMORY_AUTOMATIC_LEVEL to HIGH, all objects are considered (see the example after the figure below). This automatic memory management significantly simplifies the job of managing the column store.

◉ In-Memory Hybrid Columnar Scans : It is often not possible to have every column of every table populated in the Column Store because memory is limited. In many instances this isn't an issue, but every once in a while you may encounter a query that needs some of the data (columns) from the Column Store and some data that's only available in the row store. In previous versions of Oracle Database In-Memory, such queries would simply run against the row store. In Oracle Database 20c we can now use both: the optimizer can elect to scan the Column Store and fetch projected column values from the row store if needed. This can result in a significant improvement in performance.


Hybrid Columnar Scan
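
As a minimal sketch of enabling the self-managing column store described above, assuming the In-Memory area has already been sized via INMEMORY_SIZE:

-- With the new HIGH level, all eligible objects are considered for population automatically
ALTER SYSTEM SET INMEMORY_AUTOMATIC_LEVEL = HIGH;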

Native JSON Datatype


We introduced support for JSON in Oracle Database 12c (12.1.0.2). It allowed JSON to be stored in the database inside of a varchar2 or a LOB (CLOB or BLOB). This meant it was possible to build applications with the flexibility offered by a schemaless design model while still benefiting from the power of the Oracle Database. You could query the JSON documents using standard SQL, take advantage of advanced analytics, index individual attributes or whole documents and process billions of JSON documents in parallel. We also provided tools to discover what attributes made up the JSON documents and to trivially create relational views on top of the collections. It was also possible for developers to treat the Oracle Database as if it were a NoSQL database by accessing it with the SODA (Simple Oracle Document Access) APIs available for Java, Node.js, Python, C and REST.

In Oracle Database 20c we are improving our JSON support by offering a Native data type, "JSON". This means that instead of having to parse JSON on read or update operations, the parse only happens on an insert and the JSON is then held in an internal binary format which makes access much faster. This can result in read and update operations being 4 or 5 times faster and updates to very large JSON documents being 20 to 30 times faster.

CREATE TABLE j_order (
  id     INTEGER PRIMARY KEY,
  po_doc JSON
);
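
To make the JSON_TRANSFORM example below concrete, here is a minimal sketch that inserts a hypothetical order document and reads it back with simple dot notation. The document content is illustrative only.

-- The document text is implicitly converted to the new JSON type on insert
INSERT INTO j_order (id, po_doc) VALUES (
  555,
  '{"ponumber" : 555,
    "address"  : {"city" : "San Francisco"},
    "phones"   : [{"type" : "office", "number" : "909-555-0100"}]}');

-- Dot-notation access to an attribute (a table alias is required for this syntax)
SELECT o.po_doc.address.city FROM j_order o WHERE o.id = 555;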

The new data type wasn't the only JSON change introduced in Oracle Database 20c. Oracle also added a new JSON function, JSON_TRANSFORM, which makes it much simpler to update and remove multiple attributes in a document in a single operation.

UPDATE j_order SET po_doc = JSON_TRANSFORM( po_doc,
                                            SET '$.address.city' = 'Santa Cruz',
                                            REMOVE '$.phones[*]?(@.type = "office")' )
 WHERE id = 555;

And, of course, we also added support for the new JSON data type to our drivers and to utilities like Data Pump and GoldenGate.

Machine Learning for Python and AutoML


Machine Learning has been built into Oracle Database since the release of 8i. It doesn't require analysts or data scientists to extract the data onto a file system or specialist database, but rather allows you to leverage the power of Oracle Database and build models with over 30 Machine Learning algorithms running directly on data held in your tables. This approach of moving the algorithms to the data minimizes or eliminates data movement, achieves scalability, preserves data security, and accelerates time-to-model deployment.

In Oracle Database 20c we are introducing functionality that makes it even simpler for users to take advantage of this capability by providing a Python Machine Learning interface to Oracle Database. This new client complements the R and SQL interfaces already available. Data scientists can now work in an environment they feel comfortable with and simply treat the Oracle Database as a high-performance compute engine. You can use programmatic structures similar to those in the scikit-learn and pandas frameworks, which act as simple proxies for tables in the database, and then invoke the Oracle Database ML algorithms through Python function calls rather than SQL.

Oracle Machine Learning for Python also looks to simplify the process of selecting and tuning the right model, which can be a complicated and time-consuming exercise. New AutoML functionality can select the model and the relevant attributes, and then tune the hyperparameters. This dramatically reduces the time required to create a model that accurately predicts or classifies your data.


In Oracle Database 20c we are also adding Machine Learning support for the MSET-SPRT and XGBoost algorithms, as well as the Adam optimization solver for the neural network algorithm.

Other Notable Enhancements


There are over 100 enhancements in Oracle Database 20c, and whilst I'd love to cover them all in detail, I'd simply be repeating a lot of content from the Oracle Database New Features Guide. However, there are a few more features that I think are of particular note.

Expression-based init.ora parameters : It's now possible to base database parameters (init.ora) on calculations made against the configuration of the system, e.g. setting the database parameter CPU_COUNT to half the number of CPUs available to the operating system.

Automatic Zone Maps : Exadata can now automatically create Zone Maps based on the predicates used in queries. Previously this was a manual operation requiring you to understand how the data would be accessed. This can dramatically reduce the number of blocks that need to be scanned.

Optimised Graph Models : Graphs can consist of millions or even billions of edges and vertices, so the storage optimisations we've made to the graph capabilities in the Oracle Database 20c preview release can result in big space and performance improvements for your models.

Sharding Enhancements : To make it easier to develop Java applications against Oracle Sharding, we've introduced a new Java data source that makes it simple to obtain connections without having to define the shard key or manage the connection key explicitly. We have also made sharding more fault tolerant by automatically looking for an alternative if the shard you are working on fails during execution.

Persistent Memory Support : In Oracle Database 20c we provide support for databases running on top of Persistent Memory (PMEM) file systems. PMEM file systems can offer significant latency and bandwidth improvements over traditional file systems using SSDs or mechanical disks. However, the applications using them need to understand how to write to them safely and how to use them efficiently in conjunction with other OS resources. Oracle Database 20c's implementation provides atomic writes, safeguarding against partial writes during unexpected power outages. It also offers fast I/O operations using memory copy, and it uses the database buffer cache more efficiently by bypassing it and reading directly from PMEM storage.

Source: oracle.com