Db2 V12 for the z/OS platform has been generally available since late 2016. Along with many new features, V12 is the last version on the z/OS platform to be delivered as a complete software upgrade. Future enhancements will arrive through a model called continuous delivery, in which subsets of features are shipped as “function levels”; sites may then choose which features and functions they wish to activate.
This article reviews some of the more useful Db2 and SQL features that have been delivered as new functionality in the latest releases.
High-Speed Data Load
Analytics processing against a big data solution, a data warehouse, or a combination of data sources requires loading very large volumes of data quickly. Depending upon the methods used, the data may be inconsistent until loading is complete. Consider a day’s worth of online sales data that contains new orders from new customers. If you load your big data application with the new orders first, their corresponding customers will not exist; load the customers first, and analytics for those customers will show no orders!
In general, massive data loads can encounter bottlenecks if done serially. Some issues include network capacity, disk storage capacity, hot spots in data and index pages containing concurrent updates, and logging volume. In the past, DBAs have attempted to alleviate these bottlenecks with creative database object designs, including creative data partitioning schemes, special index clustering, embedding significant free space (i.e., empty space) in objects, and so forth.
Db2 V12 introduces a new insert algorithm to support heavy insert workloads. The algorithm detects when a mass insert operation involving non-clustering indexes is in progress and manages multiple threads that execute concurrently.
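As a minimal sketch of how a DBA might opt in, here is the tablespace-level INSERT ALGORITHM clause that Db2 12 added for MEMBER CLUSTER universal tablespaces (all object names here are hypothetical):

    -- Request insert algorithm 2, the new fast-insert path, for this tablespace;
    -- MEMBER CLUSTER relaxes clustering so concurrent insert threads can spread across pages
    CREATE TABLESPACE TSORDERS IN DBSALES
      MEMBER CLUSTER
      INSERT ALGORITHM 2;

    -- An existing tablespace can be switched with ALTER
    ALTER TABLESPACE DBSALES.TSORDERS INSERT ALGORITHM 2;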
Coupled with this new algorithm, IBM also provides DRDA Fast Load, which enhances high-speed data loading from distributed clients. The DBA no longer needs to import data from external sources, convert it to a local encoding scheme, reformat and clean it, and then execute a data load utility; DRDA Fast Load does all of this directly from a remote process. Data transformations can be accomplished using SQL.
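To illustrate the kind of SQL-based transformation that last step implies, here is a hedged sketch that cleanses and retypes staged rows on their way into a target table (all table and column names are hypothetical):

    -- Cast, cleanse, and filter staged rows while moving them to the target table
    INSERT INTO SALES.ORDERS (ORDER_ID, CUST_ID, ORDER_TS, AMOUNT)
      SELECT ORDER_ID,
             CUST_ID,
             TIMESTAMP(ORDER_DT),        -- normalize the character date/time
             DECIMAL(AMOUNT_TXT, 11, 2)  -- convert the character amount to a typed value
        FROM STAGE.ORDERS_RAW
       WHERE CUST_ID IS NOT NULL;        -- drop rows that would break referential integrity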
Enhancing Analytics Even Further
With business intelligence (BI) becoming more and more common, it is only natural that companies have implemented BI software that accesses a big data application. On the z/OS platform, IBM’s solution is the IBM Db2 Analytics Accelerator (IDAA). (This formerly stand-alone hardware unit is now available in a version that is embedded in the zSystems hardware chassis.)
With the IDAA as your main big data application, mass-insert operations once again become a concern. Luckily, there are several Db2 V12 features that can address these concerns.
First, Db2 allows any table to exist in any of three states:
1. only in the Db2 database;
2. only in the IDAA; or
3. in both.
Regardless of configuration, the Db2 Optimizer will intercept incoming SQL queries and determine which copy of the table will be used to generate results. For a typical operational query, the Optimizer should reference the in-Db2 table; for an analytic query, the best choice will probably be the table in the IDAA. This is because operational queries usually reference only a few tables and return a small set of rows, whereas analytic queries can reference tens of tables and sometimes must access millions (or billions) of rows in order to return the desired result.
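Applications can influence this routing with the CURRENT QUERY ACCELERATION special register, as in this minimal sketch (the register and its ELIGIBLE setting are standard Db2 syntax; the query itself is hypothetical):

    -- Allow the Optimizer to route eligible queries to the accelerator
    SET CURRENT QUERY ACCELERATION = ELIGIBLE;

    -- A typical analytic query: millions of rows scanned, few rows returned
    SELECT CUST_REGION, SUM(AMOUNT)
      FROM SALES.ORDERS
      GROUP BY CUST_REGION;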
Enter a new Db2 feature: high-speed multi-row insert for IDAA tables. If the table exists only in the IDAA, this feature provides a method of getting lots of data into IDAA-based tables quickly while avoiding the locking and potential outage of a Load utility.
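For the accelerator-only case, a hedged sketch looks like this; the IN ACCELERATOR clause is the documented way to create such tables, while the accelerator, table, and column names are hypothetical:

    -- Create a table that exists only in the accelerator, not in native Db2
    CREATE TABLE SALES.ORDERS_AOT
      (ORDER_ID  BIGINT        NOT NULL,
       CUST_ID   INTEGER       NOT NULL,
       AMOUNT    DECIMAL(11,2))
      IN ACCELERATOR ACCEL01;

    -- A multi-row INSERT ... SELECT moves data into the accelerator-only table
    INSERT INTO SALES.ORDERS_AOT
      SELECT ORDER_ID, CUST_ID, AMOUNT
        FROM STAGE.ORDERS_RAW;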
Ultra-Large Tables
Long ago, tables were considered large if they contained millions of rows. Later, billions of rows became typical, with total storage sizes in the tens of gigabytes. Today, tables containing hundreds of gigabytes of data are becoming more common, especially with the advent of big data, and terabyte-sized tables are on the horizon.
In versions of Db2 before version 12, the practical limit on the size of a table was 256 gigabytes per partition in a 64-partition tablespace, for a total of 16 terabytes. Partition sizes and the number of partitions per tablespace were not easily changeable. If the DBA mis-sized the tablespace, it might have been possible to alter the tablespace size parameters, but this necessitated a reorganization of the entire tablespace. Such a reorg would require work disk space approximately the size of the tablespace, and during the reorg the entire tablespace would be unavailable.
Db2 version 12 implements a new database object: a universal tablespace type that permits specification of the partition size (DSSIZE) for each individual partition, increases the maximum partition size from 256 gigabytes to 1 terabyte, and increases the maximum number of partitions allowed. At its greatest extent, a tablespace can now be sized at 4,000 terabytes (4 petabytes), and the table in that tablespace can contain up to 256 trillion rows.
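A minimal sketch of the DDL involved, assuming the PAGENUM RELATIVE clause that Db2 12 uses to designate the new tablespace type (object names and sizes are illustrative only):

    -- Relative page numbering permits per-partition DSSIZE and the larger limits
    CREATE TABLESPACE TSHUGE IN DBBIG
      NUMPARTS 100
      PAGENUM RELATIVE
      DSSIZE 256 G;

    -- Later, grow a single hot partition toward the new 1-terabyte maximum
    ALTER TABLESPACE DBBIG.TSHUGE
      ALTER PARTITION 42 DSSIZE 1024 G;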
While it will be rare for such a table to be implemented in native Db2, such tables might exist in the IDAA as accelerator-only tables. Indeed, loading a table of that size in Db2 would require a massive amount of disk space, as well as large amounts of elapsed time and CPU cycles.
The Latest Features and Functions
As noted above, IBM is delivering new features and functions using a “continuous delivery” model. Since late 2016, there have been several features delivered in this fashion. Two of the more useful ones are noted here.
Data encryption and data compression. IBM deliberately designed Version 12 of Db2 to integrate with zSystems hardware. The latest hardware, the IBM z14, contains new options for cryptographic processing, implemented using special central processor units, and Db2 can exploit this feature for data encryption using standard SQL syntax. In addition, if the DBA has specified that a tablespace is to use data compression for disk storage, no software code needs to be executed to compress data for storage and decompress it for retrieval; the z14 handles this transparently on the way to and from disk. This means a significant reduction in CPU and elapsed time for compressed tables.
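A hedged sketch of SQL-driven encryption follows, using the long-standing ENCRYPT_TDES and DECRYPT_CHAR built-in functions (these predate V12, but on a z14 the underlying cryptography can be handled by the hardware; table, column, and password values are hypothetical):

    -- The encrypted column must be declared FOR BIT DATA
    CREATE TABLE HR.EMPLOYEE
      (EMP_ID  INTEGER NOT NULL,
       SSN_ENC VARCHAR(64) FOR BIT DATA);

    -- Encrypt on insert; the password is supplied in the SQL itself
    INSERT INTO HR.EMPLOYEE
      VALUES (1, ENCRYPT_TDES('123-45-6789', 'Str0ngPw'));

    -- Decrypt on retrieval with the same password
    SELECT DECRYPT_CHAR(SSN_ENC, 'Str0ngPw')
      FROM HR.EMPLOYEE
     WHERE EMP_ID = 1;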
Another similar feature involves compression of Db2 index data. In general, an index contains an entry for each data row, specifying the key value and a physical pointer to where the row resides. For example, an index on the Customer table may contain entries for the column Customer-Number, and the DBA would typically use the CLUSTER parameter to indicate that this index should be used to attempt to maintain the table’s data rows in sorted order by the key value. Db2 now allows options that implement index data compression without losing the sort order of the index entries. This is called order-preserving compression.
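A minimal sketch of such an index definition (COMPRESS YES is the standard index-compression clause; the object names continue the hypothetical Customer example above):

    -- Clustering index on customer number, with index compression enabled;
    -- note that compressed indexes require a larger (8K, 16K, or 32K) index buffer pool
    CREATE INDEX IXCUSTNO
      ON CUSTOMER (CUSTOMER_NUMBER)
      CLUSTER
      COMPRESS YES;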
Attached processor use during Db2 Load and Reorg. IBM allows sites to install several different types of attached processors in zSystems hardware. These processors can then function as CPUs for certain specified kinds of work, with one important difference: IBM will not charge for that processor’s use. This can be a great benefit for shops whose zSystems licensing mechanism depends upon CPU cycles, where typically an average of the peak CPU use during a period is used to charge the customer. Work that runs on an attached processor does not incur such charges, so DBAs are always on the lookout for ways to exploit one.
One popular option is the z Integrated Information Processor, or zIIP. This chip can be installed in most zSystems hardware, and the operating system recognizes it automatically on installation. Db2 version 12 also recognizes the chip, and the latest versions of the Load and Reorg utilities are now eligible for execution on the zIIP.
Typically, Load is used to load data into a table, while the Reorg utility unloads table data, sorts it, then loads it back in. Hence, both utilities have a phase, called the reload phase, where they load data into the table. During this phase the utilities also save in work datasets all of the data required for the table’s indexes. Once the table is loaded, that index data is sorted and loaded into the appropriate index objects, and it is this index processing that is now eligible for zIIP execution.
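For context, a hedged sketch of the utility control statements involved (standard Load and Reorg syntax; dataset, table, and tablespace names are hypothetical):

    -- LOAD: the reload phase writes rows and spills index keys to work datasets;
    -- the subsequent sorting and building of those keys is the zIIP-eligible work
    LOAD DATA INDDN SYSREC LOG NO REPLACE
      INTO TABLE SALES.ORDERS

    -- REORG: unload, sort, reload, then rebuild the indexes
    REORG TABLESPACE DBSALES.TSORDERS SHRLEVEL CHANGE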
In these days of big data and expanding table sizes, this can mean a significant cost reduction. IBM estimates that in some cases up to 90 percent of the normal CPU usage of Load and Reorg can now be eliminated by offloading to zIIP processors.