Friday, May 29, 2020

Big Data DBA Support Strategies


The Database Administrator (DBA) is usually a technical professional who supports one or more hardware and software platforms that provide application solutions. However, technical details such as SQL tuning, hardware and software upgrades, and database designs tend to be tactical in nature.  It is essential that the DBA also maintain a strategic outlook in order to get ahead of potential problems. This article addresses two of these strategies: knowing application breaking points and preparing for future enhancements to big data applications.

Big Data Applications


The database administrator is usually an expert on SQL performance and tuning. With the added responsibility to support a big data application, it is expected that the DBA will monitor and tune business intelligence (BI) queries that users submit. One big difference is that the BI queries are almost always ad hoc queries. The DBA encounters them in real time; hence, it is necessary to be proactive with performance tuning tactics.

One additional issue is that some of these ad hoc BI queries will be changed in minor (or major) ways by users, and may end up as regular reporting tools. The result is that the DBA must not only monitor queries and make appropriate database or application changes (such as partitioning or indexing), but must also coordinate with user departments to determine how their queries may be used in the future.

These concerns should be reviewed regularly as DBAs do capacity planning for the federated data stores that the queries access: the big data application(s), one or more data warehouses, various data marts, and perhaps even some operational data.

Capacity Planning and Application Breaking Points


The first stage where the DBA should get involved is during application and database design. Important items include the following:

◉ Complexity. Some big data applications consist of hundreds of tables or more. Are there common queries that will join these tables regularly? As more data accumulates, do queries become more complex? For example, if the DBA decides to purge or archive older data to a separate table, are there user queries that must access all of the data, and hence will need to do a SQL UNION of the current and archived data? (See the sketch after this list.)

◉ Speed. Query elapsed time is usually not an issue in big data applications, at least for queries whose data are stored entirely in the big data environment. However, there is usually common data in the data warehouse that must also be joined to these tables. This is because the warehouse already contains most of the dimensional data used for aggregation and subsetting. Queries that require a breakdown by geographic area, sales territory, store number, product type, or other dimension will require access to the data warehouse and the big data application simultaneously, and this may lead the DBA to do some proactive performance tuning.

◉ Database design. Databases are usually designed to support one or more specific applications. As such, it is possible to embed business rules in the design rather than coding them in SQL or in an application program. Standard examples include parent-child relationships. Does an order require a valid customer? This verification can be coded in the database design as a foreign key relationship or a trigger. Must customer numbers be numeric? This data integrity check can be coded as a trigger or a column edit rule.

◉ Capacity growth. Big data applications start out big and tend to get bigger. There are several unexpected items that can affect the DBA’s storage and CPU calculations. One example is when the initial application specifications call for holding two years of data. After a few months, the business insists that seven years of data retention are necessary. Another example is when the DBA determines that several indexes are required for performance of critical SQL statements. These indexes take up storage space.
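For the archiving question in the Complexity bullet above, one hedged sketch (table and column names are hypothetical) is a view that hides the split between current and archived rows, so that users who truly need full history query the view while everyone else queries the current table directly:

-- Current, actively maintained rows plus rows archived to a separate table
CREATE OR REPLACE VIEW orders_all AS
  SELECT order_id, customer_id, order_date, order_amount FROM orders_current
  UNION ALL
  SELECT order_id, customer_id, order_date, order_amount FROM orders_archive;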

Big Data Application Testing


As the users begin to test their first queries, the DBA has the opportunity to gather more information, including capturing SQL statements for later analysis and some preliminary performance monitoring. These are the first indications of the expected stress to be placed on system resources when the application goes into production. In addition to estimates of production table sizes and an initial set of indexes, the DBA can document possible opportunities for future issues.

Some examples of these are the following:

◉ Backup and recovery. Sometimes disasters occur. While complete site-wide catastrophes are rare, more common issues such as invalid data loads or disk storage failures are possible. Big data applications are usually too big to recover by making regular full-sized backup copies of databases. It would take too long! More typically, the support staff implements either a replication backup scheme that posts copies big data changes to a backup site, or a second big data environment that mirrors the first. In either case, the desired result is to have a standby big data application ready in some form in case issues arise.

◉ Object management. Commonly, databases and tables store data in a physical sequence. For example, an order table may store orders in sequence by order number. As inserts, updates and deletes happen, it is possible for the strict sequencing to be broken. The DBA can then run a reorganization utility to restore the sequencing.  This is not an option for big data objects, as they are simply too big to reorganize speedily. Therefore, the DBA must design objects, data loads and data change logic such that reorganizations are not necessary.

◉ Resource usage. System resources include CPUs, memory, disk storage and network bandwidth. In addition, transaction elapsed times and physical I/O operations are usually considered as resources. The DBA should use big data application testing as an opportunity to forecast what resources are required by the application and queries, with an eye towards developing options should one or more types of resources become constrained in the future. For example, if it appears that there is insufficient CPU power available to process queries in a timely manner, the DBA can predict when new CPUs must be added. Alternatively, the DBA may be able to implement database changes, such as additional indexes, that allow queries to use less CPU power.

Big Data in Production


With the application in production the DBA must now look to the future. Varied user communities will now have access to big data, the number of queries will increase dramatically, and the value of the results will be used to make business decisions. Your user community will then begin to diversify into several different categories, each with its own needs.

Sophisticated users with BI tools. Most big data applications are expensive investments, and business management will already have several analyses or use cases to get some immediate payback on their investment. These users will commonly be given one or more business intelligence tools that will build and submit queries using a graphical interface. These queries will be relatively complex in that they will access many tables and will contain requests for aggregations (sums, averages, minima and maxima) and dimensional breakdowns (by customer type, by product type, by time period, by geographical area, etc.).

For the future, the DBA can expect a lot of performance tuning based on massive data access. This leads to strategies such as data partitioning, key clustering, and the like. Luckily, examples of the first queries should already be in the DBA’s hands from the testing phase mentioned earlier.

Simple users. This set of users is the second to arrive. They already know SQL and have a passing familiarity with enterprise data, and will begin by coding their own queries against the big data application. Looking ahead, these users will be making many SQL coding mistakes, with potential performance consequences. Implement SQL reviews for them and consider giving them some training in basic performance tools such as the DB2 Explain facility.
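A minimal sketch of such a review, using DB2's EXPLAIN facility against a hypothetical query (table and column names are made up, and a PLAN_TABLE is assumed to exist under your authorization ID), might look like this:

-- Capture the access path DB2 chooses for the statement into PLAN_TABLE
EXPLAIN PLAN SET QUERYNO = 101 FOR
  SELECT customer_id, SUM(order_amount)
  FROM   orders
  GROUP  BY customer_id;

-- Walk through the chosen access types and index usage with the user
SELECT queryno, planno, tname, accesstype, matchcols, indexonly
FROM   plan_table
WHERE  queryno = 101
ORDER  BY planno;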

DataMart users. These users are already accessing relational data extracted from the data warehouse or operational systems and stored in local data marts on local platforms. They will eventually see the need to take their specially constructed data and join it to the big data application. This kind of multi-site access is fraught with performance problems, as some queries may cause huge volumes of data to be transferred across the network before it can be determined whether the data is really needed.

Here, the DBA must meet with these users as soon as possible to determine what options will work for them. One potential solution is to have multiple copies of certain data available in two or more physical locations. For example, data warehouse dimension tables can be copied into the big data application, preliminary data joins executed, and the results copied to the data mart location. The data mart users can then work with the pre-joined tables locally without forcing massive network data movement.
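A hedged sketch of that pre-join step (all object names are hypothetical): build the joined, aggregated result next to the big data tables, then copy only the much smaller result to the data mart platform:

-- Executed where the big data tables and the copied dimension data live
CREATE TABLE mart_sales_by_region AS
  SELECT d.region_name,
         f.product_id,
         SUM(f.sale_amount) AS total_sales
  FROM   big_sales f
         JOIN dim_region d ON d.region_id = f.region_id
  GROUP  BY d.region_name, f.product_id;

The data mart users then query (or receive a copy of) MART_SALES_BY_REGION instead of joining across platforms at query time.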

Wednesday, May 27, 2020

Defragmentation of Large Objects / SecureFiles LOBs in Oracle Database 20c


In Oracle 20c, the init.ora parameter DB_SECUREFILE defaults to PREFERRED. This means that all large objects / LOBs are created as SecureFiles unless BASICFILE is explicitly specified in the LOB storage clause or the tablespace is an MSSM (= Manual Segment Space Management) tablespace.

Until 19c, only defragmentation of BasicFile LOBs was possible. Tim Hall showed that, in order to shrink a SecureFile LOB, you needed to move it.

SecureFiles defragmentation in 20c provides online defragmentation of allocated and freed space in SecureFiles segments for all types of SecureFiles LOBs – compressed, deduplicated and encrypted.

In an Oracle 20.2.0 database, I have a table called BLOGS. Let us turn on compression, deduplication and encryption:

SQL> ALTER TABLE blogs MODIFY LOB (blog_text)
(COMPRESS HIGH ENCRYPT DEDUPLICATE);

Table altered.


Defragmentation can be done automatically by a background process, and the Segment Advisor can estimate the fragmentation level and how much space can be saved. Note that some temporary segment space is needed to hold intermediate results.

Let us try to defragment the SecureFiles LOB column BLOG_TEXT and use the Segment Space Advisor to see how the forecast compares with reality.

In order to defragment the SecureFiles LOBs, we need to use the shrink_clause. The shrink_clause lets us (in general) manually shrink space in a table, index-organized table or its overflow segment, index, partition, subpartition, LOB segment, materialized view, or materialized view log. This clause is valid only for segments in tablespaces with automatic segment management.

By default, Oracle Database compacts the segment, adjusts the high water mark, and releases the reclaimed space immediately. Compacting the segment requires row movement. Therefore, you must enable row movement for the object you want to shrink before specifying this clause. Further, if your application has any rowid-based triggers, you should disable them before issuing this clause.

SQL> ALTER TABLE blogs ENABLE ROW MOVEMENT;

Table altered.

With release 20c, you can use the shrink_clause on SecureFile LOB segments, and it can be invoked in two ways:

1. Target a specific LOB column and all its partitions:

ALTER TABLE blogs MODIFY LOB (blog_text) (SHRINK SPACE);

2. Cascade the shrink operation for all the LOB columns of the table and its partitions:

ALTER TABLE blogs SHRINK SPACE CASCADE;

Do not attempt to enable row movement for an index-organized table before specifying the shrink_clause. The ROWID of an index-organized table is its primary key, which never changes. Therefore, row movement is neither relevant nor valid for IOTs.

There are 2 important options/keywords with the shrink space syntax:

COMPACT: If you specify COMPACT, Oracle only defragments the segment space and compacts the table rows for subsequent release. Oracle recovers the space but does not move the high water mark (HWM), so the space is not released immediately.

CASCADE: If you specify CASCADE, Oracle performs the same operations on all dependent objects of the table, including secondary indexes on index-organized tables, recovering space for the object and all of its dependent objects.
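As a rough illustration of how these keywords combine on the same BLOGS table (assuming row movement is already enabled, as shown above), a two-step shrink might look like this:

-- Step 1: defragment and compact only; the high water mark is not moved,
-- so no space is released yet (useful during busy periods)
ALTER TABLE blogs SHRINK SPACE COMPACT;

-- Step 2: later, complete the shrink and release the space for the table
-- and all dependent objects, including its LOB segments
ALTER TABLE blogs SHRINK SPACE CASCADE;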

Let us follow the steps for the BLOGS table:

1. Run the Segment Space Advisor:

DECLARE
  seg_task_id   number;
  seg_task_name varchar2(100);
  seg_task_desc varchar2(500);
BEGIN
  seg_task_name := 'SecureFileDefragmentation1';
  seg_task_desc := 'Manual Segment Advisor Run for table BLOGS';
  dbms_advisor.create_task (
    advisor_name => 'Segment Advisor',
    task_id      => seg_task_id,
    task_name    => seg_task_name,
    task_desc    => seg_task_desc);
END;
/

DECLARE
  obj_id number;
BEGIN
  dbms_advisor.create_object (
    task_name   => 'SecureFileDefragmentation1',
    object_type => 'TABLE',
    attr1       => 'JULIAN',
    attr2       => 'BLOGS',
    attr3       => NULL,
    attr4       => NULL,
    attr5       => NULL,
    object_id   => obj_id);
END;
/

BEGIN
  dbms_advisor.set_task_parameter(
    task_name => 'SecureFileDefragmentation1',
    parameter => 'recommend_all',
    value     => 'TRUE');
END;
/

exec dbms_advisor.execute_task('SecureFileDefragmentation1');

2. Let us check the findings from DBA_ADVISOR_FINDINGS:

SQL> select message,more_info from dba_advisor_findings where task_name='SecureFileDefragmentation1';

MESSAGE
-------
MORE_INFO
-----------------------------------------------------------------------
The free space in the object is less than 10MB.
Allocated Space:15728640: Used Space:4013928: Reclaimable Space :180376:

3. Now let us defragment the SecureFile LOB:

SQL> select bytes from dba_segments where segment_name='BLOGS';

     BYTES
----------
  15728640

SQL> ALTER TABLE blogs MODIFY LOB (blog_text) (SHRINK SPACE);

Table altered.

SQL> select bytes from dba_segments where segment_name='BLOGS';

     BYTES
----------
  14745600

SQL> ALTER TABLE blogs SHRINK SPACE CASCADE;

Table altered.

SQL> select bytes from dba_segments where segment_name='BLOGS';

     BYTES
----------
   1048576

As you can see, with the simple operations above, we managed to decrease the size of the BLOGS table by a factor of 15: from 15,728,640 bytes to 1,048,576 bytes.

The shrink_clause is subject to the following restrictions:

– You cannot combine this clause with any other clauses in the same ALTER TABLE statement.

– You cannot specify this clause for a cluster, a clustered table, or any object with a LONG column

– Segment shrink is not supported for tables with function-based indexes, domain indexes, or bitmap join indexes

– With this clause, Oracle does not shrink mapping tables of index-organized tables, even if you specify CASCADE

– You can specify the shrink_clause for a table with advanced row compression enabled (ROW STORE COMPRESS ADVANCED) but you cannot specify this clause for a table with any other type of table compression enabled

– You cannot shrink a table that is the master table of an ON COMMIT materialized view

– Rowid materialized views must be rebuilt after the shrink operation.

Tuesday, May 26, 2020

Understanding Query Execution in Relational Database System Architecture

The Relational Database Management System (RDBMS) is built with numerous complex algorithms and data structures just to store and retrieve information properly. The complexity is almost akin to an operating system, functioning in sync with many features almost in real time. A modern RDBMS has built-in facilities for memory management, file buffering, network communication support, etc. These form the basic architecture of the RDBMS package. This article provides a glimpse of what goes on behind the scenes from the moment a user submits a query until the result is obtained from the database.

Understanding RDBMS


An RDBMS package is typically a database server that serves multiple clients via communication pathways under the aegis of network protocols such as sockets, pipes, etc. In a standalone database application, the client communicates with the database via programmatic interfaces. In such a case the database server becomes part of the client application, or vice versa. Sometimes the database is contained within an embedded system as a slave to the host system. Generally, in a large database application, the RDBMS server is separated from the concerns of the application by hosting the server in a remote location. The business logic interacts with the database server via the network as required. Regardless, the logic for query processing remains the same, be it an embedded database application, a network application, or a standalone application.

Database Connectors


Applications connect to the database using a set of protocols called database connectors. Open Database Connectivity (ODBC) is a well-known database connector that an application can use to connect to almost any database. There are also vendor-specific database connectors for an RDBMS such as MySQL. MySQL supports connectors for Java (JDBC), PHP, Python, .NET, etc. These implementations mostly support communication over network protocols. The connectors expose APIs designed to transfer SQL commands to the database server and retrieve information upon request by the client, and they typically consist of a database driver and client access APIs.

Query Interface


Queries are nothing more than questions put to the database according to the syntax and semantics of a standard query language called SQL (Structured Query Language). The database server understands the language and replies as per the query submitted. According to the semantics of SQL, queries fall into two types. The first type is the Data Definition Language (DDL) query, which is used to define and modify database structures: creating and altering tables, defining indexes, managing constraints, etc. The second type, the Data Manipulation Language (DML) query, is used to work with the data in the database. This includes actions such as selecting, inserting, updating, and deleting data in the database tables.

A typical SELECT query syntax may be written as follows. The square bracket ([]) represents optional parameters and the lowercase notation depicts user-defined variables.

SELECT [ DISTINCT ] columns
FROM tables
[ WHERE expression ]
[ GROUP BY columns ]
[ HAVING expression ]
[ ORDER BY columns ] ;

◉ The DISTINCT keyword removes the duplicate records in the final result.
◉ The FROM clause lists the tables (and joins) from which the referenced columns are drawn.
◉ The WHERE clause filters rows of the referenced tables according to the expression.
◉ The GROUP BY clause groups the result according to the specified attribute.
◉ The HAVING clause applies filter on the groups.
◉ The ORDER BY clause sorts the result.
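As a small illustration (using hypothetical ORDERS and CUSTOMERS tables), a query touching most of these clauses might look like this:

SELECT   c.region, COUNT(*) AS order_count
FROM     orders o
         JOIN customers c ON c.customer_id = o.customer_id
WHERE    o.order_date >= DATE '2020-01-01'
GROUP BY c.region
HAVING   COUNT(*) > 100
ORDER BY order_count DESC;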

Query Processing


Once the client submits a database query statement via the network protocol to the database server, it is first interpreted and then executed. Interpretation deciphers the meaning of the query by parsing the SQL statement and breaking it into elements before execution. The interpretation of the query is a two-step process: the logical plan describes what the query is supposed to do, and the physical plan describes how to implement the query.

The physical plan of the query is handled by the database system’s query execution engine. A tree structure is created in which each node represents a query operator, and its children represent the tables or intermediate results that the operator consumes. The query passes through several phases before execution: parsing, validation, optimization, plan generation/compilation and, finally, execution.

◉ Parsing breaks the SQL statement into parts, validates it, and translates the logical query (the SQL query) into a query tree according to the syntactical scheme of relational algebra. This is the logical plan.

◉ The logical query is then translated into a physical plan. There can be many such plans, but the query optimizer finds the best one according to estimated execution cost. It does this by placing the relational algebra tree into the optimizer’s search space, expanding it into alternative execution plans, and finally choosing the best among them. The result is akin to the code-generation step of compiling SQL. The statistics needed to cost the alternatives come from the database system’s catalog, which records the number of tuples, the stored relations referenced by the query, and other details. The optimizer then copies the optimal plan from its memory structure and sends it to the query execution engine, which executes the plan using the database relations as input and generates a new table of rows and columns matching the query criteria.


Note that the plan is optimal or near-optimal only within the search space of the optimizer. The interpretation of a SQL query by the RDBMS is not that simple after all. Optimization is a costly affair because it analyzes alternative execution plans, and a single query can have an enormous number of them; the extra time spent in the query optimizer and query execution engine adds to the overall database response time.
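In Oracle, for instance, the physical plan the optimizer settles on can be inspected before the query is ever run; the sketch below uses a hypothetical ORDERS table:

EXPLAIN PLAN FOR
  SELECT COUNT(*) FROM orders WHERE order_date >= DATE '2020-01-01';

-- Display the chosen access paths, join methods and estimated cost
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);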

Monday, May 25, 2020

DBMS_JOB Jobs Converted to DBMS_SCHEDULER Jobs in Oracle Database 19c


The DBMS_JOB package has been deprecated since 12cR2. Oracle 19c takes the demise of the DBMS_JOB package a step further by converting any DBMS_JOB jobs to DBMS_SCHEDULER jobs.

◉ Create a Job Using DBMS_JOB


In Oracle 19c jobs created using the DBMS_JOB package are implemented as DBMS_SCHEDULER jobs, as demonstrated below.

We can see from the output below we don't have any jobs for this user.

CONN test/test@pdb1

COLUMN what FORMAT A30

SELECT job, what FROM user_jobs;

0 rows selected.

SQL>

COLUMN job_name FORMAT A30
COLUMN job_action FORMAT A30

SELECT job_name, job_action FROM user_scheduler_jobs;

0 rows selected.

SQL>

We create a job using the DBMS_JOB.SUBMIT procedure, but we are not going to issue a COMMIT statement.

DECLARE
  l_job  PLS_INTEGER;
BEGIN
  DBMS_JOB.submit (
    job       => l_job,
    what      => 'BEGIN NULL; END;',
    next_date => TRUNC(SYSDATE)+1,
    interval  => 'TRUNC(SYSDATE)+1'
  );
END;
/

We can see the job is listed in the USER_JOBS and USER_SCHEDULER_JOBS views.

SELECT job, what FROM user_jobs;

       JOB WHAT
---------- ------------------------------
         1 BEGIN NULL; END;

1 row selected.

SQL>

SELECT job_name, job_action FROM user_scheduler_jobs;

JOB_NAME                        JOB_ACTION
------------------------------ ------------------------------
DBMS_JOB$_1                  BEGIN NULL; END;

1 row selected.

SQL>

Notice the JOB_NAME of "DBMS_JOB$_?" for the DBMS_SCHEDULER job that has been generated.

◉ Transactional Jobs


One of the reasons people still use the DBMS_JOB package is it allows you to create jobs that are part of a bigger transaction. If a failure causes an exception, all the current work along with the jobs defined as part of it can be rolled back. We can demonstrate this using the job created above. Remember, we didn't issue a COMMIT, so the job is not visible from another session connected to the same user.

Without closing the original session, open a new connection and check for the jobs.

CONN test/test@pdb1

SELECT job, what FROM user_jobs;

0 rows selected.

SQL>

SELECT job_name, job_action FROM user_scheduler_jobs;

0 rows selected.

SQL>

Now return to the original session and the jobs are still visible.

SELECT job, what FROM user_jobs;

       JOB   WHAT
---------- ------------------------------
         1   BEGIN NULL; END;

1 row selected.

SQL>

SELECT job_name, job_action FROM user_scheduler_jobs;

JOB_NAME                         JOB_ACTION
------------------------------ ------------------------------
DBMS_JOB$_1                   BEGIN NULL; END;

1 row selected.

SQL>

Issue a ROLLBACK, and the job definition will be removed.

ROLLBACK;

SELECT job, what FROM user_jobs;

0 rows selected.

SQL>

SELECT job_name, job_action FROM user_scheduler_jobs;

0 rows selected.

SQL>

As a result, the DBMS_JOB package can still be used to create transactional jobs, which are now implemented using the DBMS_SCHEDULER infrastructure. This also provides backwards compatibility.

◉ Materialized View Refresh Groups


Up to and including Oracle 18c, materialized view refresh groups were implemented using the kernel APIs exposed by the old DBMS_JOB package. In Oracle 19c things look a little different.

Create a table, a materialized view, and a refresh group that includes that materialized view.

CREATE TABLE t1 (id NUMBER);

CREATE MATERIALIZED VIEW t1_mv
REFRESH FORCE
ON DEMAND
AS
SELECT * FROM t1;

BEGIN
   DBMS_REFRESH.make(
     name                 => 'MINUTE_REFRESH',
     list                 => '',
     next_date            => SYSDATE,
     interval             => '/*1:Mins*/ SYSDATE + 1/(60*24)',
     implicit_destroy     => FALSE,
     lax                  => FALSE,
     job                  => 0,
     rollback_seg         => NULL,
     push_deferred_rpc    => TRUE,
     refresh_after_errors => TRUE,
     purge_option         => NULL,
     parallelism          => NULL,
     heap_size            => NULL);
END;
/

BEGIN
   DBMS_REFRESH.add(
     name => 'MINUTE_REFRESH',
     list => 'T1_MV',
     lax  => TRUE);
END;
/

We don't see a job in the USER_JOBS view, but we do see one in the USER_SCHEDULER_JOBS view.

SELECT job, what FROM user_jobs;

0 rows selected.

SQL>


SELECT job_name, job_action FROM user_scheduler_jobs;

JOB_NAME                       JOB_ACTION
------------------------------ ------------------------------
MV_RF$J_0_S_210                dbms_refresh.refresh('"TEST"."MINUTE_REFRESH"');

1 row selected.

SQL>

But this job is transactional, in that a ROLLBACK will remove the job, along with the refresh group definition.

ROLLBACK;

SELECT job, what FROM user_jobs;

0 rows selected.

SQL>


SELECT job_name, job_action FROM user_scheduler_jobs;

0 rows selected.

SQL>

It would appear the refresh group functionality has been re-implemented using the kernel APIs that sit under the DBMS_SCHEDULER package, but without the implicit commit, similar to the way the DBMS_JOB interface has been re-implemented. This is not 100% backwards compatible, as the associated job is not visible in the USER_JOBS view. If you have any functionality that relies on the link between refresh groups and the old scheduler, it will need revisiting. I can't imagine that will be a problem for most people.

You can clean up the test table and materialized view using these commands.

DROP MATERIALIZED VIEW t1_mv;
DROP TABLE t1 PURGE;

◉ Security: Is the CREATE JOB Privilege Required?


At first glance the loophole discussed here sounds really bad, but remember that even in Oracle 18c, any user connected to the database could create a job using the DBMS_JOB interface, so this loophole is no worse than what came before. It just breaks the DBMS_SCHEDULER security.

As Connor McDonald pointed out, the conversion means users require the CREATE JOB privilege to allow them to create jobs using the DBMS_JOB package, where previously they didn't. We can see this if we create a user with just the CREATE SESSION privilege and attempt to create a job.

CREATE USER test2 IDENTIFIED BY test2;
GRANT CREATE SESSION TO test2;

CONN test2/test2@pdb1

DECLARE
  l_job  PLS_INTEGER;
BEGIN
  DBMS_JOB.submit (
    job       => l_job,
    what      => 'BEGIN NULL; END;',
    next_date => TRUNC(SYSDATE)+1,
    interval  => 'TRUNC(SYSDATE)+1'
  );
END;
/

Error report -
ORA-27486: insufficient privileges
ORA-06512: at "SYS.DBMS_ISCHED", line 9387
ORA-06512: at "SYS.DBMS_ISCHED", line 9376
ORA-06512: at "SYS.DBMS_ISCHED", line 175
ORA-06512: at "SYS.DBMS_ISCHED", line 9302
ORA-06512: at "SYS.DBMS_IJOB", line 196
ORA-06512: at "SYS.DBMS_JOB", line 168
ORA-06512: at line 4
27486. 00000 -  "insufficient privileges"
*Cause:    An attempt was made to perform a scheduler operation without the
           required privileges.
*Action:   Ask a sufficiently privileged user to perform the requested
           operation, or grant the required privileges to the proper user(s).
SQL>

There is a loophole caused by the refresh group implementation. If we repeat the previous refresh group example, we can see we are able to create a job without the CREATE JOB privilege.

BEGIN
   DBMS_REFRESH.make(
     name                 => 'MINUTE_REFRESH',
     list                 => '',
     next_date            => SYSDATE,
     interval             => '/*1:Mins*/ SYSDATE + 1/(60*24)',
     implicit_destroy     => FALSE,
     lax                  => FALSE,
     job                  => 0,
     rollback_seg         => NULL,
     push_deferred_rpc    => TRUE,
     refresh_after_errors => TRUE,
     purge_option         => NULL,
     parallelism          => NULL,
     heap_size            => NULL);
END;
/

SELECT job_name, job_action FROM user_scheduler_jobs;

JOB_NAME                       JOB_ACTION
------------------------------ ------------------------------
MV_RF$J_0_S_242                dbms_refresh.refresh('"TEST2"."MINUTE_REFRESH"');


1 row selected.

SQL>

That in itself is not devastating because it's for a very specific purpose, but most of Oracle's security is based on you being able to do whatever you want with objects you already own, so what happens if we try to change the attributes?

BEGIN
  DBMS_SCHEDULER.set_attribute (
    name      => 'MV_RF$J_0_S_242',
    attribute => 'job_action',
    value     => 'BEGIN NULL; END;'
  );
END;
/

SELECT job_name, job_action FROM user_scheduler_jobs;

JOB_NAME                         JOB_ACTION
------------------------------ -----------------------------
MV_RF$J_0_S_242          BEGIN NULL; END;

1 row selected.

SQL>

So we can create a job using the DBMS_REFRESH package, then alter it to suit our purpose, giving us the ability to create a job without the need for the CREATE JOB privilege.

It would appear the re-implementation of the DBMS_REFRESH package has not followed the same security rules as that used by the other scheduler implementations. I'm sure this will get fixed in a future release.

Until this issue is resolved, you should probably revoke EXECUTE on the DBMS_REFRESH package from PUBLIC, as you may already do for the DBMS_JOB package.
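A minimal example of that lockdown, run as a suitably privileged user, might be:

-- Close the refresh-group loophole for ordinary users
REVOKE EXECUTE ON SYS.DBMS_REFRESH FROM PUBLIC;

-- Grant it back only to accounts that genuinely manage refresh groups
GRANT EXECUTE ON SYS.DBMS_REFRESH TO mview_admin;  -- hypothetical user/role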

Note. I raised this issue as "SR 3-20860955641 : Jobs can be created without the CREATE JOB privilege". This is now Bug 30357828 and is being worked on.

Sunday, May 24, 2020

Oracle Database to SQL Server Comparisons

No, this isn’t one of those “why my RDBMS is better than your RDBMS” comparisons. One of the initial stumbling blocks a DBA on one system encounters is learning or trying to figure out where features of your system live or reside on the other (less familiar to you) system. Let’s approach this feature comparison by taking SQL Server and mapping its features back into Oracle. Owing to time and space constraints, this comparison will not be 100% inclusive of either system, but it will show (as much as possible) the SQL Server to Oracle mapping using what you can see in SQL Server Management Studio (SSMS).

The Object Explorer pane in SSMS serves as a good frame of reference to do the reverse mapping. I could have used Oracle and mapped over to SQL Server, but two reasons negated that. The first is where do you find a convenient one-stop interface in Oracle? Definitely not in SQL*Plus, and it would be hard to argue that Database Control’s interface is anywhere near as utilitarian as SSMS. The other is that most readers of this article are coming from an Oracle background, and we want to take what we know about Oracle and see how that fits or works in MSSQL.

The versions used here are Oracle 10gR2 (platform does not matter) and SQL Server 2005 on Windows Server 2003 (also R2). There are some slight differences between MSSQL 2005 and 2008, but as far as the Windows OS is concerned, it won’t be an issue. Oracle 10g is still in wide use, as are MSSQL 2005 and Windows Server 2003, so the comparisons made here will work for most of the people most of the time.

Database Engine


Let’s start with a fully collapsed tree in Object Explorer.

[Screenshot: SSMS Object Explorer with the tree fully collapsed]

Much like what SQL*Plus can show at the command line interface with respect to who you are logged in as, the same appears at the top showing which database engine I’m connected to. The database engine (SID) is named WIN2003, which also happens to be the name of the Windows server. A one-to-one match like this generally implies that the default instance name chosen during installation is “dot” or the name of the server. The version of MSSQL is also shown, as is who is connected in this particular session. By looking at the connected user or account (and assuming someone wouldn’t create an account explicitly named this way), we know the login was validated via Windows authentication. Can you tell by what is shown if SQL Server authentication is enabled? No – you would have to go into Properties>Security to see which option (Windows only or both) is selected. The authentication mode is analogous to letting Oracle maintain security via its internal mechanisms or using OPS (operating system authentication).

What else could you safely assume about what the connected user can do? By default, the Administrator (the user who started the SSMS session) will have “sa” privileges (sa being analogous to a combination of SYS and SYSTEM). If you look at the various MSSQL-related groups on Windows, you won’t see Administrator explicitly listed. In Oracle, looking at the ORA_DBA group, Administrator is listed as a member of the group.

Right-click the connection and select Properties.

[Screenshot: Server Properties dialog, General page]

Many of the settings buried in each “page” correlate to parameters in Oracle; not so much on the General page, but definitely so in the rest. One setting of particular importance has to do with how much memory MSSQL is allowed to use. Click Memory and the display changes to what is shown below.

[Screenshot: Server Properties dialog, Memory page]

The value of 2147483647 MB is for all practical purposes unlimited. You would definitely want to set this to something lower than what the OS has available. Does this memory setting represent ALL of the memory used by MSSQL? No – it represents the buffer cache, which is typically the largest memory component of MSSQL. MSSQL has other things going on outside of the buffer cache, and here is where Oracle and MSSQL are similar: it can be somewhat difficult to determine how much memory overall, each system uses. Microsoft has performance counters, which can be selected in a management console, and the information presented therein can help you come up with an answer to “how much memory is SQL Server using?”
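If you do cap it, the same setting is exposed through sp_configure; a hedged T-SQL sketch (the 8192 MB value is just an example, not a recommendation) might be:

-- Cap max server memory (the buffer pool and related caches) at 8 GB
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;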

Several items on the Advanced page will also be of some interest.

[Screenshot: Server Properties dialog, Advanced page]

Although there are some Net Services-like comparisons within MSSQL’s Surface Area Configuration, the Remote Login Timeout is the closest to what is found in the sqlnet.ora file for connection timeout settings. This may be one of the first places you go to if remote clients (across a slow network) are experiencing timeout issues while trying to connect.

Within Oracle, parallel query or parallel execution operations can greatly help – or hinder – a database’s performance. The same holds true in MSSQL. Two factors come into play: the cost threshold and the max degree of parallelism. MSSQL also has an optimizer, and if it determines the cost is above 5, then parallelism will be considered. Okay, if parallelism is considered, how much is allowed? The zero setting means unlimited (up to the number of CPUs, and is also constrained by the edition, Standard versus Enterprise). Unfortunately (or not, depends on your viewpoint), this is an across the board setting. We can open up Oracle for maximum use of parallelism, but it can also be throttled overall and at the table level.
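Both of those knobs are instance-wide settings exposed through sp_configure; a hedged T-SQL sketch (the values shown are arbitrary examples) might be:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Only consider parallel plans when the estimated cost exceeds 25
EXEC sp_configure 'cost threshold for parallelism', 25;
-- Allow at most 4 CPUs per parallel query
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;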

Databases


Alright, we understand what “database” means, and as far as tables and indexes are concerned, the two systems are not that different. Looking at the list of databases shown below, what do we know about WIN2003?

[Screenshot: Object Explorer showing the list of databases]

Right off the bat we know that at most, what we are seeing is a slightly more than default installation. The reason we know this is because of the ReportServer (and its temp DB) database being present. SQL Server Reporting Services (SSRS) was either selected during installation or installed during another “setup.exe” session.

The System Databases are of more interest to us right now.

[Screenshot: Object Explorer, System Databases]

Four databases are always present in MSSQL: master, model, msdb and tempdb. The easy analogy between systems is that master and msdb correspond to the SYSTEM and SYSAUX tablespaces in Oracle. Oracle does not have a model database, but it does have templates, so model in MSSQL is quite similar to the templates you see in Database Configuration Assistant.

[Screenshot: Oracle Database Configuration Assistant database templates]

The tempdb database lives on as the TEMP tablespace in Oracle, matching many of the same features and purposes. The tempdb is re-created whenever the MSSQL instance starts.

How do you create a database in MSSQL? One way is to use the wizard (right-click Databases>New Database…). Once created, add tables, indexes and programmability (triggers, stored procedures) and off you go. Another way is to plug in a database file (and possibly a transaction log file) via an attachment process or from restoring a backup. An attached or restored database isn’t necessarily new (they had to have come from somewhere else), but they may be new to your instance. This process also leads to the idea of cloning.

How would you clone a production database into something else (e.g., a development database)? MSSQL offers several ways to clone a database. One of the simplest methods is to take (or use) a backup of the source and use the BAK file for a database restore (right-click Databases>Restore Database…). Database Control in Oracle offers a Clone Database feature (at extra cost). You can also use RMAN or some type of backup (even export/import, traditional or datapump).

The database level is where a huge departure between Oracle and MSSQL takes place. In Oracle, we are in archivelog mode or not, and that applies to everything (minus read-only tablespaces). In MSSQL, a database runs in one of three recovery models (full, simple and bulk-logged). Instead of having archived redo logs filling up an archived log destination and hanging the instance, with MSSQL you can grow the transaction log (LDF file) to such an extent that you fill up a disk and cause the same problem. Taking periodic backups of the transaction log (which then truncates the log) prevents this problem in the FULL model. It then becomes analogous to a more granular point-in-time recovery capability.
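As a rough T-SQL sketch (database name and backup paths are hypothetical), keeping the log of a FULL-recovery database under control looks something like this:

-- Put the database in the FULL recovery model (point-in-time recovery)
ALTER DATABASE SalesDb SET RECOVERY FULL;

-- A full backup is needed before log backups have anything to chain to
BACKUP DATABASE SalesDb TO DISK = N'D:\Backups\SalesDb_full.bak';

-- Periodic log backups truncate the inactive portion of the LDF file,
-- which is what keeps it from filling the disk
BACKUP LOG SalesDb TO DISK = N'D:\Backups\SalesDb_log.trn';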

Database files between the two systems are somewhat similar. Many, if not most MSSQL databases will have two files: the database file (MDF file extension) and a transaction log file (LDF file extension). You can have more than one MDF file and even have file groups. The drill down path to find file information in MSSQL (and yes this is also easily done via T-SQL commands) is found in the properties of a database (on the Files page, shown below).

[Screenshot: Database Properties dialog, Files page]

By default, we get the PRIMARY filegroup. Other filegroups can be used/created, and one reason for that would be for use of partitioned tables. For a production database, the Autogrowth rate/setting can be important. You can almost think of this as a percentage based extent growth setting in Oracle. The options for growth are similar to what Oracle offers.

[Screenshot: Change Autogrowth dialog]

As you can see, MSSQL can grow files in one of two ways, cap the size or not, and determine who should grow a file. Just as users can spend time waiting for file growth in Oracle, users (or the system itself) can spend time waiting for file extend operations in MSSQL. If a file is going to grow a lot, then grow it in larger chunks at a time, but not too large because then you spend (too much) time waiting for that too. Lastly, do you want a file to be able to eat up all remaining disk space? If not, then restrict the file growth, just as you would in Oracle.
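For reference, the same autogrowth settings can be changed in T-SQL; this sketch assumes a hypothetical database and logical file name:

-- Grow the data file in fixed 512 MB increments and cap it at 200 GB,
-- instead of unlimited percentage-based growth
ALTER DATABASE SalesDb
MODIFY FILE (NAME = SalesDb_data, FILEGROWTH = 512MB, MAXSIZE = 200GB);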

Source: databasejournal.com

Friday, May 22, 2020

DB2 V12 Features Supporting Large Databases


Big data applications were once limited to hybrid hardware/software platforms. Now, recent advances are allowing applications like these to be integrated with and federated into operational systems. In particular, IBM's DB2 for z/OS Version 12 (DB2 V12) delivers new features and functions that allow the DBAs to design, define and implement very large databases and business intelligence query platforms that fulfill some big data expectations. Such applications can then provide value to the enterprise without expending money and time on specialized infrastructure, and can also be the first step towards building out one or more true big data applications.

Dynamic SQL Plan Stability


Many operational systems access DB2 through SQL statements that are hard-coded and never change. These are called static SQL statements. During the program preparation process called “bind”, these static SQL statements are read and stored in DB2 along with the optimal data access path. During execution, DB2 uses the pre-determined access path of the SQL statement. The DBA regularly reviews these stored statements and the corresponding access paths in order to do performance analysis.

Contrast this with SQL statements that arrive at DB2 in an ad hoc fashion. These SQL statements are created real-time in applications or by query tools, and are termed dynamic SQL. DB2 is forced to parse these SQL statements on arrival, determine the optimal access path and then execute the query. This leads to several issues. Repeated execution of the same SQL statement or similar statements may result in different access paths at different times, perhaps as a result of one of the following:

◉ Changes in data distribution statistics;
◉ Changes in DB2 code due to patches, upgrades, or new releases;
◉ Changes in DB2 subsystem execution parameters;
◉ Changes in hardware or software configurations, including memory allocations and CPU availability.

In order to reduce the instability of dynamic SQL access paths, DB2 V12 includes several parameters that can be set to capture and monitor dynamic SQL statements and maintain current access paths despite hardware or software configuration changes.

Dynamic SQL can now be captured, monitored, and reviewed for performance. The DBA can configure DB2 to maintain particular access paths for a set of dynamic SQL statements while allowing others’ performance to change as the system changes. This is extremely useful for applications that construct queries based on a set of user choices.

For example, consider a customer information system that retrieves a set of customer records based on criteria such as customer name, location, account type and account balance. It is possible that only some of the criteria are entered by the user, so the application may construct one of several different SQL statements. As the number of criteria increases the number of possible SQL statements increases geometrically.

With dynamic SQL plan stability, the DBA can capture the various SQL statements along with their access paths and have DB2 continue to use the same access path while making performance changes such as adding memory, adding indexes or updating data distribution statistics. This removes the risk that such changes inadvertently lead DB2 to choose another access path that does not perform as well. In short, DBAs now have the capability to measure whether their performance changes actually result in better access paths before implementing the changes in production.

DRDA Fast Load


Many applications use the LOAD utility to perform large data loads. (Single rows can easily be loaded using the SQL Insert statement.)  Some examples of data loads include loading or appending today’s transactions to a Transaction table, or loading new customers to a Customer table.  In a data warehouse environment, the LOAD utility is used extensively when extracting daily data from operational systems as part of the extract transform load (ETL) process.

For data originating in remote systems, the ETL process can be quite cumbersome and I/O-intensive. Consider a daily transaction file on a remote system. The file needs to be extracted from the originating system (copy 1), sent to the mainframe (copy 2), transformed and sorted in the proper key sequence (copy 3) then loaded into a DB2 table. The multiple copies and multiple data movements seem redundant.

The data protocol used to communicate between the remote system and mainframe DB2 is called distributed relational database architecture, or DRDA. With the new DRDA Fast Load capability, applications can invoke a remote load process using a local DB2 client application. This application will send the data directly to the mainframe for processing directly by the Load utility.

This has several advantages for the DBA to consider. It reduces mainframe CPU by placing the extraction and transformation logic on the same platform as the originating system. It also bypasses several file creation steps, reducing I/Os, disk space usage and total elapsed time.

Table Partition Limitations Removed


One of the reasons that big data appliances are so popular is that they can store so much data. While this may be obvious, prior versions of DB2 had limits on table sizes that precluded using DB2 tables for very large databases. Thus, big data applications were used to store large tables and the result was integrated into the current DB2 subsystem, usually including the enterprise data warehouse.

Prior to DB2 V12, table partitions were limited as well as the total number of partitions per tablespace. A typical table using 4k pages was limited to 16 terabytes (TB) total size.

DB2 V12 introduces a new type of partitioned table called “partition by range relative page number”. This tablespace type allows each partition to be 1 TB in size, and the total size of a tablespace is increased by a factor of over 100, to 4 petabytes (PB). In terms of rows, the largest number of rows allowed in one of these new tables is approximately 280 trillion.
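A minimal sketch of such a definition (object names are hypothetical, and the exact clauses should be verified against the Db2 12 SQL reference) might look like this:

-- Partition-by-range universal tablespace with relative page numbering,
-- allowing partitions of up to 1 TB each
CREATE TABLESPACE ts_orders IN db_sales
  NUMPARTS 64
  PAGENUM RELATIVE
  DSSIZE 256 G
  BUFFERPOOL BP1;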

This massive increase in allowed table size means that DB2 now includes native options for storing large amounts of data. This may be a viable option for shops whose limited budgets preclude leasing a big data appliance, or who wish to implement a big-data-like application in native DB2 first as a proof of concept prior to considering a big data application.

Continuous Delivery


One issue that DBAs and systems support staff have had concerns about in recent years is the complexity of DB2 version upgrades. Many mainframe shops implement multiple DB2 subsystems in a cluster configuration called data sharing. Since each DB2 subsystem is a separate software instance, each must be updated when IBM issues DB2 software updates or a new version. Some of the new version code and system databases are shared by all the subsystems, and some is unique to each subsystem. Thus, the process for upgrading them all can be complex, especially when systems support must take into account undoing the upgrade or falling back to a prior version should issues occur.

With continuous delivery, IBM has simplified their method of software delivery. Upgrades are now provided in smaller pieces that are much easier to implement and faster to test. Shops can choose which subsets of functionality they wish to apply at what time and to what subsystems.

Thursday, May 21, 2020

Big Data Analytics on Current Data


Typical big data applications have a load phase, where a snapshot of data is extracted from operational systems, transformed and cleaned, and then loaded into your big data appliance. Analysts and data scientists can then execute business intelligence (BI) queries against the data store in search of patterns, potential marketing opportunities and cost savings. One disadvantage of this architecture is that your appliance becomes more like a large data warehouse, in that it does not contain the most up-to-date data.

IBM now provides an option to configure its Db2 version 11 for z/OS and complementary IBM Db2 Analytics Accelerator (IDAA) to permit concurrent transactional processing of operational data with analytics processing of data in the appliance. This new feature, zero-latency HTAP (hybrid transactional analytical processing) provides a patented replication process that propagates native Db2 table changes to the IDAA data store. This then allows BI queries to act on up-to-date data, providing more value to the enterprise, and enabling analytics embedded in operational applications.
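From the application side, the routing decision is typically controlled per session or package; a hedged sketch using the CURRENT QUERY ACCELERATION special register (query and table names are hypothetical) might look like this:

-- Ask Db2 to send eligible queries to the accelerator when it can;
-- check the supported register values (NONE, ENABLE, ELIGIBLE, ALL, ...)
-- against your Db2 documentation
SET CURRENT QUERY ACCELERATION = ELIGIBLE;

SELECT product_id, SUM(order_amount) AS total_sales
FROM   orders
GROUP  BY product_id;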

State of the Art


Early information technology (IT) systems and applications were fairly simple. Applications read and wrote records to and from keyed files or databases, and this held true for both internal applications (such as accounting, shipping and receiving, and various reports) and external or customer-facing ones, which included order entry, bank teller screens and information kiosks.

Over time application complexity grew along with data volumes. IT began to create data stores that were much more than simple keyed files. Daily extracts from order entry systems were sent to data marts that were analyzed to predict product shortages in stores and warehouses, which then sent data to shipping systems. As historical analysis and reporting became more important to the business, daily extracts were accumulated in a data warehouse, providing customer, account and product information that could be aggregated by region or time period. Analysts could then review these data for trends and make predictions about which products sold best in what regions or during which time periods.

Today, there are many more CPU-intensive and data-intensive operations across IT than ever before. Operational systems have grown from simple, in-house programs to include large suites of software from third-party vendors, including enterprise resource planning (ERP) packages and business intelligence (BI) systems for querying the data warehouse. Extract files from operational systems have grown much larger as analysts requested more and more data. These bulk data files became the source for loading into big data appliances, whose proprietary data formats and massively parallel data processing resulted in greatly reduced query response times.

Today’s Requirements


Now, IT is at a crossroads. The business has two new needs, and these needs are extremely difficult to satisfy simultaneously. They are as follows.

The need for analytical queries to access current data. Advances in big data query speed and the value of timely business intelligence made it essential that some analytic queries execute against today’s data. This can occur when real-time data is more important or relevant than historical data, or when the incidence of important events (and the data corresponding to them) is relatively rare. Consider the example of a BI analyst reviewing product purchases. The relevant queries may calculate the total profit realized for all products in every month for every sales region. For popular products with hundreds of orders per day, including today’s orders in the calculation may not make a significant difference; however, for products ordered only once per month per region, today’s data may almost double the totals.

The need for some operational applications to run analytical queries. This is mainly in situations where real-time data is far more valuable than historical data. For example, consider health care equipment monitoring a patient. Historical analysis may indicate the conditions under which you may be able to predict when events such as strokes or heart attacks may be likely; however, this may only be of value to a patient if the prediction is done immediately as data becomes available. Another example is a supply chain system that predicts product shortages and responds by scheduling shipments; as above, this is most valuable when analyses can make predictions based on current data. A third example is an on-line order entry system that needs to detect potential fraud. Fraud patterns may exist in the historical data stored in the warehouse or the big data appliance, and there may be a business requirement to detect fraud as early as possible, even at the point of initial order entry.

Early Issues


The first attempts to merge operational and analytical systems and data met with several problems. One overarching issue was the different performance and resource usage characteristics of the two environments. Owners of operational systems require strict service level agreements (SLAs) that include limits on maximum transaction elapsed times. To address this, DBAs tend to use indexes and data distribution schemes such as data clustering and partitioning to speed query performance. The analytic environment is quite different. Here, massive amounts of data are stored in a proprietary format that is optimized for fast analytic query operations such as aggregation and subsetting. Indexes and classic partitioning schemes are generally not available, with performance depending more upon parallel access to a large disk array.

Other significant issues included the following.

Data latency. Data extracted from operational systems took time to make its way to the warehouse or big data appliance, mostly because the extract, transform and load (ETL) processes needed to pass all data through multiple processes. For example, today’s current orders may require shipment from the system of origin to the analytical environment, passing through several transformation and cleaning jobs, before being loaded. This delay meant that queries could not access up-to-date data.

Data synchronization. Operational data extract jobs were usually executed on a file or database basis; that is, separate files were created for each of products, customers, orders, accounts and so forth. Since each file was then transformed and loaded separately, the analytic environment might contain the most recent data for only some entities, leading to inconsistencies (such as today’s orders but not today’s customers).

Infrastructure complexity. Applications across IT usually exist on multiple hardware platforms, with data in differing formats. Adding cross-system interfaces that couple the operational and analytics environment may require sophisticated hardware or connectivity techniques, special performance and tuning processes, and staff that are experienced in a host of different technologies.

Early Solutions


IT management was able to meet the two conflicting needs (queries in the analytic environment run against current data, and operational applications running analytic queries) in some limited cases. They developed methods of quickly loading selected operational data into the warehouse or big data appliance for operational access. These methods usually selected limited amounts of real-time data, bypassed transformations and used various fast load processes.

One possibility was to use data replication technology to extract current data on a transaction by transaction basis and pass it to the analytics environment for immediate publishing, rather than accumulating all transactions for the entire day for a nightly load.

The next logical step was to determine if there was a combined, or federated, solution that could logically merge the operational and analytic environments. Alternatively, was some kind of hybrid solution possible?

In early 2014, Gartner defined a new category of environment that they called “hybrid transaction / analytical processing” (HTAP). They characterized this as a new “opportunity” for IT to “... empower application leaders to innovate via greater situation awareness and improved business agility”.

This description did not specify how to create an HTAP environment. Businesses and vendors attempted several different configurations, each with their advantages and disadvantages.

A multi-site solution. This is probably the simplest, and is the most likely one to exist currently in your IT shop. Operational and on-line transaction processing are executing in one system, while analytic data is stored and queried in another. If the two systems are physically separate, then you can either develop a federated answer (where a new, central hardware/software solution will access both environments) or develop methods to transfer processing and/or data from one environment to another. For example, you can copy relevant subsets of analytic data to your operational system for applications to do analytics locally. Another method would be to develop pre-coded analytic processes that are installed in your analytics environment, then couple that with a high-speed execution and data transfer method. Stored procedures are one possibility for doing this.
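
As a sketch of the "pre-coded analytic process" idea just mentioned (PL/SQL-style syntax for illustration; the procedure, table and parameter names are hypothetical), a stored procedure installed in the analytics environment could accept a region and date range and return only the aggregated result set, so that very little data has to cross the interface:

-- Hypothetical pre-coded analytic process installed in the analytics
-- environment; callers receive only the aggregated results.
CREATE OR REPLACE PROCEDURE regional_profit_summary (
  p_region  IN  VARCHAR2,
  p_from    IN  DATE,
  p_to      IN  DATE,
  p_results OUT SYS_REFCURSOR
) AS
BEGIN
  OPEN p_results FOR
    SELECT product_id,
           TRUNC(order_date, 'MM') AS order_month,
           SUM(profit)             AS total_profit
    FROM   orders
    WHERE  region = p_region
    AND    order_date BETWEEN p_from AND p_to
    GROUP  BY product_id, TRUNC(order_date, 'MM');
END regional_profit_summary;
/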

A single-site solution. Merging your two differing environments into a single one may seem like a recipe for disaster; after all, they have two different performance characteristics. In addition, you wouldn’t want resource usage such as CPU cycles used for analytics queries causing performance issues with on-line transactions. Luckily, there are several ways to address this issue. IBM z-series mainframes allow the definition of multiple instances of the operating system called logical partitions (LPARs). Each LPAR can be defined with specific resources and resource caps. For example, the analytics LPAR could have a limit on the amount of CPU it could use.

A hybrid solution. This is one with special-purpose hardware and/or software coupled with dedicated disk storage and large amounts of memory. The intent is to permit sufficient specialized environments for transaction and analytical processing while at the same time defining federated access to processes and data.

Hybrid Solution from IBM


A hybrid solution was recently presented by IBM as a Technical Preview (i.e., not yet generally available). It is a combination of three products: Db2 11 for z/OS, IBM Db2 Analytics Accelerator (IDAA) V5.1 for z/OS and IBM InfoSphere Change Data Capture (CDC) for z/OS V10.2.1.

Db2 11 stores operational data in a standard relational database for use by operational applications. The IDAA stores big data for analytics processing. Finally, CDC is used to replicate completed transactions from Db2 to IDAA. There are several advantages to this solution.

◉ All hardware and software are available from a single vendor, thus avoiding the issues of interfacing and maintaining multiple solutions from multiple vendors;

◉ Db2 11 and IDAA run on IBM z-series hardware that easily supports enterprise-sized applications and data;

◉ Operational and analytic workload processes will not compete for CPU, network and I/O resources, as the database and big data appliance are separately maintained.

IBM’s solution will soon be updated to run with the latest version of its relational database product, Db2 12 for z/OS.

Wednesday, May 20, 2020

SQL*Loader Enhancements in Oracle Database 12c

Oracle Database 12c, Oracle Database Tutorial and Materials, DB Exam Prep, DB Guides

This article presents an overview of the enhancements to SQL*Loader in Oracle Database 12c Release 1 (12.1).

◉ Setup


The examples in this article require the SCOTT schema. If you are working in a multitenant environment, edit the "$ORACLE_HOME/rdbms/admin/utlsampl.sql" script, amending the connection to the SCOTT schema as necessary, then run the script when connected to the relevant PDB as a privileged user.

conn sys@pdb1 as sysdba
@?/rdbms/admin/utlsampl.sql

Create a CSV file to load later. This should be placed on a suitable NFS mount point. In this case I'm also placing a copy in the "/tmp" directory to avoid confusion when discussing the non-NFS functionality.

CONN scott/tiger@pdb1

ALTER SESSION SET nls_date_format='DD-MON-YYYY HH24:MI:SS';

-- Create data file.
SET PAGESIZE 0 LINESIZE 100 TRIMSPOOL ON FEEDBACK OFF
SPOOL /nfs/EMP.dat.tmp

SELECT empno || ',' || ename || ',' || job || ',' || mgr || ',' ||
       hiredate || ',' || sal || ',' || comm || ',' || deptno
FROM   emp;

SPOOL OFF
SET PAGESIZE 14 FEEDBACK ON

-- Clean up the file.
HOST cat /nfs/EMP.dat.tmp | grep '[0-9]\{4\}' > /nfs/EMP.dat
HOST cp /nfs/EMP.dat /tmp/EMP.dat

Create directory objects to allow the SCOTT user to load the data file from both locations.

CONN sys@pdb1 AS SYSDBA

-- Create a directory pointing to an NFS location.
CREATE OR REPLACE DIRECTORY nfs_dir AS '/nfs/';
GRANT READ,WRITE ON DIRECTORY nfs_dir TO scott;

-- Create a directory pointing to a non-NFS location.
CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp/';
GRANT READ,WRITE ON DIRECTORY tmp_dir TO scott;

We also need a table to load into, so we will create a copy of the EMP table called EMP2 without any data.

CONN scott/tiger@pdb1

CREATE TABLE emp2 AS
SELECT *
FROM   emp
WHERE  1=2;

◉ Direct NFS (DNFS) Support


SQL*Loader now supports the use of Direct NFS (DNFS) to improve performance when the data being loaded resides on NFS filers. DNFS is used by default for files over 1G in size. Explicit control is possible using the DNFS_ENABLE and DNFS_READBUFFERS parameters, with the DNFS_READBUFFERS parameter defaulting to 4.

Create a file called "/nfs/EMP.ctl" with the following contents.

OPTIONS (
  BADFILE '/nfs/EMP.bad'
  LOGFILE '/nfs/EMP.log'
  DNFS_ENABLE TRUE
  DNFS_READBUFFERS 10
)
LOAD DATA
INFILE '/nfs/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS TERMINATED BY ','
   (empno,
    ename,
    job,
    mgr,
    hiredate   DATE "DD-MON-YYYY HH24:MI:SS" ":hiredate",
    sal,
    comm,
    deptno)

Run the following SQL*Loader command to load the data into the EMP table, replacing the existing data.

$ cd /nfs
$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

◉ Extended Data Type Support


When a database is configured to use Extended Data Types, SQL*Loader operations against that database will also support the extended data types.
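
For reference, enabling extended data types is a database-level change made once in UPGRADE mode and it cannot be reversed; the sketch below shows the documented non-CDB procedure (for a PDB the same steps are run while connected to that PDB in upgrade mode), so check the documentation before attempting it on anything important. Once MAX_STRING_SIZE is EXTENDED, columns such as VARCHAR2(32767) become legal SQL*Loader targets.

-- Minimal sketch (non-CDB case); the change is one-way.
SHUTDOWN IMMEDIATE;
STARTUP UPGRADE;
ALTER SYSTEM SET max_string_size = EXTENDED;
@?/rdbms/admin/utl32k.sql
SHUTDOWN IMMEDIATE;
STARTUP;

-- A 32K column is then a valid load target (table name is illustrative).
CREATE TABLE emp_notes (
  empno NUMBER,
  notes VARCHAR2(32767)
);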

◉ SQL*Loader Express


SQL*Loader Express is really just a simple extension of the command line interface, such that all the default values are targeted at handling basic CSV files that match the structure of a table. For example, if we want to load data into the EMP2 table, the default settings will look for a data file called "EMP2.dat" whose contents match the structure of the EMP2 table. As a result, we only need to specify command line arguments if the derived and default settings are not acceptable.

$ cd /tmp
$ cp EMP.dat EMP2.dat
$ sqlldr scott/tiger@pdb1 TABLE=EMP2 DATE_FORMAT=\"DD-MON-YYYY HH24:MI:SS\"
SQL*Loader: Release 12.1.0.2.0 - Production on Sat Sep 13 22:18:46 2014

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Express Mode Load, Table: EMP2
Path used:      External Table, DEGREE_OF_PARALLELISM=AUTO

Table EMP2:
  14 Rows successfully loaded.

Check the log files:
  EMP2.log
  EMP2_%p.log_xt
for more information about the load.
$

◉ SQL*Loader Command Line Changes


SQL*Loader Express is the main change for the command line, but there are some additional changes.

The TRIM parameter allows you to override the default LDRTRIM functionality with one of LRTRIM, NOTRIM, LTRIM, RTRIM.

$ sqlldr scott/tiger@pdb1 TABLE=EMP2 DATE_FORMAT=\"DD-MON-YYYY HH24:MI:SS\" \
  TRIM=LRTRIM

The DEGREE_OF_PARALLELISM parameter can be set to an integer value, DEFAULT, AUTO or NONE, with AUTO being the default value.

$ sqlldr scott/tiger@pdb1 TABLE=EMP2 DATE_FORMAT=\"DD-MON-YYYY HH24:MI:SS\" \
  DEGREE_OF_PARALLELISM=4

The BAD, DISCARD, and LOG parameters can be specified using just a directory object name.

$ sqlldr scott/tiger@pdb1 TABLE=EMP2 DATE_FORMAT=\"DD-MON-YYYY HH24:MI:SS\" \
  BAD=TMP_DIR DISCARD=TMP_DIR LOG=TMP_DIR

◉ SQL*Loader Control File Changes


Not surprisingly, most of the SQL*Loader control file changes are similar to those introduced for the ORACLE_LOADER access driver for external tables.

The INFILE clause now accepts wildcards, where "*" matches multiple characters, while a "?" matches a single character.

INFILE '/nfs/EMP*.dat'

INFILE '/nfs/EMP?.dat'

Files in CSV (comma-separated values) format can use the simplified FIELDS CSV clause. The default settings for this clause can be modified if they do not suit your data.

LOAD DATA
INFILE '/tmp/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS CSV WITH EMBEDDED
   (empno,
    ename,
    job,
    mgr,
    hiredate    DATE "DD-MON-YYYY HH24:MI:SS" ":hiredate",
    sal,
    comm,
    deptno)

$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

A default datetime format can be specified for all datetime fields using the DATE FORMAT clause.

LOAD DATA
INFILE '/tmp/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS TERMINATED BY ','
DATE FORMAT "DD-MON-YYYY HH24:MI:SS"
   (empno,
    ename,
    job,
    mgr,
    hiredate DATE,
    sal,
    comm,
    deptno)

$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

A default NULLIF can be specified that applies to all character fields. A field-specific NULLIF overrides the default NULLIF. The NO NULLIF clause can be used against a field to prevent the default NULLIF applying to it.

LOAD DATA
INFILE '/tmp/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS TERMINATED BY ','
NULLIF = "NONE"
DATE FORMAT "DD-MON-YYYY HH24:MI:SS"
   (empno,
    ename,
    job,
    mgr NO NULLIF,
    hiredate    DATE "DD-MON-YYYY HH24:MI:SS" ":hiredate",
    sal NULLIF job="PRESIDENT",
    comm,
    deptno)

$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

The FIELD NAMES clause allows you to indicate where the order of the fields in the data file is defined: either in the first record of the data file itself or in a separate file.
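
The control file fragment below is only a sketch: it assumes a hypothetical copy of the data file ("/tmp/EMP_HDR.dat") whose first record contains the field names, and I have not verified the exact placement of the clause, so check the Utilities documentation before relying on it.

-- Assumes a data file whose first record lists the field names.
LOAD DATA
INFILE '/tmp/EMP_HDR.dat'
INTO TABLE emp2
TRUNCATE
FIELDS CSV WITH EMBEDDED
FIELD NAMES FIRST FILE
DATE FORMAT "DD-MON-YYYY HH24:MI:SS"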

◉ Audit SQL*Loader Direct Path Loads


Oracle 12c allows SQL*Loader direct path loads to be audited by creating an audit policy.

CREATE AUDIT POLICY policy_name
  ACTIONS COMPONENT=DIRECT_LOAD [LOAD | ALL];

I'm having some trouble getting this auditing to work using the "ALL" option. Thanks to Ronan for pointing out that the "LOAD" option works as expected.

When this policy is applied to a user, their SQL*Loader direct path operations will appear in the audit trail. The following policy audits all SQL*Loader operations. The policy is applied to the SCOTT user.

CONN sys@pdb1 AS SYSDBA
CREATE AUDIT POLICY audit_sl_load_policy ACTIONS COMPONENT=DIRECT_LOAD LOAD;
AUDIT POLICY audit_sl_load_policy BY scott;

Use the following control file to run SQL*Loader.

OPTIONS (DIRECT=TRUE)
LOAD DATA
INFILE '/tmp/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS TERMINATED BY ','
   (empno,
    ename,
    job,
    mgr,
    hiredate   DATE "DD-MON-YYYY HH24:MI:SS" ":hiredate",
    sal,
    comm,
    deptno)

$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

Checking the audit trail shows the SQL*Loader run was audited.

CONN sys@pdb1 AS SYSDBA

-- Flush audit information to disk.
EXEC DBMS_AUDIT_MGMT.FLUSH_UNIFIED_AUDIT_TRAIL;

SET LINESIZE 200
COLUMN event_timestamp FORMAT A30
COLUMN object_schema FORMAT A10
COLUMN object_name FORMAT A30
COLUMN action_name FORMAT A20

SELECT event_timestamp,
       object_schema,
       object_name,
       action_name,
       direct_path_num_columns_loaded
FROM   unified_audit_trail
WHERE  audit_type = 'Direct path API'
ORDER BY 1;

EVENT_TIMESTAMP                OBJECT_SCH OBJECT_NAME                    ACTION_NAME          DIRECT_PATH_NUM_COLUMNS_LOADED
------------------------------ ---------- ------------------------------ -------------------- ------------------------------
19-OCT-2015 07:50:08.335507    SCOTT      EMP2                           LOAD                                              8

2 rows selected.

SQL>

Monday, May 18, 2020

Database Migration - It's More than Running exp and imp

DB Exam Prep, Database Tutorial and Material, Database Certifications, DB Prep

The new server using Red Hat is up and running, Oracle RDBMS software has been installed, a new instance is running with all of the new features you can possibly imagine (at long last, run_fast=true is available), and "all" that's left to do is migrate the source database (or schema) to the new/target database. It is midnight Friday and you have an 8-hour planned outage/maintenance period available to perform the migration. What can you do prior to midnight Friday to make the migration as painless as possible? Let's look at four areas where planning can make a difference: general administration, the export phase, the import phase, and a handoff phase.

General administration/project planning


You are the one in charge of the database migration. What do you and Hannibal from the A-Team have in common? Answer: "I love it when a plan comes together." To help make the plan come together, fire up Visio or PowerPoint and diagram a workflow process. As a minimum, you can take the low-tech route and come up with a timeline. Even if you start with brainstorming and writing down ideas as they come to mind, you will be much better off having everyone on the same sheet of music (or, "One band, one sound"). Items to consider include:

◉ Diagramming the workflow/process and scheduling coordination meetings

◉ Assigning roles and responsibilities among team members

◉ Creating and distributing a contact list, including how to get in touch with other key personnel (managers, system administrators, testing, developers, third party/application providers, customers, account managers, etc.)

◉ Hours of operation for Starbucks (some of them open an hour later on Saturdays)

◉ After hours building access for contractors (meet at a designated place and time?)

◉ Janitorial services – do they alarm the building/office when they are done? There is nothing like an alarm going off, as you walk down the hall, to add a little excitement to the evening.

◉ Notification to security/police regarding after hours presence ("Really Officer, we work here, we're not just sitting here looking like we work here")

◉ Establishing a transfer point on the file system and ensuring there is enough disk space for the export

◉ Acquiring a complete understanding of schema changes (how and when key tables get altered/modified, to include data transformation processes)

◉ Establishing a work schedule (does every DBA need to be present the entire time, or can schedules be staggered?)


Pre-export and export phase


Aside from a shortage of time, there is very little to prevent you (or the person in charge of the export) from practicing the export several times over and ensuring there are no glitches in this part of the plan. Does the export have to be a one-step, everything-from-A-to-Z process? Consider phasing it by functional groups instead: support tables, main tables, altered tables, and historical/static tables.

By grouping tables in this manner, you can interleave export and import. Once the export of a group is complete, you can start its corresponding import. It may take two hours to export and four hours to import, but that does not mean it takes six consecutive hours. Why is there a time difference between export and import? Export and import are not one to one. Export will run quite a bit faster than import, and both can run faster if optimized a bit. Do not forget that indexes are not being exported. Indexes will be re-built after the data is loaded in the target database.

How are you driving the exports: interactive mode or use of shell scripts and parameter files? Shell scripts should have four key features:

◉ An interview process

◉ Feedback or a summary of what was entered

◉ Existence checks (includes parameter files, ability to write to the dump and log file locations, and database connectivity)

◉ Bail out mechanisms ("Do you want to continue?") after key steps or operations

One script can drive the entire export process, and the bail out points can be used as signals (accompanied by extensive use of echo statements which denote where you are in the process). A key metric to be determined while practicing and refining the scripts is that of the time it takes to perform all exports.
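
A minimal sketch of such a driver script follows (classic exp is assumed, and the group names, parameter file names and dump directory are illustrative); the interview, feedback, existence-check and bail-out features are marked with comments:

#!/bin/ksh
# Illustrative export driver: interview, feedback, existence checks,
# and a bail-out point before the export runs.

DUMP_DIR=/u01/migration

# Interview.
echo "Functional group to export (support/main/altered/static)?"
read GROUP
PARFILE=exp_${GROUP}.par

# Feedback.
echo "About to export group '${GROUP}' using ${PARFILE} into ${DUMP_DIR}"

# Existence checks.
[ -r "${PARFILE}" ]  || { echo "Parameter file ${PARFILE} not found"; exit 1; }
[ -w "${DUMP_DIR}" ] || { echo "Cannot write to ${DUMP_DIR}"; exit 1; }

# Bail-out point.
echo "Do you want to continue? (y/n)"
read ANSWER
[ "${ANSWER}" = "y" ] || exit 0

echo "Starting export of ${GROUP} at $(date)"
exp parfile=${PARFILE} file=${DUMP_DIR}/${GROUP}.dmp log=${DUMP_DIR}/${GROUP}.log
echo "Finished export of ${GROUP} at $(date)"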

If a schema migration is taking place (as opposed to a full database migration), what are the dependencies among schemas? Look for names/items such as build_manager, process_logger, and stage (more germane to a warehouse). "Build_manager" (as an example of a name) may contain common or public functions, procedures and packages. Process_logger may be the owner of process logs for all schemas (fairly common if you see "pragma autonomous_transaction" in the text of a source; it is a way of capturing errors during failed transactions). Unless the new schema incorporates these external or associated schemas, some or all of these otherwise "Left Behind" schemas need to be accounted for in the target database.

While the export is taking place, what is happening with the non-exported schemas? You may need to disable connections, change passwords, stop other processes, and suspend cron jobs for the duration. Web application connections tend to be like crabgrass (i.e., hard to kill), and an effective way of stopping them is to change a password. Finally, what is the disposition of the source database, assuming your plan comes together?
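
A few statements of the sort that help here (the account and job names are hypothetical, and the job example assumes DBMS_SCHEDULER rather than old-style DBMS_JOB entries):

-- Stop new web application connections by locking the account and/or
-- changing its password.
ALTER USER web_app ACCOUNT LOCK;
ALTER USER web_app IDENTIFIED BY "migration_only_2020";

-- Suspend a scheduler job for the duration of the export.
EXEC DBMS_SCHEDULER.DISABLE('WEB_APP.NIGHTLY_REFRESH');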

Import phase


Practice creating schemas and associated physical/logical objects such as tablespaces and datafiles. The desired end result here is no ORA-xxxxx errors whatsoever, and all create scripts should be re-runnable. With respect to import parameter files, ensure fromuser marries up to touser. Using what was gleaned from the indexfile, pre-create tables in the target database.
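
As a reminder of the classic imp parameters involved (the file, log and schema names are illustrative): an INDEXFILE run writes out the DDL used to pre-create tables and indexes, and the actual load then maps FROMUSER to TOUSER.

$ imp userid=system@newdb file=main.dmp indexfile=main_ddl.sql \
  fromuser=app_owner touser=app_owner

$ imp userid=system@newdb file=main.dmp log=main_imp.log \
  fromuser=app_owner touser=app_owner ignore=y commit=y buffer=10485760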

For tables undergoing a modification, questions to ask include where, when and how does that take place? Do the changes occur within the user's schema, or within a temporary or migration schema, followed by "insert into new version of table as select from temp table?"

Fully understand how major tables are being changed – you may take for granted what appear to be ash and trash "not null" constraints, but application changes may completely rely upon them. In other words, it may not be enough to take care of PK, FK and unique constraints when trying to rebuild a table on the fly because there was some hiccup in the process.

What about cron and database jobs? How are you migrating/exporting all of those? Something which frequently goes hand in hand with cron jobs is email. Is the new server configured for email notification? Are there any database links to create?
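
Database links in particular are quick to create but easy to forget (the link, account and TNS alias names below are hypothetical):

-- Re-create any required database links on the new server.
CREATE DATABASE LINK old_crm
  CONNECT TO crm_reader IDENTIFIED BY "crm_reader_pw"
  USING 'OLDCRM';

-- Quick sanity check over the link.
SELECT COUNT(*) FROM customers@old_crm;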

Do you need logging turned on while the import is taking place? Is it even necessary to log everything being imported? What about triggers, especially the "for each row" kind? Millions of rows inserted via import means millions of trigger executions on a table with that kind of trigger. If a trigger on the source database already took care of formatting a name, does it need to fire again during the import?

You can be clever and disable quite a few automatic functions to help speed up the import, but don't be too clever by half, that is, do not forget to re-enable whatever it is you disabled. At 5:30 in the morning, having worked all day Friday (in addition to coming back at 11 to get ready for the midnight starting gun), sleep deprivation can introduce a significant amount of human error. If you have to go off your game plan, have someone double check your work or steps, especially if the object being manipulated is of key importance to a database or schema.
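
For example, row-level triggers can be switched off per table for the duration of the load, and a quick dictionary query afterwards confirms that nothing was left disabled (the schema and table names are hypothetical):

-- Disable triggers on a table before the import...
ALTER TABLE app_owner.orders DISABLE ALL TRIGGERS;

-- ...re-enable them afterwards...
ALTER TABLE app_owner.orders ENABLE ALL TRIGGERS;

-- ...and confirm nothing was forgotten.
SELECT table_name, trigger_name, status
FROM   dba_triggers
WHERE  owner = 'APP_OWNER'
AND    status = 'DISABLED';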

Post import considerations


Did everything work? Breathe a sigh of relief, but the job is not finished. What are you using for a baseline backup, once everything is up and running after the migration? Are you transitioning from export/cold/hot backups to RMAN? Has RMAN backup and recovery been practiced yet?
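
Even the simplest RMAN baseline is worth scripting and practicing before the migration weekend (a minimal sketch, assuming the database is in archivelog mode):

$ rman target /

RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
RMAN> LIST BACKUP SUMMARY;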

Plan A, obviously, is a success from start to finish. However, despite all best intentions and planning, what is Plan B? What if, for some undeterminable reason, applications fail to work properly after the migration? Thorough testing minimizes this, but what if no large scale testing took place? What does it take to revert to the source database? Do you have the time to try again? A second attempt will not take as long assuming you trust what took place in the export(s).

Do not assume everyone knows or understands what just took place. For example, do customer support personnel know how to point desktop CRM applications to the new database? Or are they opening trouble tickets a few weeks after the fact to complain that their changes are not being made or are not taking effect? What may be blindingly obvious to you as a DBA may be completely obscure to people who don't speak "database."

In Closing


The tips and steps covered in this article are based on real events, places, and persons. I have personally witnessed the customer service rep complaining about how his changes were not showing up, and it was because he had no idea whatsoever about pointing his desktop CRM application to the new database. Was that the DBA's or his manager's responsibility to propagate that information to him? I have also seen a key table have problems with the insertion of transformed data, and a workaround such as "create table as select" from the stage or transformation table implemented; but alas, the stage table did not have all of the not null constraints that the new "real" table did, and there went the Web application down the drain.

The sad truism about a database migration is that if you do not have the time to test beforehand and wind up failing (the reason why is immaterial), it is amazing how time magically appears to perform testing before the second attempt. The tips mentioned in this article should give you a good perspective regarding some of the external factors which come into play during a migration.