Friday, March 6, 2020

The Demise of the DBA

As recently as a decade ago, database administrators had to execute many complex and interesting tasks such as capacity planning, SQL performance tuning and most recently supporting business analytics in the big data environment. Today, however, most of these tasks are performed by artificial intelligence systems or the database itself. With so few important tasks left to perform, are DBAs really needed anymore?

The Classic DBA


As databases and their supporting software matured in the 1980s, IT departments had to grow or hire support staff to handle a large variety of technical tasks. Some of these were basic, such as database backups, while some like database performance tuning were quite involved. As databases grew to meet the demands of businesses, their complexity grew as well. Databases began to differentiate themselves in multiple ways:

◉ Structure -- Hierarchical, network, relational
◉ Size / Capacity -- Small, medium, large, very large
◉ Interface -- Structured Query Language (SQL), navigational, proprietary
◉ Third-party tools -- Database object creation, support (backup, reorg, etc.), data queries

The DBA’s job matured as well. The following is a short list of the most important DBA tasks prior to the year 2000:

Database backup and recovery. While data security or query performance might seem to be more important, database recovery is the primary task of the DBA. Many regulations require that businesses have created and demonstrated business continuity plans, including disaster recovery for IT systems and business data. To support this function, the DBA ensures that database backup jobs are run on a regular basis, and that database recovery can be executed within a specified time window.

Query performance tuning. Many development teams code their SQL and test it against small subset of test data. Regrettably, it is sometimes only in production that they find out how well (or poorly) their applications perform. To prevent issues the DBA can provide SQL syntax recommendations as well as make certain database design decisions such as indexes.

Database management system (DBMS) performance tuning. Here, the DBA is concerned with the configuration parameters of the DBMS. These include memory allocation for data buffering and sorting, CPU allocation and other ways that the DBMS must share resources with other system applications.

Database design. There are many ways to specify the definitions of database objects such as tables and indexes. These include decisions about column data types (should an identifier be numeric or alphabetic?), data element sizing (is 8 digits enough, or should we use 10 digits to account for future expansion?) and how to enforce various integrity rules (e.g. Customer-ID should be unique, Salary must be non-zero, Birth-Date must be a valid date, and so on). In addition, the DBA may need to consider various database structure alternatives such as whether to split a table into multiple tables (for example the Customer table might be split into Current Customer and Previous Customer for performance reasons). Some of these rules are considered standard practices, such as creating a unique index to support a unique column. Others might involve choosing between several options.

Automation support. The database administration staff almost always creates regularly-scheduled processes to handle repetitive tasks. These usually include regular database backups, database reorganizations and health checks. Health checks include gathering measures such as database size, table growth, CPU usage, high resource using tasks, and other resource measures. These are plotted over time to predict future resource needs, as well as to detect potential problems that need to be addressed.

How have things changed?

How Classic DBA Tasks Changed Over Time


As today’s DBA already knows, most of the above tasks are no longer done by the DBA. Instead, they execute an entirely new set of tasks that did not exist a decade ago.

First, a quick review of how classic DBA tasks are now handled.

Database backup and recovery. It is no longer feasible to do a manual database backup that will be used in a disaster to recover a database. First, many databases have grown too large to recover from a backup copy in a short time. Second, in many cases applications are constantly updating your database, making it inconvenient to schedule an outage to create a completely up-to-date copy. Finally, the number of interrelated objects in applications is grown such that recovering a single object (say, the Order table) will also require recovering all related objects (such as the Address table, the Customer table, the Payments table, and so on).

Instead, the most common practice is to create a separate physical site where you keep a copy of your production data. This copy (sometimes called the secondary copy or disaster recovery copy) is then continuously updated at the same time as your production data. So, if a disaster occurs, the site with the secondary data copy has up-to-date data and can be used almost immediately.

Query performance tuning. In years past, SQL query tuning was one of the DBA’s favorite tricks. Many DBAs shared what they learned in articles, conference presentations, books and white papers. After a time, this list of tips and recommendations found its way to the developer community, and designers and coders became more adept at coding SQL that did not require tuning. On another front, third-party vendors created software tools that codified these tips into programs that analyzed SQL and reported on issues and potential solutions. Finally, DBMS vendors added processes to the database software that accumulated SQL performance information and used machine learning techniques to do some SQL performance tuning automatically. Today, most developers use software development toolkits. These toolkits provide a visual interface that allows the user to specify data elements and processes, while the kit codes the required SQL behind the scenes.

In short, the days of the DBA doing SQL tuning are fast disappearing.

Database management system (DBMS) performance tuning. As with SQL tuning, the DBMS has grown and matured to the point that it can almost tune itself. One example is IBM’s new product IBM Db2 IT Operational Analytics (Db2 ITOA). This software gathers z/OS operating system resource usage information (called systems management facility or SMF records) related to the Db2 DBMS and can be used to visualize the health status of one or more Db2 subsystems. It can monitor Db2 health changes over time, highlight abnormal indicators and predict future issues in real time.

Database design. While a database design depends a lot upon the business needs of the applications, most of today’s operational systems already use databases. There are few new applications or systems that require entirely new or previously unknown data structures or relationships. While there are always new types of data being ingested into business databases such as audio clips, videos or extended markup language (XML), ways to efficiently handle these data types have already found their way into the application development world. Consequently, there is not much basic database design for the DBA (or anyone else) to do.

Automation support. Standard processes such as data loads and reporting are already implemented in most IT departments. In fact, it is quite common for IT to have an automated or semi-automated method of implementing new processes. This may be as simple as a document that describes exactly what processes to create for a new system or might be as complex as software that generates these new jobs on demand.

Let’s return to the big question: What does today’s DBA do?

The Modern DBA

Interestingly, today’s DBA rarely works directly with databases. Instead, most of their tasks relate to supporting, monitoring and reacting to systems and issues that support database processes. In a sense, software is now doing database administration, and the DBA is managing that software. Here are some examples.

Monitor autonomics. Autonomics is the ability of a software system to gather information about itself and then act upon that information. Db2 for z/OS is a good example in that there are several features that measure the status of various key indicators in both the database management configuration and in database elements themselves.

For example, Db2 actively monitors the clustering of database tables (i.e., whether or not table data is currently physically clustered by a key value.) For example, it may be a good for query performance if the Order table is physically clustered by Order-ID. Over time, as new orders are added and processed orders are deleted, the table may no longer be perfectly clustered. The DBA can set up a process that regularly interrogates the current clustering of critical tables and, if the amount of clustering is below a specific threshold, execute a reorganization utility.

There are many different indicators that can be used as warning flags to signify when a fix may be required. Today’s DBA must search for instances where these various autonomic processes can be defined, implement them, and generate regular reports on how often indicators are used to execute processes.

Managing advanced tools. The DBA’s power tools of today involve machine learning (ML) and artificial intelligence (AI). As noted previously, tasks such as SQL performance tuning and DBMS performance tuning are now being done by ML and AI software, or even combinations of both. For example, IBM’s Db2 ITOA mentioned above is artificial intelligence software that uses as its underlying engine the IBM Machine Learning for z/OS product. The ML software gathers, filters and correlates performance data. Db2 ITOA uses these data to gain insights into the health of one or more Db2s by looking for patterns.

The DBA’s job is to review the rules and indicators identified by ITOA, review patterns, manage the various self-monitoring and self-retraining functions and respond to any predictions of upcoming issues.

Advanced problem resolution. This is the most challenging area for the DBA. Today’s database landscape has grown in multiple directions, including:

◉ Addition of big data appliances and hybrid hardware / software solutions for both historical and real-time data analytics;

◉ Database as a Service (DBaaS), where some or all of your databases and most of their management tasks are supplied by a third party at one or more external sites or in the cloud;

◉ Multi-site network processing, including some analytics solutions as Apache Hadoop, that involves hundreds (or thousands) of computer notes in a network;

◉ NoSQL databases that use a technique called graph analysis to link data elements and relationships to allow analysis of networked data;

◉ Data federation, which presents to users a single interface to enterprise data that may exist in multiple physical locations.

Whoever is charged with addressing problems and issues in this environment must have a great deal of technical knowledge and experience.

Not Really a DBA


As we can see from our quick look at the modern database environment, very little of the DBA’s work directly involves the database. Instead, due to the diversity of the technology, IT management is now tending toward growing or hiring IT specialists. Some areas such as big data have re-named the database support role as ‘data scientist’.

Related Posts

0 comments:

Post a Comment