Achieving optimal performance of data warehouse queries doesn't happen by accident. This article provides the top 5 tips to achieve consistent, reliable performance of your Oracle Analytics reports.
1. Optimizer statistics must be representative.
2. Implement not null, primary, and foreign key constraints on the dimension and fact tables to allow for query transformations.
3. Partition the fact table to achieve partition pruning.
4. Data types on fact and dimension table join keys must be the same.
5. Implement parallel execution.
Optimizer statistics must be representative
Consider how and when to gather statistics
For a data warehouse, the best time to gather statistics is usually immediately after each data load completes. Be aware that in Oracle 12c and later releases, statistics are collected automatically during direct path loads into empty tables, which might eliminate the need to gather them manually on interim tables.
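As a minimal sketch, a post-load statistics gathering step might look like this (the schema name DWH is a placeholder):

```sql
-- Gather statistics for the warehouse schema after the load completes.
-- 'DWH' is a placeholder schema name; GATHER AUTO limits the work to
-- objects whose statistics are missing or stale.
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'DWH',
    options => 'GATHER AUTO');
END;
/
```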
Avoid out-of-range predicates
Predicates are out-of-range when the predicate value falls outside the low_value and high_value recorded for the column in all_tab_col_statistics. This typically happens when new data, such as the latest day's rows, is loaded after statistics were last gathered. Regather statistics on the table to resolve this issue.
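A quick way to spot the problem is to check when the column statistics were last gathered relative to the data load (owner, table, and column names below are placeholders):

```sql
-- If last_analyzed predates the most recent load, predicates on newly
-- loaded values may be out-of-range. Regather statistics in that case.
SELECT column_name, num_distinct, last_analyzed
FROM   all_tab_col_statistics
WHERE  owner       = 'DWH'          -- placeholder schema
AND    table_name  = 'SALES_FACT'   -- placeholder table
AND    column_name = 'DATE_WID';    -- placeholder join/filter column
```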
Consider data skew
The database flags a column as a candidate for a histogram when its data is skewed and the column is used in query predicates that would benefit. This usage information is stored in the database table SYS.COL_USAGE$.
Histograms are created automatically when statistics are gathered with the default, and recommended, SIZE AUTO option in the METHOD_OPT parameter of DBMS_STATS.
For example:
EXEC DBMS_STATS.GATHER_TABLE_STATS(...METHOD_OPT => 'FOR ALL COLUMNS SIZE AUTO'...)
Consider column correlation
Multiple predicates on a table that are related, for example city and state, are considered correlated. By default, the optimizer calculates selectivity by multiplying the individual selectivities of ALL filter predicates, which underestimates the row count when the predicates are correlated. To solve this problem, create extended statistics (a column group) on the correlated columns to improve the estimate of the number of rows returned after all the filter predicates are applied. This estimate is also known as the cardinality estimate.
◉ IMPORTANT – If the columns in the column group have histograms, a histogram also needs to be created on the column group itself for it to be used. Because histogram creation is driven by recorded column usage, you might need to run queries that filter on those predicate columns first, and then regather statistics, before the histogram on the column group is created.
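A sketch of creating a column group on correlated columns (CUSTOMER_DIM, CITY, and STATE are placeholder names):

```sql
-- Create a column group so the optimizer estimates the combined
-- selectivity of CITY and STATE instead of multiplying them.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
         ownname   => 'DWH',           -- placeholder schema
         tabname   => 'CUSTOMER_DIM',  -- placeholder table
         extension => '(CITY, STATE)')
FROM   dual;

-- Regather statistics so the column group gets its own statistics
-- (and, once the predicates have been used, its own histogram).
EXEC DBMS_STATS.GATHER_TABLE_STATS('DWH', 'CUSTOMER_DIM', method_opt => 'FOR ALL COLUMNS SIZE AUTO')
```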
Implement constraints to help the optimizer
Constraints are information about the relationships between the tables in your schema that enable the optimizer to perform optimizations such as join elimination and other query transformations.
Sometimes DBAs are concerned about the performance impact of data loads with constraints in place. Mitigate this by creating the foreign keys on the fact table using the RELY DISABLE NOVALIDATE constraint states. These tell the database not to enforce the relationship when data is inserted or updated (DISABLE) and not to validate the existing data when the constraint is created (NOVALIDATE), while RELY marks the relationship as one the optimizer may trust.
To tell the optimizer to trust constraint relationships in the RELY state, set the parameters query_rewrite_integrity=TRUSTED and query_rewrite_enabled=TRUE (the default).
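Putting the two steps together, a sketch with placeholder table and column names:

```sql
-- Declare the relationship without the cost of enforcing or validating it.
-- sales_fact, date_dim, and date_wid are placeholder names.
ALTER TABLE sales_fact
  ADD CONSTRAINT fk_sales_date
  FOREIGN KEY (date_wid) REFERENCES date_dim (date_wid)
  RELY DISABLE NOVALIDATE;

-- Allow the optimizer to trust RELY constraints.
ALTER SESSION SET query_rewrite_integrity = TRUSTED;
ALTER SESSION SET query_rewrite_enabled   = TRUE;
```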
◉ IMPORTANT - Set the query_rewrite_integrity parameter to TRUSTED only if the quality of the data in the fact table is reliable; otherwise queries can return wrong results. To determine whether the data is reliable, check the fact table for rows that contain join keys that don't exist in the dimension tables, using LEFT OUTER JOIN with an IS NULL filter in SQL.
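The orphan-key check described above can be sketched like this (table and column names are placeholders):

```sql
-- Count fact rows whose join key has no match in the dimension.
-- If this returns a nonzero count, the RELY constraint should not be trusted.
SELECT COUNT(*)
FROM   sales_fact f
LEFT OUTER JOIN date_dim d
  ON   f.date_wid = d.date_wid
WHERE  d.date_wid IS NULL;
```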
Partition the FACT table for partition pruning
Choose a column that's frequently used to filter the data. Often the fact table is partitioned by range or interval on a date column. If the table is joined with the date dimension using a numeric WID that represents the date, consider partitioning by range using the WID foreign key column to the date dimension.
Data warehouse queries often filter on columns of the dimension tables rather than on the fact table itself. Without the not null, primary, and foreign key constraints in place, the database doesn't understand the relationships within the schema, and the partition pruning you want for performance might not occur.
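A sketch of a range/interval-partitioned fact table keyed on a numeric date WID (all names, and the YYYYMMDD-style WID convention, are placeholder assumptions):

```sql
-- Range partition on the numeric date key, with interval partitioning so
-- new partitions are created automatically as data arrives.
CREATE TABLE sales_fact (
  date_wid     NUMBER        NOT NULL,  -- e.g. 20200115 for 15-Jan-2020
  customer_wid NUMBER        NOT NULL,
  amount       NUMBER(18,2)
)
PARTITION BY RANGE (date_wid)
INTERVAL (100)   -- with YYYYMMDD WIDs, roughly one partition per month
(PARTITION p_initial VALUES LESS THAN (20200101));
```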
Data types on fact and dimension table join keys must be the same
The data types on the fact and dimension table join keys must be the same, including length and precision. Mismatched types can force implicit data type conversion in the join, which skews the optimizer's estimates and can defeat optimizations such as partition pruning.
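A simple way to audit the join keys is to compare their definitions in the data dictionary (schema, table, and column names are placeholders):

```sql
-- The data_type, data_length, data_precision, and data_scale values
-- should match exactly across the fact and dimension tables.
SELECT table_name, column_name, data_type,
       data_length, data_precision, data_scale
FROM   all_tab_columns
WHERE  owner       = 'DWH'
AND    table_name  IN ('SALES_FACT', 'DATE_DIM')
AND    column_name = 'DATE_WID'
ORDER  BY table_name;
```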
Parallel execution
Once the foundations described above are in place, validate the performance and check the execution plan. Then you can add parallel execution to speed up the report. Without the appropriate foundation, however, a single report might use more resources than would otherwise be required, and the database might experience inconsistent performance.
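Parallel execution can be enabled at the object level or per statement; a sketch with placeholder names and an arbitrary degree of 8:

```sql
-- Object-level: all full scans of the table may run in parallel.
ALTER TABLE sales_fact PARALLEL 8;

-- Statement-level: request parallelism with a hint for one report query.
SELECT /*+ PARALLEL(f, 8) */
       d.calendar_year, SUM(f.amount)
FROM   sales_fact f
JOIN   date_dim d ON f.date_wid = d.date_wid
GROUP  BY d.calendar_year;
```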
Source: oracle.com