设为首页 收藏本站
查看: 619|回复: 0

[经验分享] Oracle 11g Extension Statistics and Expression Statistics(原创)

[复制链接]
YunVN网友  发表于 2016-8-15 06:09:48 |阅读模式
  Overview Extended Statistics
In this release, Oracle has introduced major new capabilities in statistics gathering, which are referred to as extended statistics, to make the optimizer statistics reflect the true selectivity of the data. There are two types of extended statistics: multi- column statistics, which involve collecting statistics for column groups, and expression statistics. Extended statistics include the statistics collected for both column groups and expressions and use the following new procedures:
CREATE_EXTENDED_STATS function
DROP_EXTENDED_STATS procedure
SHOW_EXTENDED_STATS_NAME function
Multicolumn Statistics (Column groups)
The selectivity of a column is a crucial optimizer statistic, playing a key role in the execution plan that the cost optimizer creates for a SQL statement. Currently, Oracle collects statistics by computing the selectivity of each of a table’s columns separately, and ignores the relationship between the columns. However, the relationship between certain columns may be so strong that it can affect the combined selectivity of the two columns. In most cases, the optimizer assumes that the values of the different columns in a complex predicate are independent. Based on this assumption, the optimizer simply multiplies the selectivity of individual predicates to arrive at the selectivity of a conjunctive predicate, which usually leads to an underestimation of the selectivity. In Oracle Database 10g, when figuring out the selectivity of multiple predicates, the query optimizer took into account the correlation between related columns only under a limited set of circumstances, as I summarize here:

  • The optimizer used the number of distinct keys in an index to estimate selectivity provided all columns of a conjunctive predicate match all columns of a concatenated index key. In addition, the predicates must be equalities used in equijoins.
  • If you set DYNAMIC_SAMPLING to level 4, the optimizer used dynamic sampling to estimate the selectivity of predicates involving multiple columns from a table.Because the sampling size is quite small, the results are dubious in most cases.
  With the exception of the two cases presented here, the optimizer always assumed that the values of all columns in a table that were used in a complex predicate were independent of each other. Based on this naïve assumption, the optimizer simply multiplied single column selectivity estimates to arrive at the selectivity of a conjunctive predicate involving multiple columns. The end result of this strategy was a severe underestimation of the real selectivity of those types of predicates in a SQL statement. Oracle Database 11g attempts to alleviate this major problem by letting you collect the following types of statistics on multiple columns in a table, which it refers to as a group of columns:

  • Number of distinct values
  • Density
  • Number of nulls
  • Frequency histograms
  The idea behind the capturing of statistics for a group of columns as a single entity is to capture the underlying functional dependency between related columns in a table. The database collects the number of distinct values, the number of null values, frequency histograms, and density for groups of columns. Let’s use an example from the CUSTOMERS table in the SH schema to drive home this point. In this table, the two columns CUST_STATE_PROVINCE and COUNTRY_ID are strongly correlated. The CUST_STATE_PROVINCE column determines the value of the COUNTRY_ID column for a customer. The following query using California as the value for the CUST_STATE_PROVINCE column shows this:
  SQL> select count(*)  from sh.customers
     where cust_state_province = 'CA';
COUNT(*)
----------
    3341
The query returns the value 3341. That is, there are a total of 3341 customers in the customers table who are from the state of California. Of course, if you issue the following query, which asks how many customers are from the state of California and the U.S. (country_id=52790), you get the same result as before:
SQL> select count(*)  from customers
     where cust_state_province = 'CA'
     and country_id=52790;
COUNT(*)
----------
    3341
But it is clear that if you repeat this query for any COUNTRY_ID other than the U.S., the result would be, in all likelihood, zero because California is a state in the U.S. but not in the other countries. In cases such as these, it makes sense for the optimizer to rely not merely on the selectivity of the individual columns, but on the selectivity for the group of related columns as well. Oracle Database 11g lets you do precisely that— you can now gather statistics on related columns as a group, called a column group. The optimizer uses the statistics on column groups to account for the correlation between two columns. If, for example, your query has the predicates c1=1 and c2=1 and if you collect statistics on (c1, c2) as a single group, the optimizer will use the column group statistics for estimating the combined selectivity of the two predicates.
  Manage extention statistics
  Oracle creates column groups for related columns based on its analysis of the database workload. You can, however, create a column group yourself using the DBMS_STATS package.
  -- Create a columnn group based on EMP(JOB,DEPTNO).
SET SERVEROUTPUT ON
DECLARE
  l_cg_name VARCHAR2(30);
BEGIN
  l_cg_name := DBMS_STATS.create_extended_stats(ownname   => 'SCOTT',
                                                tabname   => 'EMP',
                                                extension => '(JOB,DEPTNO)');
  DBMS_OUTPUT.put_line('l_cg_name=' || l_cg_name);
END;
/
l_cg_name=SYS_STU3VG629OEYG6FN0EKTGV_HQ6

PL/SQL procedure successfully completed.
The column group name is returned using the SHOW_EXTENDED_STATS_NAME function.
-- Display the name of the columnn group.
SELECT DBMS_STATS.show_extended_stats_name(ownname   => 'SCOTT',
                                           tabname   => 'EMP',
                                           extension => '(JOB,DEPTNO)') AS ame
FROM dual;
CG_NAME
------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6
 1 row selected.
Manually created column groups can be deleted using the DROP_EXTENDED_STATS procedure.
-- Drop the columnn group.
BEGIN
  dbms_stats.drop_extended_stats(ownname   => 'SCOTT',
                                 tabname   => 'EMP',
                                 extension => '(JOB,DEPTNO)');
END;
/
PL/SQL procedure successfully completed.
Setting the METHOD_OPT parameter to "FOR ALL COLUMNS SIZE AUTO" allows the GATHER_% procedures to gather statistics on all existing column groups for the specified object.
BEGIN
  DBMS_STATS.gather_table_stats(
    'SCOTT',
    'EMP',
    method_opt => 'for all columns size auto');
END;
/
Alternatively, set the METHOD_OPT parameter to "FOR COLUMNS (column-list)" and the group will automatically be created during the statistics gathering.
BEGIN
  DBMS_STATS.gather_table_stats(
    'SCOTT',
    'EMP',
    method_opt => 'for columns (job,mgr)');
END;
/
The [DBA|ALL|USER]_STAT_EXTENSIONS views display information about the multi-column statistics.
COLUMN extension FORMAT A30
SELECT extension_name, extension
FROM   dba_stat_extensions
WHERE  table_name = 'EMP';
EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6 ("JOB","DEPTNO")
SYS_STULPA1A#B6YL4KQ59DQO3OADQ ("JOB","MGR")
 2 rows selected.
COLUMN col_group FORMAT A30
SELECT e.extension col_group,
       t.num_distinct,
       t.histogram
FROM   dba_stat_extensions e
       JOIN dba_tab_col_statistics t ON e.extension_name=t.column_name
AND    t.table_name = 'EMP';
COL_GROUP                      NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
("JOB","DEPTNO")                          9 FREQUENCY
("JOB","MGR")                             8 FREQUENCY
2 rows selected.
  Expression Statistics
  In Oracle Database 10g, the optimizer can collect expression statistics on some types of expressions on columns, thus deriving more accurate selectivity estimates. This functionality applies only to certain special cases where a function preserves the data distribution characteristics of the original column, as is the case when you use an expression such as TO_NUMBER. In addition, the database in the previous release used dynamic sampling to get better estimates of built-in functions on columns. In Oracle Database 11g, the database uses expression statistics that include user-defined functions as well as function-based indexes. The new feature relies on the virtual column infrastructure to create expression statistics, that is, statistics on predicates involving expressions on columns.
  The optimizer has no idea what the affect of applying a function to column has on the selectivity of the column. Using a similar method to multi-column statistics, we can gather expression statistics to provide more information. Expression statistics can be created explicitly using the CREATE_EXTENDED_STATS procedure, or implicitly by specifying the expression in the METHOD_OPT parameter of the GATHER_% procedures when gathering statistics.
DECLARE
  l_cg_name VARCHAR2(30);
BEGIN
  -- Explicitly created.
  l_cg_name := DBMS_STATS.create_extended_stats(ownname   => 'SCOTT',
                                                tabname   => 'EMP',
                                                extension => '(LOWER(ENAME))');

  -- Implicitly created.
  DBMS_STATS.gather_table_stats(
    'SCOTT',
    'EMP',
    method_opt => 'for columns (upper(ename))');
END;
/
Setting the METHOD_OPT parameter to "FOR ALL COLUMNS SIZE AUTO" allows the GATHER_% procedures to gather existing expression statistics.
BEGIN
  DBMS_STATS.gather_table_stats(
    'SCOTT',
    'EMP',
    method_opt => 'for all columns size auto');
END;
/
The [DBA|ALL|USER]_STAT_EXTENSIONS views display information about the expression statistics, as well as the multi-column statistics.
COLUMN extension FORMAT A30
SELECT extension_name, extension
FROM   dba_stat_extensions
WHERE  table_name = 'EMP';
EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6 ("JOB","DEPTNO")
SYS_STULPA1A#B6YL4KQ59DQO3OADQ ("JOB","MGR")
SYS_STU2JLSDWQAFJHQST7$QK81_YB (LOWER("ENAME"))
SYS_STUOK75YSL165W#_X8GUYL0A1X (UPPER("ENAME"))
4 rows selected.
COLUMN col_group FORMAT A30
SELECT e.extension col_group,
       t.num_distinct,
       t.histogram
FROM   dba_stat_extensions e
       JOIN dba_tab_col_statistics t ON e.extension_name=t.column_name
AND    t.table_name = 'EMP';
COL_GROUP                      NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
("JOB","DEPTNO")                          9 NONE
("JOB","MGR")                             8 NONE
(LOWER("ENAME"))                         14 NONE
(UPPER("ENAME"))                         14 NONE
4 rows selected.
Expression statistics are dropped using the DROP_EXTENDED_STATS procedure.
-- Drop the columnn group.
BEGIN
  dbms_stats.drop_extended_stats(ownname   => 'SCOTT',
                                 tabname   => 'EMP',
                                 extension => '(UPPER(ENAME))');
END;
/
PL/SQL procedure successfully completed.

  
  参 考至:http://www.oracle-base.com/articles/11g/statistics-collection- enhancements-11gr1.php 《McGraw.Hill.OCP.Oracle.Database.11g.New.Features.for.Administrators.Exam.Guide.Apr.2008》
  本文原创,转载请注明出处、作者
  如有错误,欢迎指正
  邮箱:czmcj@163.com

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-257667-1-1.html 上篇帖子: Hibernate one -to -one mapping for Oracle auto-increment key 下篇帖子: oracle中的service_name,instance_name,db_name,oracle_sid,sid_name的用法及区别
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表