Thursday, November 26, 2009

Aggregates outside OBIEE - materialized views and query rewrite

In an earlier post I explained how to use aggregates in OBIEE. There we manually created aggregate tables in the database and set each logical table source to be used only at a certain dimension level.

There we used the aggregate tables SALES_MONTHS, SALES_YEAR_CAT and SALES_MONTHS_CAT_CH and the dimension tables CATEGORIES, MONTHS and YEARS. Here we don't need them.

In this post we'll explain and set up materialized views and query rewrite to get the same queries as in the earlier post, but without configuring anything in the logical table sources in the BMM.

I'll use an Oracle 10g database, the Oracle SH schema and the measures from the SALES fact table.

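To use dbms_mview.explain_rewrite we need the rewrite_table table (created by the script utlxrw.sql), and to use dbms_mview.explain_mview we need the mv_capabilities_table table (script utlxmv.sql). The script utlxplan.sql creates plan_table, which is used for the explain plans further down.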
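
As a sketch of that setup, assuming a SQL*Plus session connected as the SH user and a standard Oracle home layout (in SQL*Plus, "?" expands to ORACLE_HOME), the helper tables can be created with the scripts Oracle ships in rdbms/admin. REWRITE_TABLE comes from utlxrw.sql and MV_CAPABILITIES_TABLE from utlxmv.sql:

```sql
-- Run in SQL*Plus as the schema owner (SH is assumed here).
-- Creates REWRITE_TABLE, used by dbms_mview.explain_rewrite
@?/rdbms/admin/utlxrw.sql
-- Creates MV_CAPABILITIES_TABLE, used by dbms_mview.explain_mview
@?/rdbms/admin/utlxmv.sql

-- Quick check that both tables now exist in the current schema
select table_name
from user_tables
where table_name in ('REWRITE_TABLE', 'MV_CAPABILITIES_TABLE');
```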

First set:

ALTER SYSTEM SET QUERY_REWRITE_ENABLED=TRUE
ALTER SYSTEM SET QUERY_REWRITE_INTEGRITY='TRUSTED'

This parameter sets the integrity level of query rewrite. I set TRUSTED only to be able to test this; you should look at the other options (ENFORCED and STALE_TOLERATED) as well. In TRUSTED mode, the optimizer trusts that the relationships declared in dimensions and RELY constraints are correct. In this mode, the optimizer also uses prebuilt materialized views or materialized views based on views, and it uses relationships that are not enforced as well as those that are enforced. In this mode, the optimizer also trusts declared but not ENABLED VALIDATED primary or unique key constraints and data relationships specified using dimensions. This mode offers greater query rewrite capabilities but also creates the risk of incorrect results if any of the trusted relationships you have declared are incorrect (text reference: Oracle Database Data Warehousing Guide 11g Release 1 (11.1)).
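
For testing, the same parameters can also be set for the current session only. This is a minimal sketch; the query against v$parameter assumes the user has been granted access to that view:

```sql
-- Session-level alternative to ALTER SYSTEM, affects only this connection
alter session set query_rewrite_enabled = true;
alter session set query_rewrite_integrity = trusted;

-- Check the values currently in effect (needs select access on v$parameter)
select name, value
from v$parameter
where name in ('query_rewrite_enabled', 'query_rewrite_integrity');
```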

RELY constraints:

We use the SALES table as the reference. The existing constraints need to be modified to RELY. RELY only affects constraints that are ENABLE NOVALIDATE. With QUERY_REWRITE_INTEGRITY set to TRUSTED, the trusted information consists of constraints (NOVALIDATE RELY) and dimensions, so Oracle does not check whether the relationships declared with RELY constraints actually hold. This applies to primary key and unique key constraints (RELY ENABLE NOVALIDATE). Query rewrite also uses the joinback method to retrieve an attribute that is not in the materialized view query but can be reached through a join. For example, the materialized view contains CALENDAR_MONTH_ID and we want to group by CALENDAR_MONTH_DESC; the optimizer then joins the materialized view back to the TIMES table to get CALENDAR_MONTH_DESC. TIMES is the joinback table.

Because of the relationships to the higher levels we also need dimensions:



I didn't create them; they already exist in the Oracle SH schema.

Modify the constraints on SALES and on the referenced dimension tables to RELY ENABLE NOVALIDATE:
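
To see which dimensions are available, something like the following can be used. It is a sketch that assumes you are connected as SH; the dimension name TIMES_DIM is an assumption based on the standard SH sample schema and should be checked against the output of the first query:

```sql
-- List the dimension objects owned by the current user
select dimension_name, invalid
from user_dimensions
order by dimension_name;

-- Print the levels and hierarchies of one dimension (SQL*Plus)
set serveroutput on
begin
  -- TIMES_DIM is assumed; replace it with a name returned by the query above
  dbms_dimension.describe_dimension('SH.TIMES_DIM');
end;
/
```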

alter table sales modify constraint sales_product_fk RELY ENABLE NOVALIDATE
alter table sales modify constraint sales_channel_fk RELY ENABLE NOVALIDATE
alter table sales modify constraint sales_time_fk RELY ENABLE NOVALIDATE
alter table products modify constraint products_pk RELY ENABLE NOVALIDATE
alter table times modify constraint times_pk RELY ENABLE NOVALIDATE
alter table channels modify constraint channels_pk RELY ENABLE NOVALIDATE
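
A quick way to confirm the change, sketched here with the standard USER_CONSTRAINTS dictionary view, is to look at the STATUS, VALIDATED and RELY columns for the tables involved:

```sql
-- Show the state of the primary key (P) and foreign key (R) constraints
select table_name,
       constraint_name,
       constraint_type,
       status,      -- ENABLED or DISABLED
       validated,   -- VALIDATED or NOT VALIDATED
       rely         -- RELY when the constraint is trusted by query rewrite
from user_constraints
where table_name in ('SALES', 'PRODUCTS', 'TIMES', 'CHANNELS')
  and constraint_type in ('P', 'R')
order by table_name, constraint_name;
```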

We create a materialized view that supports all the queries from the aggregates post:

create materialized view mv_sales_all
build immediate
refresh force on demand
with primary key
enable query rewrite
as
select t.calendar_month_id,
s.prod_id,
s.channel_id,
grouping_id(t.calendar_month_id, s.prod_id, s.channel_id) as gr_id,
sum(s.amount_sold) as amount_sold,
sum(s.quantity_sold) as quantity_sold
from sales s, times t
where s.time_id=t.time_id
group by
grouping sets
(
(t.calendar_month_id),--gr_id 3
(t.calendar_month_id,s.prod_id), --gr_id 1
(t.calendar_month_id,s.prod_id,s.channel_id)--gr_id 0
)

The grouping sets cover all three combinations from the aggregates post.

The grouping_id function returns the decimal value of a bit vector with one bit per argument. A bit is 0 if the column is part of the grouping set for that row, and 1 if the column has been aggregated away.

For example, the grouping set (calendar_month_id) has gr_id 3 because it corresponds to the combination:

(0, 1, 1) = (calendar_month_id, prod_id, channel_id)

Check:

select bin_to_num(0, 1, 1) from dual--3 decimal
select bin_to_num(0, 0, 0) from dual--0 decimal
select bin_to_num(0, 0, 1) from dual--1 decimal
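
Once the materialized view is populated, the same values can be seen in the data. This sketch only uses the columns defined in mv_sales_all above:

```sql
-- One line per grouping set: 0 = month/product/channel, 1 = month/product, 3 = month
select gr_id,
       count(*) as row_count
from mv_sales_all
group by gr_id
order by gr_id;

-- Rows of the month-only grouping set; prod_id and channel_id are NULL here
select calendar_month_id, amount_sold, quantity_sold
from mv_sales_all
where gr_id = 3
order by calendar_month_id;
```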

Example of combinations:



To check the capabilities of the materialized view we use the procedure dbms_mview.explain_mview and the table mv_capabilities_table.
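
A minimal sketch of that check follows. The statement id 'mv_sales_all_check' is just a label chosen for this example so that the rows can be found again:

```sql
begin
  -- Analyze the existing materialized view and write the result
  -- into MV_CAPABILITIES_TABLE under the given statement id
  dbms_mview.explain_mview('MV_SALES_ALL', 'mv_sales_all_check');
end;
/

-- POSSIBLE is Y or N for each capability (refresh and rewrite variants);
-- MSGTXT explains why a capability is not possible
select capability_name, possible, msgtxt
from mv_capabilities_table
where statement_id = 'mv_sales_all_check'
order by seq;
```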

BMM (clean model):



The focus here is on how this works with the queries that OBIEE generates, not on how to refresh materialized views as part of the ETL process.

To test this we need to refresh the materialized view:

begin
dbms_snapshot.refresh('MV_SALES_ALL','C');
end;

Gather schema statistics:

begin
dbms_stats.gather_schema_stats('SH', CASCADE=>TRUE);
end;

Now, if we choose:



NQQuery.log:



Explain plan, table plan_table:



Note the joinback to the TIMES table to get CALENDAR_MONTH_DESC.

If we use CALENDAR_MONTH_ID instead of CALENDAR_MONTH_DESC, there is no joinback to TIMES, because CALENDAR_MONTH_ID is already in the materialized view query.
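
Conceptually, a joinback behaves like the hand-written query below. This is only an illustration of the idea using the objects defined in this post, not the text the optimizer actually produces:

```sql
-- Month-level totals read from the materialized view (gr_id = 3),
-- joined back to TIMES only to pick up the month description
select t.calendar_month_desc,
       sum(mv.amount_sold) as amount_sold
from mv_sales_all mv,
     -- TIMES has one row per day, so reduce it to one row per month first
     (select distinct calendar_month_id, calendar_month_desc
      from times) t
where mv.gr_id = 3
  and mv.calendar_month_id = t.calendar_month_id
group by t.calendar_month_desc;
```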



To verify that the query was rewritten we can use dbms_mview.explain_rewrite and the table rewrite_table:
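
A sketch of such a check is shown here. The query text is a simplified, hand-written stand-in for the statement from NQQuery.log, and the statement id 'obiee_test_1' is an arbitrary label:

```sql
begin
  dbms_mview.explain_rewrite(
    query        => 'select t.calendar_month_desc, sum(s.amount_sold) '
                 || 'from sales s, times t '
                 || 'where s.time_id = t.time_id '
                 || 'group by t.calendar_month_desc',
    mv           => 'MV_SALES_ALL',
    statement_id => 'obiee_test_1');
end;
/

-- Each row is one message explaining whether and why the rewrite happened
select message
from rewrite_table
where statement_id = 'obiee_test_1'
order by sequence;
```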



If we choose:



NQQuery.log:



Explain plan, table plan_table:



If we choose:



NQQuery.log:



Explain plan, plan_table:



In all three cases query rewrite works correctly: the query has been rewritten to use the materialized view.

I tried to show how this works with the queries that OBIEE generates. If you have any questions or suggestions, please post a comment.

Thursday, November 19, 2009

Variables in direct database requests

In Answers we can write SQL directly against the database.



I'll show how to use repository, session and presentation variables in a direct database request, whether each of them works, and how this compares with a normal Answers request.

Repository initialization block:
select to_char(min(time_id), 'dd.mm.yyyy') from sales

Repository variable (dynamic):
rv_test_date_to_char

Session initialization block:
select 'Photo' from dual

Non-system session variable:
CAT, with "Enable any user to set the value" checked, without default initializer

Dashboard prompt fields:
PRODUCTS.PROD_CATEGORY, drop-down list, set request variable CAT
CHANNELS.CHANNEL_DESC, drop-down list, set presentation variable pv_channel_desc

The first one resets the session variable and the second one sets the presentation variable.

Normal Answers request columns and filter:
PRODUCTS.PROD_CATEGORY
VALUEOF(NQ_SESSION.CAT)
VALUEOF(rv_test_date_to_char)
'@{pv_channel_desc}'
PRODUCTS.PROD_CATEGORY is prompted

Direct database request:



SQL statement:
select
'VALUEOF(NQ_SESSION.CAT)' session_variable,
'VALUEOF(rv_test_date_to_char)' repository_variable,
'@{pv_channel_desc}{Internet}' presentation_variable,
channel_desc
from channels
where channel_desc='@{pv_channel_desc}{Internet}'


The statement shows the syntax for referencing each type of variable; this is the syntax that I know works correctly.

Now if we put all three objects on the dashboard page, we initially get this:



We change the values in the prompt, which resets the session variable and the presentation variable:



Everything works fine in the direct database request, except that we cannot see the new value of the non-system session variable no matter how many times we reset it; the request only picks up the value defined in the initialization block. The new value affects only the Answers request.

Sunday, November 8, 2009

Denormalized HR employees table, dimension hierarchy and level drills in OBIEE

I looked at the employees table in the standard Oracle HR schema and thought it would be great to see it in a denormalized shape, in which every row contains data about its parent levels.

The employees table has a self-referential join (employee_id - manager_id) that describes the relationship. Each employee has a parent employee (manager), except employee_id 100, who is the president and sits at the top of the hierarchy.

The employee hierarchy is unbalanced; a balanced hierarchy is one in which each branch has the same number of levels (for example the products dimension table, sh.products). We'll denormalize the employees table (dimension) to get a user-friendly hierarchy. Once we have the denormalized structure, we'll build a dimension with levels in OBIEE and use a second copy of the employees table as a fact table, to find the sum of the salaries of all employees one level below the current employee/manager.

First we want to find the maximum level depth in the employees table:

select
max(LEVEL)
from employees
start with email='SKING'
connect by prior employee_id=manager_id

The maximum level depth is 4, so the denormalized shape will have four levels. We use the email column instead of first name and last name to identify each employee, and it will be the primary key at each level.

Number of employees on each level:

select
count(email),
LEVEL
from employees
start with email='SKING'
connect by prior employee_id=manager_id
group by LEVEL
order by 2




This is employee data in the basic form:

select LEVEL,
employee_id,
email,
job_id,
manager_id
from employees
start with email='SKING'
connect by prior employee_id=manager_id
order by 1,2




In this example there are no loops in the data (an employee below a manager cannot also be that manager's parent), so we don't need the NOCYCLE parameter in the CONNECT BY condition.

So, employees are placed on each level. If you look at level 3, for example, there are lots of employees whose job_id is ST_CLERK, and their managers are at level 2 with ST_MAN as job_id. Level 4 is similar: IT_PROG, FI_ACCOUNT and AC_ACCOUNT employees have managers with job_id IT_PROG, FI_MGR and AC_MGR. Some branches end at level 2. For example, I added a new employee whose job_id is PRES_ASST (add this job_id to the jobs table with president personal assistant as the job title), and he has no levels below him.

The first step is to go to the physical layer in the OBIEE Administration Tool and create a new physical table with the table type Select:
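
If the data did contain a loop, the same query could be written with NOCYCLE, and the CONNECT_BY_ISCYCLE pseudocolumn (available only together with NOCYCLE) would flag the affected rows. A sketch against the same HR employees table:

```sql
select level,
       employee_id,
       email,
       manager_id,
       connect_by_iscycle as is_cycle  -- 1 if this row has a child that is also its ancestor
from employees
start with email = 'SKING'
connect by nocycle prior employee_id = manager_id
order by 1, 2;
```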



I use this code to denormalize the employee table levels:

select
employee_id,
employee_level,
value_below,
nvl(level4_id, nvl(level3_id, nvl(level2_id, level1_id))) level4_id,
nvl(level4_desc, nvl(level3_desc, nvl(level2_desc,level1_desc))) level4_desc,
nvl(level3_id, nvl(level2_id, level1_id)) level3_id,
nvl(level3_desc, nvl(level2_desc,level1_desc)) level3_desc,
nvl(level2_id, level1_id) level2_id,
nvl(level2_desc,level1_desc) level2_desc,
level1_id,
level1_desc,
salary,
department_id,
manager_id
from
(
select
LEVEL,
concat('LEVEL ', cast(LEVEL as varchar2(1))) as employee_level,
case when CONNECT_BY_ISLEAF =1 then 'No' else 'Yes' end as value_below,
employee_id,
manager_id,
email,
salary,
department_id,
decode(LEVEL,1,to_char(employee_id),substr
(
SYS_CONNECT_BY_PATH(employee_id, '/'),
instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,1)+1,
instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,2)-instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,1)-1
)
) level1_id,
decode(LEVEL,1,to_char(email),substr
(
SYS_CONNECT_BY_PATH(email, '/'),
instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,1)+1,
instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,2)-instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,1)-1
)
) level1_desc,
--Level2
decode(LEVEL,2,to_char(employee_id),substr
(
SYS_CONNECT_BY_PATH(employee_id, '/'),
instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,2)+1,
instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,3)-instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,2)-1
)
) level2_id,
decode(LEVEL,2,to_char(email),substr
(
SYS_CONNECT_BY_PATH(email, '/'),
instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,2)+1,
instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,3)-instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,2)-1
)
) level2_desc,
--Level 3
decode(LEVEL,3,to_char(employee_id),substr
(
SYS_CONNECT_BY_PATH(employee_id, '/'),
instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,3)+1,
instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,4)-instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,3)-1
)
) level3_id,
decode(LEVEL,3,to_char(email),substr
(
SYS_CONNECT_BY_PATH(email, '/'),
instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,3)+1,
instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,4)-instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,3)-1
)
) level3_desc,
--Level 4
decode(LEVEL,4,to_char(employee_id),substr
(
SYS_CONNECT_BY_PATH(employee_id, '/'),
instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,4)+1,
instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,5)-instr(SYS_CONNECT_BY_PATH(employee_id, '/'),'/',1,4)-1
)
) level4_id,
decode(LEVEL,4,to_char(email),substr
(
SYS_CONNECT_BY_PATH(email, '/'),
instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,4)+1,
instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,5)-instr(SYS_CONNECT_BY_PATH(email, '/'),'/',1,4)-1
)
) level4_desc,
SYS_CONNECT_BY_PATH(email, '/') "desc_all",
SYS_CONNECT_BY_PATH(employee_id, '/') "id_all",
SYS_CONNECT_BY_PATH(job_id, '/') "job_all"
from employees
start with employee_id=100
connect by prior employee_id=manager_id
order by employee_id
)
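
As an alternative sketch (not the code used in this post), the nth element of the path can also be picked out with regexp_substr, which is available from Oracle 10g. It returns NULL when the path has fewer than n elements, which is the same result the decode/substr/instr expressions above give for levels below the current row:

```sql
select employee_id,
       level as lvl,
       -- the 4th argument is the occurrence: 1 = level 1, 2 = level 2, ...
       regexp_substr(sys_connect_by_path(employee_id, '/'), '[^/]+', 1, 1) as level1_id,
       regexp_substr(sys_connect_by_path(employee_id, '/'), '[^/]+', 1, 2) as level2_id,
       regexp_substr(sys_connect_by_path(employee_id, '/'), '[^/]+', 1, 3) as level3_id,
       regexp_substr(sys_connect_by_path(employee_id, '/'), '[^/]+', 1, 4) as level4_id,
       regexp_substr(sys_connect_by_path(email, '/'), '[^/]+', 1, 1) as level1_desc,
       regexp_substr(sys_connect_by_path(email, '/'), '[^/]+', 1, 2) as level2_desc,
       regexp_substr(sys_connect_by_path(email, '/'), '[^/]+', 1, 3) as level3_desc,
       regexp_substr(sys_connect_by_path(email, '/'), '[^/]+', 1, 4) as level4_desc
from employees
start with employee_id = 100
connect by prior employee_id = manager_id
order by employee_id;
```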


For employees that are identified at the higher levels I propagate the current level id and description down to the lowest levels, and for that I used:

nvl(level4_id, nvl(level3_id, nvl(level2_id, level1_id))) level4_id,
nvl(level4_desc, nvl(level3_desc, nvl(level2_desc,level1_desc))) level4_desc,
nvl(level3_id, nvl(level2_id, level1_id)) level3_id,
nvl(level3_desc, nvl(level2_desc,level1_desc)) level3_desc,
nvl(level2_id, level1_id) level2_id,
nvl(level2_desc,level1_desc) level2_desc


This is not mandatory.

For example, SKING is identified only at level 1 and the same value is propagated down to level 4.

Before the examples, one thing is left to do. In the physical layer make an alias of the Employee Hierarchy View that we built and name it Employee Hierarchy View Normal. We'll use it as the complete denormalized view of the employees data. Make another alias for the employees table; we'll use it as a fact table to get the sum of the first level down for the current employee.
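
The nested nvl calls can also be written with coalesce, which returns its first non-null argument. The sketch below uses two made-up rows (the ids are sample values for illustration only) to show the propagation:

```sql
with demo_rows as (
  -- a level 1 employee: only level1_id is filled
  select '100' as level1_id,
         cast(null as varchar2(10)) as level2_id,
         cast(null as varchar2(10)) as level3_id,
         cast(null as varchar2(10)) as level4_id
  from dual
  union all
  -- a level 3 employee: levels 1 to 3 are filled
  select '100', '101', '108', cast(null as varchar2(10))
  from dual
)
select coalesce(level4_id, level3_id, level2_id, level1_id) as level4_id,
       coalesce(level3_id, level2_id, level1_id) as level3_id,
       coalesce(level2_id, level1_id) as level2_id,
       level1_id
from demo_rows;
```

For the first row every level shows 100; for the second row level 4 takes the level 3 value 108.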

So our physical diagram looks like this:



Foreign keys:

"Employee Hierarchy View".employee_id = "Employee Fact Sum First Level Down".MANAGER_ID
"Employee Hierarchy View Normal".employee_id = "Employee Fact Sum First Level Down".MANAGER_ID

In the BMM we make a left join from Employee Hierarchy Drills to Employee Fact, to return all records even if there is no sum of salaries of employees first level down for the current employee (manager):



Example - employees dimension hierarchy and level drills using multiple logical table sources



Logical table sources in order:

















A short explanation of why we put the current level in the filter of each logical table source: if we are at the top level (SKING), for example, and want to drill down to the second level, without the filter this query would be generated:

select
distinct
level1_desc,
level2_desc
from Employee Hierarchy View
where level1_desc='SKING'


You would get 16 rows, including the row that has level1_desc='SKING' and level2_desc='SKING', which is actually data from the first level (values propagated to the lowest level). The logical table source filter excludes these rows.
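
The effect of the level filter can be reproduced directly against the HR employees table. This is a hand-written equivalent for illustration, not the SQL that OBIEE generates; it returns only the real level 2 employees under SKING, without the propagated SKING row:

```sql
select distinct
       connect_by_root email as level1_desc,  -- the top of the branch (SKING)
       email as level2_desc
from employees
where level = 2                               -- plays the role of employee_level='LEVEL 2'
start with email = 'SKING'
connect by prior employee_id = manager_id;
```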

Employee hierarchy and drills test in Answers

Report structure:



First level:



The Value below column tells us whether this level has a level below it. Notice the measure sum salary first level down and the dimension attribute salary current employee.

Drill to the second level:



Now if you try to drill from TTEST, who doesn't have a level below him, you get:



This is because, when moving from the second to the third logical table source, it tries to execute this where condition:

select
distinct
level1_desc,
level2_desc,
level3_desc
from Employee Hierarchy View
where level1_desc='SKING'
and level2_desc='TTEST'
and employee_level='LEVEL 3'


There is no level 3 for TTEST.

Now drill to the third level for any value that has a level below it:



Not bad: NGREENBE has a salary of 12000, and the employees managed by NGREENBE have a total salary of 39600. Check this by drilling from NGREENBE to the lowest, fourth level:
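
The same figures can be cross-checked in the database with a self join on the HR employees table. This is a sketch of the check, independent of the OBIEE model:

```sql
select m.email,
       m.salary as salary_current_employee,
       sum(e.salary) as sum_salary_first_level_down
from employees m
     -- left join so that managers without direct reports still appear
     left join employees e on e.manager_id = m.employee_id
where m.email = 'NGREENBE'
group by m.email, m.salary;
```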



The sum works fine.

What if we want to show in the report all employees with denormalized structure and the fact measure without starting from the higher level and without using drills?

Let's try that:



The result is not good, because the query goes to the last logical table source, which contains all the columns at level 4, and the employee_level='LEVEL 4' filter is applied:





So we need to come up with something else in order to see all employees.

Options to show detailed (complete) view of all employees

1. Make the filter of the last logical table source dynamic with a non-system session variable and reset it from the dashboard prompt

We'll make two reports: one starts at the highest level and is used for drilling down, the other is detailed, with all levels shown in the report but without drills.

Create an initialization block and a non-system session variable:



Create a dashboard prompt that shows only the hardcoded values Complete and Dimension and sets the request (session) variable SV_EMPLOYEE_HIERARCHY. Create two sections, each with a guided navigation report that shows or hides it according to the session variable value the user sets from the prompt. I won't show how to implement that, just the result.

In the logical table source for level 4 write:

orcl."".HR."Employee Hierarchy View".employee_level=CASE WHEN VALUEOF(NQ_SESSION."SV_EMPLOYEE_HIERARCHY")='Dimension' THEN 'LEVEL 4' ELSE orcl."".HR."Employee Hierarchy View".employee_level END



Test

Choose dimension view with drills enabled:



Don't forget to enable the Drill in place option in the section properties.

Now you can drill down to the lowest level (level 4) as shown and described before; everything is the same.



And if you drill to level 4, it uses the valid filter from the last logical table source:



Now choose the complete view with drills disabled.



The result:



In the complete view all employees are included:



NQQuery.log:



2. Make a separate employees logical table without dimension drills

Make a new logical table using the Employee Hierarchy View Normal alias that we made before in the physical layer:



In the BMM join this table to the Employee Fact logical table:



Don't forget the left join:



You can use this logical table for the detailed (complete) view to show all employees and still use the previous logical table with the dimension and drills. In that case you don't need this filter in the level 4 logical table source of the Employee Hierarchy Drills logical table:

orcl."".HR."Employee Hierarchy View".employee_level=CASE WHEN VALUEOF(NQ_SESSION."SV_EMPLOYEE_HIERARCHY")='Dimension' THEN 'LEVEL 4' ELSE orcl."".HR."Employee Hierarchy View".employee_level END

Make the detailed report using the Employee Hierarchy Normal logical table.