Understanding DB2 OLAP By Example

The OLAP features in DB2 are very cool however I don't see a lot of people using them. In addition, sometimes reading the docs on these features are overwhelming so hopefully these examples will make it easy to understand.

We'll look at the following OLAP features in particular. There are more available if you navigate to the IBM website for your DB2 version.
 First we need to create a sample table and data:

CREATE TABLE sales(
     item VARCHAR(20), 
     state CHAR(2),
     store VARCHAR(20),
     amount DECIMAL);

Then lets insert some sample data:

INSERT INTO sales VALUES('Watch', 'IL', 'Buymore', 15);
INSERT INTO sales VALUES('Watch', 'NY', 'Buymore', 15);
INSERT INTO sales VALUES('Watch', 'NY', 'Buymore', 15);
INSERT INTO sales VALUES('Watch', 'NY', 'Buymore', 15);
INSERT INTO sales VALUES('Watch', 'MN', 'Buymore', 15);
INSERT INTO sales VALUES('Watch', 'NY', 'Buymore', 15);
INSERT INTO sales VALUES('Watch', 'NY', 'Buymore', 15);

INSERT INTO sales VALUES('Phone', 'MN', 'Buymore', 150);
INSERT INTO sales VALUES('Phone', 'IL', 'Buymore', 150);
INSERT INTO sales VALUES('Phone', 'NY', 'Buymore', 150);
INSERT INTO sales VALUES('Phone', 'NY', 'Buymore', 150);
INSERT INTO sales VALUES('Phone', 'NY', 'Buymore', 150);
INSERT INTO sales VALUES('Phone', 'MN', 'Buymore', 150);

INSERT INTO sales VALUES('Watch', 'NY', 'Amazing', 15);
INSERT INTO sales VALUES('Watch', 'IL', 'Amazing', 15);
INSERT INTO sales VALUES('Watch', 'NY', 'Amazing', 15);

INSERT INTO sales VALUES('Phone', 'MN', 'Amazing', 150);
INSERT INTO sales VALUES('Phone', 'NY', 'Amazing', 150);
INSERT INTO sales VALUES('Phone', 'MN', 'Amazing', 150);

Now lets look at the result of a common group by clause:
SELECT 
     store,
     SUM(amount)
FROM
     sales
GROUP BY
     store
STORE                                     TOTAL_SALES 
-------------------- --------------------------------- 
Buymore                                           1020 
Amazing                                            495
As we can see the group by clause took the sum of sales by store. Now lets look at the sum of sales by state using a group by store and state.

SELECT 
     store,
     state,
     SUM(amount) AS total_sales
FROM
     sales
GROUP BY
     store,
     state
STORE                ST                       TOTAL_SALES 
-------------------- -- --------------------------------- 
Amazing              MN                               300 
Buymore              NY                               525 
Buymore              IL                               165 
Amazing              IL                                15 
Amazing              NY                               180 
Buymore              MN                               330 

These are some pretty common group by scenarios but what if you needed a grouping by store and state, and also to display the total sales for each store, as well as for all stores. Here is where rollup comes into play.

Rollup

Rollup is similar to a group by clause except that it also outputs running totals for your rollup groups.
SELECT 
     store,
     state,
     SUM(amount) AS total_sales
FROM
     sales
GROUP BY
     ROLLUP(store, state)
STORE                ST           TOTAL_SALES 
-------------------- -- ---------------------- 
Amazing              IL                     15 
Amazing              MN                    300 
Amazing              NY                    180 
Amazing                                    495 <-Total Amazing
Buymore              IL                    165 
Buymore              MN                    330 
Buymore              NY                    525 
Buymore                                    1020 <-Total Buymore
                                           1515 <-Total

You could also swap the rollup fields and rollup on state and then store:

SELECT 
     store,
     state,
     SUM(amount) AS total_sales
FROM
     sales
GROUP BY 
     ROLLUP(state,store)
 STORE               ST                       TOTAL_SALES 
-------------------- -- --------------------------------- 
Amazing              IL                                15 
Buymore              IL                               165 
                     IL                               180
Amazing              MN                               300 
Buymore              MN                               330 
                     MN                               630
Amazing              NY                               180 
Buymore              NY                               525 
                     NY                               705
                                                     1515

Cube

The cube feature is similar to the rollup feature except it aggregates the data into all combinations in your cube clause. For example this will list the total sales by all combinations of store and state:
  • Store per State Total
  • Store Total
  • State Total
SELECT 
     store,
     state,
     SUM(amount) AS total_sales
FROM
     sales
GROUP BY 
     CUBE(store, state)
STORE            ST       TOTAL_SALES 
---------------- -- ----------------- 
Amazing          CT                15 
Amazing          IL                15 
Amazing          MD                 5 
Amazing          MN               300 
Amazing          NY               180 
Amazing                           515 <-Total Amazing
Buymore          IL               165 
Buymore          MN               315 
Buymore          NY               525 
Buymore                          1005 <-Total Buymore  
                                 1520 <-Grand Total    
                 MD                 5 <-Total by states
                 IL               180                  
                 MN               615                  
                 CT                15                  
                 NY               705                       

Grouping Set

You can also do aggregates based off grouping sets. For example this will list the total sales by store and item, as well as state and item.
SELECT 
     store,
     state,
     item,
     SUM(amount) AS total_sales
FROM
     sales
GROUP BY 
     GROUPING SETS((store, item), (state, item));
STORE                ST ITEM                 TOTAL_SALES 
-------------------- -- ----------- -------------------- 
Buymore                 Watch                        105 
Amazing                 Watch                         60 
Buymore                 Phone                        900 
Amazing                 Phone                        450 
Amazing                 Pen                            5 
                     MD Pen                            5 
                     NY Phone                        600 
                     CT Watch                         15 
                     IL Watch                         30 
                     MN Watch                         15 
                     MN Phone                        600 
                     IL Phone                        150 
                     NY Watch                        105 

Rank

Now lets say you want to rank/order the stores by state of most sales. Here is where the rank function comes into play:
SELECT 
     store,
     state,
     SUM(amount) AS total_sales,
     RANK() OVER (PARTITION BY store ORDER BY SUM(amount) DESC) AS rank
FROM
     sales
GROUP BY 
     state,
     store;
STORE        ST TOTAL_SALES    RANK 
------------ -- -------------- ----- 
Amazing      MN 300            1 <-Highest sale state for Amazing
Amazing      NY 180            2 
Amazing      IL  15            3 
Buymore      NY 525            1 <-Highest sale state for Buymore
Buymore      MN 330            2 
Buymore      IL 165            3 
Rank basically gives an ordering to your results based on how you want that ordering to work. "Partition by store" means your ordering is specific to each store. "order by sum(amount) desc" means for each store, order the results by the highest sales in descending order.

Please note that if two rows tie for a ranking, the ranking number will not increment on the tied row.

Dense Rank

There is also a slightly different rank type; dense rank that differs from rank in how subsequent rows after a tie will increment.

To demonstrate, lets add a new state, CT that ties with IL for Amazing sales. Lets also add another state with lower sales than CT and IL.
INSERT INTO sales VALUES('Watch', 'CT', 'Amazing', 15);
INSERT INTO sales VALUES('Pen', 'MD', 'Amazing', 5);
 
SELECT 
     store,
     state,
     SUM(amount) AS total_sales,
     DENSE_RANK() OVER (PARTITION BY store ORDER BY SUM(amount) DESC) AS dense_rank,
     RANK() OVER (PARTITION BY store ORDER BY SUM(amount) DESC) AS rank
FROM
     sales
GROUP BY 
     state,
     store;
STORE                ST      TOTAL_SALES        DENSE_RANK                 RANK 
-------------------- -- ---------------- ----------------- -------------------- 
Amazing              MN              300                 1                    1 
Amazing              NY              180                 2                    2 
Amazing              CT               15                 3                    3
Amazing              IL               15                 3                    3 
Amazing              MD                5                 4                    5
Buymore              NY              525                 1                    1 
Buymore              MN              315                 2                    2 
Buymore              IL              165                 3                    3 
In this example, after a tie ranking (ranking #3), rank skipped over ranking #4 and went straight to 5 while dense rank incremented one count to 4. Rank is nice if you need a total count of items in the rank.

Row Number

Row number is similar to rank however instead of ranking the row, just gives the row a number starting at 1 and incrementing by 1. In the following example, we assign a row number to each row and partition the row number by store. This basically means that each store will have a row number assigned starting at 1.
SELECT 
     store,
     state,
     SUM(amount) AS total_sales,
     ROW_NUMBER() OVER (PARTITION BY store) AS row_number
FROM
     sales
GROUP BY 
     state,
     store;
STORE                ST      TOTAL_SALES       ROW_NUMBER
-------------------- -- ---------------- ----------------- 
Buymore              NY              525                 1 
Buymore              IL              165                 2 
Buymore              MN              315                 3 
Amazing              MN              300                 1 
Amazing              CT               15                 2
Amazing              NY              180                 3 
Amazing              IL               15                 4
Amazing              MD                5                 5

Comments

Popular posts from this blog

AutoItX4Java - Java AutoIt Bridge

The SQL Server and .Net equivalent of PHP and MySQL's SHA1 function

Automated web testing with Java, Selenium RC, LoggingSelenium, HtmlUnit and TestNG