AddThis Social Bookmark Button

Print

Emulating Analytic (AKA Ranking) Functions with MySQL

by Stephane Faroult
03/29/2007

One of the most hailed extensions brought to SQL in recent years has been these functions that Oracle calls analytic functions, DB2 calls OLAP functions, and SQL Server 2005 calls ranking functions--but which MySQL, so far, still lacks. The good news is that they can be (relatively) easily and efficiently emulated.

Oracle introduced these functions early (with version 8.1.6, back in 1999 or about) and still offers a greater number of such functions than the other major players (who usually provide the most basic functions, from which the others can be derived.) Therefore, I will take my references from Oracle, and will show how you can, in most cases, obtain the same result with MySQL.

What Are Analytic Functions?

Anyone who has dabbled with SQL is familiar with aggregate functions such as SUM() or COUNT(). But as their name implies, when you use these functions and perform a group by, the detail is lost in the aggregate. Very simply, analytic functions allow you to return the detail (simple, unaggregated rows) and at the same time, on the very same row, a result that is computed from operations on several rows that are related in one way or another to the current one. Let me give an example. Suppose we have the unavoidable Oracle EMP table:

SQL> desc emp
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------
 EMPNO                                     NOT NULL NUMBER(4)
 ENAME                                              VARCHAR2(10)
 JOB                                                VARCHAR2(9)
 MGR                                                NUMBER(4)
 HIREDATE                                           DATE
 SAL                                                NUMBER(7,2)
 COMM                                               NUMBER(7,2)
 DEPTNO                                             NUMBER(2)

We can easily compute with a group by the total salary amount, department by department:

SQL> select deptno, sum(sal)
  2  from emp
  3  group by deptno;

    DEPTNO   SUM(SAL)
---------- ----------
        30       9400
        20      10875
        10       8750

By doing so, we lose the detail. However, if we use SUM() in an analytical way, we can return each row, and compute on the same row the total salary amount, specifying in the over clause that the SUM() function actually applies to all rows that refer to the same department number as the current one:

SQL> select deptno, ename, sal, sum(sal) over (partition by deptno)
  2  from emp
  3  order by 1, 3
  4  /

    DEPTNO ENAME         SAL     SUM(SAL)OVER(PARTITIONBYDEPTNO)
---------- ---------- ---------- -------------------------------
        10 MILLER           1300                            8750
        10 CLARK            2450                            8750
        10 KING             5000                            8750
        20 SMITH             800                           10875
        20 ADAMS            1100                           10875
        20 JONES            2975                           10875
        20 SCOTT            3000                           10875
        20 FORD             3000                           10875
        30 JAMES             950                            9400
        30 MARTIN           1250                            9400
        30 WARD             1250                            9400
        30 TURNER           1500                            9400
        30 ALLEN            1600                            9400
        30 BLAKE            2850                            9400

14 rows selected.

SQL>

Such a result can be very useful in computing, for instance, what percentage of the salary mass of a department does the Pointy Haired Boss's salary represent? Needless to say, all we need to do with MySQL is to join a regular select to the aggregate to obtain a similar result:

mysql> select a.DEPTNO, a.ENAME, a.SAL, b.TOTSAL
    -> from EMP as a
    ->       inner join (select DEPTNO, sum(SAL) TOTSAL
    ->                   from EMP
    ->                   group by DEPTNO) as b
    ->               on a.DEPTNO = b.DEPTNO
    -> order by 1, 3;
+--------+--------+------+--------+
| DEPTNO | ENAME  | SAL  | TOTSAL |
+--------+--------+------+--------+
|     10 | MILLER | 1300 |   8750 |
|     10 | CLARK  | 2450 |   8750 |
|     10 | KING   | 5000 |   8750 |
|     20 | SMITH  |  800 |  10875 |
|     20 | ADAMS  | 1100 |  10875 |
|     20 | JONES  | 2975 |  10875 |
|     20 | SCOTT  | 3000 |  10875 |
|     20 | FORD   | 3000 |  10875 |
|     30 | JAMES  |  950 |   9400 |
|     30 | WARD   | 1250 |   9400 |
|     30 | MARTIN | 1250 |   9400 |
|     30 | TURNER | 1500 |   9400 |
|     30 | ALLEN  | 1600 |   9400 |
|     30 | BLAKE  | 2850 |   9400 |
+--------+--------+------+--------+
14 rows in set (0.00 sec)

mysql>

Pages: 1, 2, 3, 4

Next Pagearrow




-->