जगदीश खोलिया: RANK v DENSE_RANK v ROW_NUMBER

Friday, May 25, 2012

RANK v DENSE_RANK v ROW_NUMBER

RANK v DENSE_RANK v ROW_NUMBER

RANK, ROW_NUMBER and also DENSE_RANK are very useful for taking a set of rows and ordering them in a defined manner, while giving each row a “position value”. They differ based on the approach taken to define the value of their position in the set of output rows. In some circumstances, they may all give the same value however, dependent on the data, they may differ.
An example based on the Employee table,

SELECT EmpId
,      salary
,      RANK() OVER(ORDER BY salary) rank_position
,      DENSE_RANK() OVER(ORDER BY sal) dense_rank_position
,      ROW_NUMBER() OVER(ORDER BY sal) row_number_position
FROM   emp
/

EmpId        SAL RANK_POSITION DENSE_RANK_POSITION ROW_NUMBER_POSITION
---------- ---------- ------------- ------------------- -------------------
7369        800             1                   1                   1
7900        950             2                   2                   2
7876       1100             3                   3                   3
7521       1250             4                   4                   4
7654       1250             4                   4                   5
7934       1300             6                   5                   6
7844       1500             7                   6                   7
7499       1600             8                   7                   8
7782       2450             9                   8                   9
7698       2850            10                   9                  10
7566       2975            11                  10                  11
7788       3000            12                  11                  12
7902       3000            12                  11                  13
7839       5000            14                  12                  14 
 
Notice that RANK has given the two employees with SAL = 1250, the same position value of 4 and the two employees with SAL=3000, the same position value of 12. Notice also that RANK skips position values 5 and 13 as it has two entries for 4 and 12 respectively. RANK uses all numbers between 1 and 14, except 5 and 13. RANK has both repeats and gaps in it’s ordering.
DENSE_RANK is similar to RANK, in that it gives the two employees with SAL=1250, the same position value of 4, but then it does not skip over position value 5 – it simply carries on at position 5 for the next values. DENSE_RANK uses, for the position values, all numbers between 1 and 12, without leaving any out, and using 4 and 11 twice. DENSE_RANK has no gaps in it’s ordering, only repeats.
ROW_NUMBER gives each row a unique position value and consequently uses all the numbers between 1 and 14. ROW_NUMBER has no gaps or repeats in it’s ordering. Note that the position value on ROW_NUMBER is not deterministic, since the ORDER BY clause only has SAL in it. If you want to ensure the order is the same each time, you need to add further columns to the ORDER BY clause.

No comments: