Window Functions using JSON



Window functions execute on a set of rows and return a single value for each row from the query. The term window has the meaning of the set of rows for the function.

A Window function in a query, defines the window using the OVER() clause. This OVER() clause has the following capabilities −

  • Defines window partitions to form groups of rows. (PARTITION BY clause)
  • Orders rows within a partition. (ORDER BY clause)

Aggregate Window Functions

The Aggregate window function can be defined over a partition by and order by clause.

Avg()

The following program shows the query for this function −

Query

0: jdbc:drill:zk = local> select mark1,gender,
   avg(mark1) over (partition by gender ) as avgmark1 from
   dfs.`/Users/../workspace/Drill-samples/student_list.json`;

Result

mark1 gender avgmark1
70    female   83.0
100   female   83.0
90    female   83.0
75    female   83.0
80    female   83.0
70     male    55.0
40     male    55.0
60     male    55.0
50     male    55.0
55     male    55.0

This result shows that partition by clause is used for the gender column. So, it takes the average of mark1 from female gender which is 83.0 and then replaces that value to all the male and female gender. The mark1 avg result is now 55.0 and hence it replaces the same to all genders.

Count(*)

The following program shows the query for this function −

Query

0: jdbc:drill:zk = local> select name, gender, mark1, age,
   count(*) over(partition by age) as cnt from
   dfs.`/Users/../workspace/Drill-samples/student_list.json`;

Result

name  gender  mark1 age cnt
Adam   male    70   12   7
Amit   male    40   12   7
Bob    male    60   12   7
David  male    50   12   7
Esha  female   70   12   7
Ganga female  100   12   7
Leena female   90   12   7
Jack   male    55   13   3
Mary  female   75   13   3
Peter female   80   13   3

Here, there are two age groups 12 and 13. The age count of 12 is for 7 students and 13 age count is for 3 students. Hence count(*) over partition by age replaces 7 for 12 age group and 3 for 13 age group.

MAX()

The following program shows the query for this function −

Query

0: jdbc:drill:zk = local> select name,age,gender,mark3,
   max(mark3) over (partition by gender) as maximum from
   dfs.`/Users/../workspace/Drill-samples/student_list.json`;

Result

name  age gender mark3 maximum
Esha  12  female  65     98
Ganga 12  female  98     98
Leena 12  female  95     98
Mary  13  female  90     98
Peter 13  female  88     98
Adam  12   male   60     70
Amit  12   male   40     70
Bob   12   male   70     70
David 12   male   70     70
Jack  13   male   45     70

From the above query, maximum mark3 is partitioned by gender, hence female gender max mark 98 is replaced to all female students and male gender max mark 70 is replaced to all male students.

MIN()

The following program shows the query for this function −

Query

0: jdbc:drill:zk = local> select mark2,min(mark2)
   over (partition by age ) as minimum from
   dfs.`/Users/../workspace/Drill-samples/student_list.json`;

Result

mark2 minimum
 50     50
 50     50
 80     50
 70     50
 60     50
 95     50
 85     50
 45     45
 85     45
 85     45

SUM()

The following program shows the query for this function −

Query

0: jdbc:drill:zk = local> select name,age,sum(mark1+mark2)
   over (order by age ) as summation from
   dfs.`/Users/../workspace/Drill-samples/student_list.json`;

Result

name   age summation
Adam   12    970
Amit   12    970
Bob    12    970
David  12    970
Esha   12    970
Ganga  12    970
Leena  12    970
Jack   13    1395
Mary   13    1395
Peter  13    1395

Here mark1+mark2 result is replaced separately to each male and female student.

Ranking Window Functions

Following is the table listed out with ranking window functions.

Function Return Type Description

CUME_DIST()

DOUBLE Calculates the relative rank of the current row within a window partition (number of rows preceding or peer with current row) / (total rows in the window partition)

DENSE_RANK()

BIGINT Rank of a value in a group of values based on the ORDER BY expression and the OVER clause. Each value is ranked within its partition. Rows with equal values receive the same rank. If two or more rows have the same rank then no gaps in the rows.

NTILE()

INTEGER The NTILE window function divides the rows for each window partition, as equally as possible, into a specified number of ranked groups.

PERCENT_RANK()

DOUBLE Calculates the percent rank of the current row using the following formula: (x - 1) / (number of rows in window partition - 1) where x is the rank of the current row.

RANK()

BIGINT The RANK window function determines the rank of a value in a group of values. For example, if two rows are ranked 1, the next rank is 3.

ROW_NUMBER()

BIGINT Gives unique row numbers for the rows in a group.
Advertisements