You are currently viewing SQL Window Functions
SQL Window Functions

SQL Window Functions

Introduction

The window function is an advanced feature of SQL that allows you to manipulate and perform advanced analytics without the need to write complex queries. In this article we will talk about Windows functions, the different types, and when to use the window function.

What is Window Function

Window functions perform calculations across a group of relevant table rows to the current row. (“More on Window Functions with PostgreSQL – Coding Ninjas”)

The window function possesses similar features to the aggregation in the GROUP BY clause. However, rows are not combined into a single row; rather, each row maintains its individuality. In other words, a window function may only return one value per row.

Since rows are not squashed together while using window functions, you can work with aggregate and non-aggregate values simultaneously.

Difference Between Aggregate Function and Window Function

AGGREGATE FUNCTION WINDOW FUNCTION
Calculations are based on a set of rows and return a single value for the entire collection. Calculations are based on a set of rows and return values for each row in the set.
Can group rows using the GROUP BY clause. Can partition rows using the PARTITION BY clause.
Standard syntaxes are SUM, AVG, COUNT, MAX, and MIN. Common syntaxes are ROW_NUMBER, RANK, LAG, LEAD, and SUM with PARTITION BY.

 

Aggregate vs Window Functions

From the image above you will notice how the GROUP BY aggregation function groups all three rows into a single row, while on the other hand, the aggregate produces the output of each row. This process reduces the need to write complex queries.

When to Use Window Functions

The list below is some of the practical use cases of Window Functions.

  • Used to calculate aggregates, such as the average, sum, and maximum, over a set of rows. Comparing the performance of several groups or seeing trends over time can both benefit from this.
  • When ranking rows by several metrics, such as sales or customer happiness. This can be helpful for figuring out outliers or for figuring out the best achievers.
  • Moving averages, which are important for spotting trends over time, can be calculated using window functions.

Different Types of Window Functions

The Window functions can be broken into 3 major categories, each with its individual subsets.

  • Aggregate Window Function
  • Ranking Window Function
  • Value Window Function

Overview of Window Functions Categories

  • Aggregate: Operates on a set of rows and returns a single value for the entire dataset. Examples include SUM(), AVG(), MAX(). The aggregate window function is usually used in calculating the running total and other statistical calculations.
  • Ranking: A function that operates on a set of rows and returns the ranking for each row. Examples include RANK(), DENSE_RANK(), ROW_NUMBER() and others.
  • Value: operate on a set of rows and return a value for each row in the set. Examples include LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE() and others.

Window Function Syntax

Now, let’s look at the Window Function syntax to better understand how the function works.


<window_function>(expression)OVER(
PARTITION BY <partition_list>
ORDER BY <order_list>
FROM <table_name>;

Syntax Window Function Breakdown

Let’s go over the syntax one after the other.

  • PARTITION BY Clause: This is used in dividing the rows into different partitions.
  • Window_function: This indicates the name of the window function you want to use.
  • Expression: This is the name column we want the window function to operate on.
  • ORDER BY clause: Used to order rows by partition.
  • ROWS clause: The number of rows on which the window function will run is specified in this section.
  • RANGE clause: The number of rows that the window function will work on is specified by this clause. The rank of each row in a sorted sequence can be determined using the RANGE clause.

Create a Database and Table in PostgreSQL

“PostgreSQL is a powerful, open-source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads”. (“PERFORMANCE BENCHMARK POSTGRESQL / MONGODB – EnterpriseDB”)

Create Database

“A database is a structured group of data that is often electronically stored in a computer system. Although there are numerous methods to structure the data, most databases employ a relational model. Data are usually kept in a table organized in a row-and-column format similar to a spreadsheet. Each row represents a unique record, and each column represents a field in the record”. (“Tables – SQL Server | Microsoft Learn”)

To create a database in PostgreSQL, use the command provided below.


CREATE DATABASE UrBizEdge_Window_Functions;

view raw

create_db.sql

hosted with ❤ by GitHub

Create Table

Now that we have created the database, let’s continue by creating a table in our database that will help store information. For this article, we will be using a table called customer_order.

The query below will be used in creating the table used in housing the entire data.


Create table customer_order_window(
Customer_id Varchar(50) primary key unique,
customer_name varchar(50),
segment varchar(50),
age int,
country varchar(50),
city varchar(50),
state_ varchar(50),
postal_code bigint,
region varchar(50),
order_num int,
sales_total numeric,
quantity_total int,
profit_total numeric
); 
Select * from customer_order_window;

Import Data into Created Table

PostgreSQL provides you with a variety of options for importing data into the database table.

The following steps listed below will help in importing a CSV file of UTF-8 into the PostgreSQL table.

Step 1: Import Wizard

In your Postgre software (PgAdmin), expand the database created and locate the table. At the table section, right-click and select “Import/Export Data,” this should take you to another window.

 

Step 2: Set Directory

For the Filename, select the right directory in your local machine and ensure the format is in the right form. Leave the Encoding to UTF-8.

 

In the Option tab ensure the Header is on and Delimiter is set as the comma. Lastly, for the Columns ensure all the right columns are appearing, then click on OK.

If all is successful, you should have the following message below.

Aggregate Window Functions

Now, that the data are all in the table starts start with the Aggregate window function.
Note:  When using the Aggregate Window Function avoid using the ORDER BY clause.

AVG() Window Function

This returns the average value for the input expression values. This function works with only numerical values and ignores NULL values.

Q) Write a query that displays the customer_id, state, and profit, showing the average profit based on the states.


select customer_id, state_, profit_total,
avg(profit_total) over(partition by state_) as AVG_
from customer_order_window;

view raw

avg().sql

hosted with ❤ by GitHub

MAX() and MIN() Window Function

This returns the maximum and minimum values for the input expression values respectively.  It also works on numerical values and ignores NULL values.

Q) Write a query that displays the customer_id, state, and profit, showing the max and min profit based on the states.


select customer_id, state_, profit_total,
max(profit_total) over( Partition by state_) as Max_profit,
min(profit_total) over( Partition by state_) as Min_profit
from customer_order_window;

view raw

max_min.sql

hosted with ❤ by GitHub

SUM() Window Function

This returns the sum of the input expression values. The SUM function works with numeric values and ignores NULL values.

Q) Write a query that displays the customer_id, state, and profit, showing the sum profit based on the states.


select customer_id, state_, profit_total,
sum(profit_total)over(partition by state_) as state_profit_total
from customer_order_window;

view raw

sum().sql

hosted with ❤ by GitHub

Ranking Window Functions

Used to give each row in the partition of the resulting set a rank. Usually, the rank of the first row within a partition is represented as one.

RANK() Window Function

This function is used in assigning rank to each row within a partition of a result set, based on the ORDER BY expression.

Note when using the RANK() function, you will notice C-3 and C-6 have the same number of orders. The Rank() function will give both the same number and skip the 3rd row.

Q) Write a query that displays the customer_id, state, and profit, showing the ranking of profit based on the states.


select customer_id, state_, profit_total,
rank() over(partition by state_ order by profit_total desc) as Rank_row
from customer_order_window;

view raw

rank().sql

hosted with ❤ by GitHub

DENSE_RANK() Window Function

This function assigns a rank to each row within a partition of a result set, based on the ORDER BY expression in the OVER clause. The rows with equal values for the ranking criteria receive the same rank. The rank of the first row within a partition is one, and the ranks are consecutive.

Note when using the RANK() function, you will notice C-3 and C-6 have the same number of orders. The Rank() function will give both the same row number.

Q) Write a query that displays the customer_id, state, and profit, showing the dense_rank of profit based on the states.


select customer_id, state_, profit_total,
dense_rank() over(partition by state_ order by profit_total desc) as Rank_row
from customer_order_window;

ROW_NUMBER() Window Function

This function assigns a unique number to each row in a partition of a result set, based on the ORDER BY expression in the OVER clause.


select customer_id, state_, profit_total,
row_number() over(partition by state_ order by profit_total desc) as Rank_row
from customer_order_window;

view raw

row_no.sql

hosted with ❤ by GitHub

Value Window Functions

The Value window function is used to calculate the value for each row in a partition of a result set. Based on the current row and the rows in the adjacent partitions, the value is determined.

LAG() Window Function

This function returns the value of the expression from the previous row in the partition. Data from previous rows can be accessed along with data from the current row using the LAG function.


select customer_id, state_, profit_total,
LAG(profit_total) over(partition by state_ order by profit_total desc) as Lag_row
from customer_order_window;

view raw

lag.sql

hosted with ❤ by GitHub

 

LEAD() Window Function

This returns the value of the expression from the next row in the partition.


select customer_id, state_, profit_total,
LEAD(profit_total) over(partition by state_ order by profit_total desc) as Lead_row
from customer_order_window;

view raw

lead.sql

hosted with ❤ by GitHub

FIRST_VALUE()

The First_Value() returns the first value of the expression in a partition.


select customer_id, state_, profit_total,
first_value(profit_total) over(partition by state_ order by profit_total desc) as first_row
from customer_order_window;

view raw

first_value.sql

hosted with ❤ by GitHub

Interview Question

Write a query and provide the name of the customer_id, the state that had the highest profit for each individual state.


select * from(
select customer_id, state_, profit_total,
rank() over(partition by state_ order by profit_total desc) as Rank_row
from customer_order_window) as rank_info
where rank_row = 1;

view raw

quest.sql

hosted with ❤ by GitHub

Conclusion

In this article, we have learned what the window function is and when to use it. We also discuss the different types of window functions and the additional clauses associated with them.

To learn more about Power BI, you should sign up for our classes, keep up with the blog, and subscribe to our YouTube channel.

 

 

Leave a Reply