SQL LAG window function returns the previous value from the current row. Confused? Keep reading and I will clarify what this means with examples.
SQL LAG Window Function Summary: SQL LAG window function returns the previous value from the current row. Confused? Keep reading and I will clarify what this means with examples.
The easiest way to remember SQL LAG Window function is that it’s the opposite of SQL LEAD Window Function. I did a blog post and video on SQL LEAD Window Function. Go and check it out.
It’s easier to explain what LAG window function is with an example. So, I will
If you prefer to watch video instead of reading, here's a video covering the exact same topic.
SQL LAG window functions has this syntax. I know this syntax looks ambiguous, but don’t gloss over it yet. Take the time to actually understand what each component of the syntax is doing. If you don’t understand the syntax or explanations below, take heart cause the followed examples will clarify any ambiguity or confusion you may have.
LAG(return_value, offset, default_value) OVER (
PARTITION BY expr1, expr2,...
ORDER BY expr1 [ASC | DESC], expr2,...
)
Here’s a detailed explanation of what the syntax keywords and expressions mean.
LAG
= That’s just the SQL keyword clause to identity the function.
return_value
= The column or SQL query that you want used as the basis for the LAG Function.
offset
= How many rows from the current row should be returned. Default value is 1.
In SQL, the default offset is 0 which means that query results are returned starting from the 1st row. But, you can choose to return query results starting from the 2nd row, or 3rd row, or 4th row, etc. That’s what offset
in SQL means, and the same concept applies here. I will show you an example of using/not using SQL Offset Below.
If you want query result returned starting from 2nd row, then set offset = 1, 3rd row? offset = 2. etc.
default
= LAG window function returns the “previous” row from the current row within the same partition. If there is no value in the “previous” row, it returns the value defined in “default”. The default
value is NULL
. You can specify any string or integer as the default value. I will show example of this below.
PARTITION BY expr1
= Window function utilizes partitions. Partitions is a way to group related data together using categorical data from one or more columns (not to be confused with group by). What is defined in expr1
is the column/datapoint to be used for the partition.
When nothing is specified for partition, then the entire dataset is treated like a single partition.
You can use multiple columns/expressions in your partition that’s why you see PARTITION BY expr1, expr2,...
The data will be partitioned in the order the columns were specified in. Below, I will show you an example of using/not using a partition.
The simplest way to think of partition is that you are creating subsets of your data based on a category like department.
ORDER BY expr1
= Order by is exactly what it sounds like. How should the resulting data be ordered? expr1
here represents the column to be used for ordering the data.
Once again just like partition, you can have multiple columns as part of the order by
clause. And again, it will be ordered sequentially in the order the columns were specified in.
There are some window functions that are order-dependent which means that how the data is ordered impacts the results. Order by is essentially required for these window functions.
But, LAG SQL window function IS NOT order-dependent which means that the order by clause can be omitted and you will still get a result. This is possible because regardless of how the data is ordered, the result is determined by the offset from the current row.
Don’t worry if non of these make sense yet, this syntax and explanations will make more sense once you see the examples below.
Below, I have included a few example of how SQL LAG Window Function can be used and most importantly, the use-case for it. I know it’s fun to learn new stuff, but more important than learning new stuff is learning WHEN to use the new stuff you learned.
Let’s start by looking at an example without specifying any partition
, offset
, or default
parameter values.
This query shows an employee hire date followed by the date the previous employee was hired.
SELECT
first_name,
last_name,
hire_date as original_hire_date,
LAG(hire_date, 1) OVER (
ORDER BY hire_date
) AS hire_date_previous
FROM
employees;
LAG Window Function no partition, offset by 1, and no default example results.
What does it do?: LAG window function basically gives you the value in the previous row after the current row from the column of interest (What is defined in return_value). The value it provides depends on the offset value.
Let’s use the example above to explain this.
The query above has no partition, so the entire dataset is used as a single partition. You will see the difference between partition and no partition later on. The query above also has 1 as the offset value.
original_hire_date
is the original column and hire_date_previous
is the LAG window function column.
The value is row 2 hire_date_previous
column is the same value as what is in row 1 original_hire_date
column
Once again, Row 4 hire_date_previous
is the same as what is on row 3 original_hire_date
column.
That’s how LAG Window function works. It gives you the value in the previous row depending on the offset value defined. Since the offset value for this query is 1, the “previous value” is one row above.
It could also give you value from the previous 2/3/4/5/6/… rows depending on what value is specified for the offset.
Speaking of offset, let’s look at an example of using Offset of 2
This example uses an offset of 2 instead of the default value of 1. This is how you define “2” for the offset LAG(hire_date, 2)
SELECT
first_name,
last_name,
hire_date as original_hire_date,
LAG(hire_date, 2) OVER (
ORDER BY hire_date desc
) AS hire_date_previous
FROM
employees;
LAG Window Function OFFSET by 2 Example Results
This is the same query as the one above, but now, we are using the offset of 2.
The value in hire_date_previous
column row 3 is same as original_hire_date
column row 1.
And once again, the value in hire_date_previous
column row 5 is same as original_hire_date
column row 3.
This is a clear example of the offset determines the value. The offset determines how many rows from the current row the data will be pulled from.
I think you got it by now, but just in case you haven’t, let’s look at one more example, this time using an offset of 5.
And This one, an offset by 5
SELECT
first_name,
last_name,
hire_date as original_hire_date,
LAG(hire_date, 5) OVER (
ORDER BY hire_date desc
) AS hire_date_previous
FROM
employees;
LAG Window Function OFFSET by 5 Example Results
Same thing again. Single partition, no defaults specified and the value in hire_date_previous
column corresponds to the value in original_hire_date
column 5 rows above.
So far, we have been using a single partition in our examples. Let’s explore using more than one partition to see what the results look like.
This example partitions by department, uses 1 as the offset, and doesn’t provide any value for “default”.
SELECT
first_name,
last_name,
department_name,
hire_date as original_hire_date,
LAG(hire_date) OVER (
PARTITION by department_name
ORDER BY hire_date
) AS hire_date_previous
FROM
employees e
INNER JOIN departments d ON
d.department_id = e.department_id;
Remember what I said above, when default
is not provided, the resulting value is NULL
(We will get into this below).
LAG Window Function - Partition by Department Example Results.
This is the main thing to remember about partitions. Partitioning is like creating subset of your data. You apply data operations to that subset. At the end of that subset, the data operation starts over. That’s the key thing to remember about partitioning.
What exactly is going on with the yellow & green boxes?
So, there’s a couple of things going on here.
If you look in the department_name column, everything is organized by department.
This is important to remember; with partitioning, the Function restarts with each partition window.
That means that hire_date_previous (Which represents the LAG Window Function results) is only applied to the Accounting department.
When it gets to Administration department, it starts over.
And when it gets to Executive department, it starts over; Same goes for finance department.
That’s what partitioning does. It breaks things into groups and for each groups, the calculation starts over. That’s what you see in the yellow & green boxes.
What’s going on with the green boxes?
If you remember earlier, I mentioned that the LAG Window Function Clause “default” has a default value of NULL. That’s what’s highlighted in the green boxes.
When the function comes to the end of the partition, there is no value in the “previous” row, so it returns whatever is specified in the “default” clause and since we didn’t specify any value for “default”, it returns NULL.
What if you wanted to provide a value for “default” instead of NULL? Let’s discuss that next.
Like the subheading suggests, this example below shows you how to specify a value for “default”
SELECT
first_name,
last_name,
department_name,
hire_date as original_hire_date,
LAG(hire_date, 1, 'Not Applicable') OVER (
PARTITION by department_name
ORDER BY hire_date
) AS hire_date_previous
FROM
employees e
INNER JOIN departments d ON
d.department_id = e.department_id;
LAG Window Function - Replace NULL default Value Example Results.
This query and result is the exact same one as in Example 4, but now, we are replacing the NULL Values with “Not Applicable”.
The replacement doesn’t have to be a string. You can could also specify a number as the default value.
So, learning LAG Window Function is cool and all, but…how do you actually know when and why to use it.
See, whenever I learn something new, I want to know the applicability of it. Like…this new thing I learned is cool, but WHEN do I actually use it? and WHY would I want to use it?
Let’s talk about that next.
Now you might be wondering, SQL LAG window function sounds cool, but when will I ever use it? What use-case will this be an effective solution? Don’t worry, I got you covered 😆
A good use-case for SQL LAG Window Function is calculating the sum or difference between the current row and the offset row.
Let’s look at an example to clarify this.
This example shows subtracting the hire date of employee in row X from the hire date of employee on row Y as a way to check the length of time between hire dates.
with hiring as
(SELECT
first_name,
last_name,
hire_date as original_hire_date,
LAG(hire_date) OVER (
ORDER BY hire_date
) AS hire_date_previous
FROM
employees)
select *,
julianday(original_hire_date) - julianday(hire_date_previous) as hire_diff
from hiring;
julianday is how sqlite converts date (string-format) into number of days since beginning of the julian period.
LAG Window Function Use Case Example Results.
As you can see, the column hire_diff shows the resulting difference between when the previous employee was hired (represented by hire_date_previous
) and the hire_date of the current employee which is represented by original_hire_date
.
Let’s use row 1 as an example.
In row 2, original_hire_date
represents the date Jennifer was hired and hire_date_previous
represents when Steven was hired.
Hire_diff is the number of days between when Steven (Row 1) and Jennifer (Row 2) was hired.
I am sure there are many more use cases for SQL LAG window function, but I hope this example gets you excited to try SQL LAG window function.
SQL LAG Window function gives you the value in the “previous row”. What the previous row is, is determined by the offset value. The default offset value is 1, but you can set it to any number.
When there is no value in the previous row, NULL is returned. But, instead of NULL being returned when there is no value, you can specify any string or number to be returned instead. For example, you can return “Not applicable” or the number “10” instead of NULL.
You can also specify how to divide and group the data using partition. When partition is not specified, the entire dataset is treated like a single partition.
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
If you want to replicate these results on your local computer,
Follow me on Youtube
Follow me on Linkedin
Follow me on Github
Follow me on Twitter
If you want to contact me, speak to me here
Want me to create a similar technical content for your brand? Contact me here.
Want to get updates every time I publish new content? Join my FREE Technical Newsletter here
Categories: : SQL, SQL Window Functions