SQL Order of Operations determines what order SQL queries & clauses are evaluated and executed. Understanding this is useful for query optimization.
Most commonly used SQL Clauses are covered in this video & blog, but of course, not every SQL Clause is present in this video & blog.
Here's a video on SQL order of operations. The blog and video basically covers the same material. Feel free to watch the video instead of the blog or both.
SQL is distinct from traditional programming languages in that it is declarative, meaning that users specify the desired data rather than detailing how to retrieve it. The SQL order of execution is crucial for query optimization.
This order determines how query clauses are evaluated and can impact query performance. Understanding this order helps in troubleshooting and optimizing queries.
While modern SQL query planners can enhance efficiency using various techniques, they ultimately must produce the same result as a query adhering to the standard SQL order of execution.
WHAT: The FROM clause is identifying what database, schema, and table(s) to use for the query.
WHY: It makes sense SQL will start it’s operations with the FROM Clause. No table, no query operation.
For example, in this query, we are selecting the Contact Name and Customer Id from the Customer’s Table
Select
ContactName, CustomerID
from Customers;
Results of the FROM CLAUSE
In SQL, * is a wildcard indicating everything. Here, we are selecting everything from the Order Details Table
SELECT * from
[Order Details];
Result
FROM Clause is not the only way to select what tables to query with. Sometimes, we join-in other tables using the JOIN Clause.
WHAT: The JOIN Clause typically comes soon after the FROM Clause or as part of the FROM Clause. JOIN involves combining 2 or more tables together.
WHY: It makes sense that this operation will be one of the first first. The query engine need to identify all tables that’s part of this operation before it begins executing other operations.
Let’s look at a join example, but before we get to that….
Individually, this is what the tables look like
SELECT * from
[Order Details];
SELECT * FROM ORDERS;
Results
SELECT * from [Order Details]; | SELECT * FROM ORDERS; | |
Now, let’s join the tables.
Select OD.ProductID, OD.OrderId, OD.UnitPrice,
Orders.CustomerID, orders.OrderDate
from [Order Details] OD
Join Orders on OD.ORDERID = Orders.OrderID
After joining the tables, here’s the final result
It makes sense that we will do a JOIN before a SELECT because the SELECT statement is using information/columns from BOTH tables. So, the tables need to be available before we can select columns from them.
WHAT: WHERE clause is used to filter the data by columns. You can use multiple columns as filters or a single column. Let’s demonstrate using where clause. It will help to solidify the information.
Before we do that though, let’s explore WHY the WHERE clause is used in this step of the sql query operation.
WHY: It makes perfect sense the WHERE clause will be used at this stage. After the operation has selected the tables for the query. It makes sense the next natural order of operation is to filter out all the data that’s not being used for the remainder of the query.
No point in carrying excess data that’s not needed through the data operation process which can slow down the query operation performance. That’s why any data that’s not included in the WHERE clause is filtered out before any operations proceed. This way, only necessary data is included in further operations.
Let’s start by getting a preview of what the data looks like before doing our WHERE transformations.
SELECT * from
[Order Details]
Subsect of results. Just the raw table before filtering out some data. As you can see, there unitprices
below the value of 10. In the next section, we are going to filter out unitprices
below 10 using WHERE statement.
This code is simply selecting everything where unitprice is greater than 10
SELECT * from
[Order Details]
WHERE UNITPRICE > 10;
This is a subsect of the results and as you can see, all UnitPrice here is more than 10.
This one is selecting Where unitprice
is greater
than 10
and quantity
of item is less
than 10
SELECT * from
[Order Details]
WHERE UNITPRICE > 10 AND Quantity < 10;
This is the subsect of the result. As you can see, all quantities
is less than 10 in addition to unitprice
being over 10.
This one is doing a string search to find all the customers whose name begins with ‘L’ or ‘R’
SELECT * FROM ORDERS
WHERE CUSTOMERID LIKE 'L%' or CUSTOMERID LIKE 'R%';
Here’s a subsect of the query results. As you can see, all CustomerID’s in this result set has a starting Letter of R & L
An aggregate data cannot be used as part of a “WHERE” filter because aggregations happen AFTER the where clause is executed. Instead of using WHERE clause with aggregation, we use HAVING clause (Explained more below)
This query is going to fail because we are using an aggregate value as input for a WHERE clause.
Select
City, count(city) as city_count
from Customers
where city_count > 1
group by City;
As expected, the query failed with the result below.
Keep reading to learn how to make this query work under the “HAVING CLAUSE” section.
WHAT: Group by is used to group a column by it’s distinct values.
Traditionally, when a group by clause is used, all columns in the SELECT statement has to be part of the group by clause or be part of an aggregate clause. I said “traditionally” because in some RDMS systems, this is not enforced (Example being MySQL and SQLite).
The consequence of not using all column values in your SELECT clause within group by and/or aggregate clause is that the data returned will be indeterminate. Let me explain further…
If you don’t use all columns in SELECT clause in your aggregate or group by clause, then a random value will be selected for the columns that is not aggregated and has no grouping. This means that the values returned for not-grouped/not-aggregated columns will be indeterminate (randomly chosen). SQL tries to be determinate in it’s results sets.
In RDMS system like snowflake, you won’t be able to get away with this. Snowflake forces you to use all columns in SELECT clause as part of the group by clause or an aggregate clause.
Grouping data is like putting them into distinct buckets. A GROUP BY Can be used with or without an aggregate function. Also, aggregate functions are not allowed in the GROUP BY clause.
Let’s look at a few examples including how to use a group by with & without an aggregate function.
First let’s see what the data looks like without grouping
select
customerid, country, city
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro','Paris','London')
Here’s the results
Now, when we apply grouping to the data above, you can see the data is condensed. The Country has been used to “bucket” the data, followed by city. Now, there is one row for each country and City.
This is an example of using GROUP BY WITHOUT an aggregate function
select
customerid, country, city
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
Here’s the results
Group by basically says…give me the unique values in this column
Now let’s take a look at an example of a group by statement WITH an aggregate function
Here, I am using count() function to count the number of times a city appears in each country for this dataset.
select
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
Here’s the results. It shows how many times each city appears for each country.
So, you can use GROUP BY statement with or without an aggregate function, but you cannot use the aggregate function as part of the group by clause.
For example, If we run the above query again but use city_count in the group by statement, it will throw an error.
The query
select
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city, city_count
The error
You saw that we used the aggregate function count() above, let’s go deeper into aggregate functions.
WHAT: Aggregation is used to summarize data. Some aggregation syntax include COUNT, MIN, MAX, AVG, SUM. Aggregation simply means taking in multiple values and returning a single value.
Let’s look at some aggregate functions. Aggregate functions typically ignores nulls values except for count().
Here’s a sample of the raw data before we do any aggregations.
select * from [Order details];
Subsect of the results.
Let's start with COUNT() aggregate function.
Let’s get a count of number of items in this table where the discount = 0
But first, if we run
select count(*) from [Order details];
“*” is a wildcard indicating “everything”. So, basically we are getting a count of everything on this table. This is how many rows/records exists in this table.
Now count of items with no discount
select count(*) from [Order details]
where Discount=0;
As we can see, most items in this table does not have a discount
Let’s checkout another aggregate value - AVERAGE.
Let’s look at the average Price of an item
SELECT round(AVG(UnitPrice), 2) as [Avergae Price]
FROM [Order Details]
The average price of an item is $28.85
And the average Discount of items that do have a discount
round(AVG(Discount) * 100, 2)
FROM [Order Details]
where Discount>0
For items that do have a discount, they collectively have an average discount of 14.44%
Let’s move on to maximum and minimum. These functions do exactly what they sound like. They give you the MAX & MIN value of a column.
Select max(UnitPrice), min(UnitPrice)
FROM [Order Details];
Here are the results.
And finally, lets look at the sum of total inventory available which is represented by quantity.
SELECT SUM(Quantity)
FROM [ORDER DETAILS]
Results of SUM aggregate function.
Like I mentioned earlier, you cannot use an AGGREGATE function in a WHERE clause filter, but you can use it with a HAVING clause filter. So, let’s move on and checkout how to use the HAVING clause
WHAT: HAVING Clause is the equivalent of WHERE statement, but for aggregate data. HAVING clause is used to filter aggregate data.
WHY: WHERE Clause occurs BEFORE data aggregation, therefore it can’t be used to filter aggregate data. That’s where HAVING clause comes into play. HAVING Occurs AFTER data aggregation.
Here’s an example to solidify the concept.
To make the WHERE clause query above work, we replace “WHERE” with “HAVING”
This query basically gives a count of how many times each city appears in the table more than once.
Select
City, count(city) as city_count
from Customers
group by City
having city_count > 1; #instead of `where city_count > 1`
Result. Same query as the one above, but now it works.
We have looked at HAVING, WHERE, FROM, and AGGREGATION clauses. All those are normal SQL statements you might encounter daily. Let’s look at some SQL Clauses that may not be part of your daily SQL routine starting with windows functions.
WHAT: Window Functions are like the cool kids of SQL. It can be used to partition data, rank the data, and do other cool stuff.
There are many different types of windows functions like
In a different post, I go more in depth into window functions. For today, we will just use a simple example to illustrate window functions.
WHY Does it make sense to have windows functions at this level of the SQL operation. To explain this, you have to understand how window functions work.
At the most fundamental level, window functions operate over a small window of data called “partitions”. For example, assuming you have data that look like this.
Department | Employee | Salary |
---|---|---|
Tech | A | 10000 |
Education | B | 15000 |
Agriculture | C | 28000 |
Education | D | 12000 |
Tech | E | 8000 |
Tech | F | 60000 |
In the data above, if we used a window function and did a partition on “Department”, then we would have 3 window “data partitions” to work with/operate over. This is the core of window functions. So, if we were using the row_number window function, this is what the data will look like.
Department | Employee | Salary | row_number |
---|---|---|---|
Tech | A | 10000 | 1 |
Tech | E | 8000 | 2 |
Tech | F | 60000 | 3 |
Education | B | 15000 | 1 |
Education | D | 12000 | 2 |
Agriculture | C | 28000 | 1 |
As you can see with row_number function, within every partition, the row number restarts. If no partition is defined, then the entire data set is a single partition. Again, more in-depth tutorial on window functions later.
It makes sense that window functions will be executed at this level of operation because SQL wants to do all the operations on all the data before breaking the data apart into smaller partitions.
It wants to select and join the table(s) for the operation, filter the data down using a WHERE statement, make grouping decision based on the entire dataset, aggregate data where necessary and filter the aggregated data if needed BEFORE partitioning the data into smaller chunks.
Here’s another example of using row_number and using aggregate function (average) as a window function.
In this code, I am using the average function as an aggregate function within a window function.
This average function is calculating the average unitprice of an item within each order_id,
And using row_number function to assign row numbers to each record/row in each partition (OrderId).
select *,
round(avg(UnitPrice) over(partition by OrderID),2) as avg_unit_price_per_Order_id,
row_number() over(PARTITION by OrderID) as row_number
from [order details]
Here’s the results. As you can see. It shows us the average unit price per item for each other OrderID as well as assigning row number to each item in each OrderID. For each unit of OrderID (Partition key “window”), it restarts the row_number and re-calculates the average order Id.
Hopefully this gives you a window into window functions 😆
WHAT: Select is exactly what it sounds like. We use it to select the columns we want from the table produced by the operations above.
WHY: It makes sense that we will do all the grouping, aggregations, table selections, window functions, before we select which columns we want for the final output.
Here’s an example of a select statement.
select
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
And here are the results
You could also do
select
*, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
To select all columns in customers table, and additionally select the city_count aggregate data
In SQL, “*” is a wildcard indicating everything. As in…select all columns from this table.
WHAT: Distinct allows us to select unique values from a table and disregard the duplicates.
While group by is used to group aka “bucket” same data together into their unique sections depending on which column is chosen, distinct doesn’t bucket the data.
Distinct just chooses the unique records from the data and disregards the duplicates. Distinct can be used in conjunction with Group by.
Distinct gets rids of duplicate data, that’s the main thing you need to remember
It makes sense that it’s down her in the order of operations. After selecting the columns you want, then you filter the column of choice to remove duplicate data and only return unique values.
Let’s look at a few examples.
Here’s a code sample WITHOUT distinct.
select
Country, City
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris',
'London', 'Toulouse', 'Versailles', 'Campinas', 'Cowes' )
As you can see, there are duplicate values.
Here is the same code, but this time with DISTINCT CLAUSE
select
Distinct (Country), City
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris',
'London', 'Toulouse', 'Versailles', 'Campinas', 'Cowes' )
As you can see, All the duplicates has been removed and is now a single row per record type.
Now that we have an understanding of DISTINCT, Let’s look at the next item in our order of operations. UNION and UNION ALL
WHAT IS IT? Union vs Union All. Union ALL joins everything including the duplicate records, Union joins everything and removes duplicates.
Union is when you take 1 table and combine it with another table row-wise.
What does “row-wise” combination mean?
For example, if table A has 100 rows and table B has 105 rows. When you union these tables, the new A+B Table will have 205 rows.
For union to work, both tables need to have the same number of columns
WHY HERE: It makes sense that this Operation is down here because you want to combine the 2 tables after you have made operations in each table.
For example, here’s how union works.
Table A
Employee | Salary |
---|---|
Angie | 80000 |
Max | 65000 |
Table B
Employee | Salary |
---|---|
Tiger | 15000 |
Lion | 24000 |
Max | 65000 |
Table C
Employee | Salary | Location |
---|---|---|
Omen | 90000 | Seattle |
Eggs | 26000 | New York |
If you want to UNION Table A + Table B, here’s the results
Employee | Salary |
---|---|
Angie | 80000 |
Max | 65000 |
Tiger | 15000 |
Lion | 24000 |
If you want to UNION ALL table A + Table B, here’s the results (No duplicates removed)
Employee | Salary |
---|---|
Angie | 80000 |
Max | 65000 |
Tiger | 15000 |
Lion | 24000 |
Max | 65000 |
Traditionally, If you want to “Union” or “Union all” table A or Table B to Table C, it will throw an error because they do not have the same number of columns.
But again, this depends on the RDMS system. Some of them will allow you to UNION ALL for 2 tables WITHOUT same column number. It will just return null for the extra columns. At the same time, it won’t allow you to UNION with 2 tables of varying column number.
I hope this gives you a brief overview of how UNION clause works.
At some point in your SQL Query, you may want to order your resulting table/data. That’s where Order By Clause comes into play.
WHAT IS IT? Order by is used to sort the order of a column or multiple columns. You can sort ascending (which is the default) or descending.
WHY HERE: It makes sense that you would want to combine both tables first before you sort and order the columns. Also makes sense to perform all operations before ordering how the final table will be presented.
Here’s an example.
Without using an ORDER BY clause,
select
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
the city_count is ordered randomly
But…with ORDER BY clause,
select
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
order by city_count desc;
the city_count is ordered descending
And Last but not least, after we have selected our tables and columns, aggregated and grouped our data, removed duplicates and ordered our table, we can also limit how much of the resulting data we retrieve. That’s where LIMIT and OFFSET comes in.
LIMIT Clause basically tells the processor how many results to return. If a table has 1000 rows, but you run a query and put a limit of 50, then only 50 rows will be returned instead of 1000 rows.
OFFSET Clause determines the starting row to start returning values.
Normally, with Limit, the number of rows returned starts from the beginning which is 1. But, you can instruct the query to return results starting in row 5.
Let’s look at some examples.
If I run this query with NO LIMIT & OFFSET …
select
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
order by city_count desc;
This is the resulting table and Execution window output.
Resulting table without limit or offset | Query Execution window shows 5 rows returned | |
And If I run the same query and LIMIT the result to 4…
select
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
order by city_count desc
LIMIT 4;
Here’s the resulting table and execution window output .
Result of LIMIT=4 | Query Window shows only 4 rows returned instead of 5 | |
And If I add OFFSET to the mix…
select
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro',
'Paris','London')
group by Country, city
having city_count > 1
order by city_count desc
LIMIT 4 OFFSET 2;
Here are the results. As you can see, OFFSET takes priority over Limit.
So, technically, the order of operations here is OFFSET, then LIMIT.
Here, we can clearly see that the first 2 results (rows) were ignored because I set the OFFSET to 2.
Then…I set the limit to 4, but there is not enough data to return 4 values, so only 3 were returned.
Combination of OFFSET and LIMIT. OFFSET takes priority to LIMIT | Query window shows 3 rows returned because offset eliminated first 2 rows | |
-----------------------------------------
-----------------------------------------
I hope you enjoyed this blog. Feel free to get access to my FREE Technical Resources where I give away stuff worth 10X more than this blog for FREE.
And you will get emails whenever I publish new content.
Or maybe you don’t want access to my juicy free technical resources but you still wanna be in the know, feel free to join my free technical newsletter.
Follow me on Youtube
Follow me on Linkedin
Follow me on Github
Follow me on Twitter
If you want to contact me, speak to me here
Want me to create a similar technical blog for your brand? Contact me here.
If you want to run these queries on your own computer,
Categories: : Data Analysis, Data Engineering, Database, SQL