SQL - 12 Order of Operations and WHY

Posted On Dec 02, 2023 |

SQL Order of Operations determines what order SQL queries & clauses are evaluated and executed. Understanding this is useful for query optimization.

This is the 12 Order of Operations in SQL. This is the Order SQL executes queries. Let’s go deeper and find out why.

  1. FROM
  2. JOIN
  3. WHERE
  4. GROUP BY
  5. AGGREGATION
  6. HAVING
  7. WINDOW FUNCTIONS
  8. SELECT
  9. DISTINCT
  10. UNION/UNION ALL
  11. ORDER BY
  12. LIMIT & Offset

Most commonly used SQL Clauses are covered in this video & blog, but of course, not every SQL Clause is present in this video & blog.

SQL Order of Operations Video

Here's a video on SQL order of operations. The blog and video basically covers the same material. Feel free to watch the video instead of the blog or both.


What is SQL Order of Operation and Why does it Matter?

SQL is distinct from traditional programming languages in that it is declarative, meaning that users specify the desired data rather than detailing how to retrieve it. The SQL order of execution is crucial for query optimization.

This order determines how query clauses are evaluated and can impact query performance. Understanding this order helps in troubleshooting and optimizing queries.

While modern SQL query planners can enhance efficiency using various techniques, they ultimately must produce the same result as a query adhering to the standard SQL order of execution.

SQL From

WHAT: The FROM clause is identifying what database, schema, and table(s) to use for the query.

WHY: It makes sense SQL will start it’s operations with the FROM Clause. No table, no query operation.

For example, in this query, we are selecting the Contact Name and Customer Id from the Customer’s Table

Select 
ContactName, CustomerID
from Customers;

Results of the FROM CLAUSE

FROM CLAUSE SQL Query Results
FROM CLAUSE SQL Query Results


In SQL, * is a wildcard indicating everything. Here, we are selecting everything from the Order Details Table

SELECT * from 
[Order Details];

Result

Select everything from order tables results
Select everything from order tables results


FROM Clause is not the only way to select what tables to query with. Sometimes, we join-in other tables using the JOIN Clause.

SQL JOIN

WHAT: The JOIN Clause typically comes soon after the FROM Clause or as part of the FROM Clause. JOIN involves combining 2 or more tables together.

WHY: It makes sense that this operation will be one of the first first. The query engine need to identify all tables that’s part of this operation before it begins executing other operations.

Let’s look at a join example, but before we get to that….

Individually, this is what the tables look like

SELECT * from 
[Order Details];

SELECT * FROM ORDERS;

Results

SELECT * from [Order Details];
SELECT * FROM ORDERS;
Result of
Result of "SELECT * from [Order Details]"
Result of
Result of "SELECT * FROM ORDERS;"


Now, let’s join the tables.

Select OD.ProductID, OD.OrderId, OD.UnitPrice,
Orders.CustomerID, orders.OrderDate
from [Order Details]  OD
Join Orders on OD.ORDERID = Orders.OrderID

After joining the tables, here’s the final result

Result of a Table Join
Result of a Table Join


It makes sense that we will do a JOIN before a SELECT because the SELECT statement is using information/columns from BOTH tables. So, the tables need to be available before we can select columns from them.

SQL WHERE

WHAT: WHERE clause is used to filter the data by columns. You can use multiple columns as filters or a single column. Let’s demonstrate using where clause. It will help to solidify the information.

Before we do that though, let’s explore WHY the WHERE clause is used in this step of the sql query operation.

WHY: It makes perfect sense the WHERE clause will be used at this stage. After the operation has selected the tables for the query. It makes sense the next natural order of operation is to filter out all the data that’s not being used for the remainder of the query.

No point in carrying excess data that’s not needed through the data operation process which can slow down the query operation performance. That’s why any data that’s not included in the WHERE clause is filtered out before any operations proceed. This way, only necessary data is included in further operations.

Let’s start by getting a preview of what the data looks like before doing our WHERE transformations.

SELECT * from 
[Order Details]

Subsect of results. Just the raw table before filtering out some data. As you can see, there unitprices below the value of 10. In the next section, we are going to filter out unitprices below 10 using WHERE statement.

Results of table WITHOUT a WHERE Clause
Results of table WITHOUT a WHERE Clause


This code is simply selecting everything where unitprice is greater than 10

SELECT * from 
[Order Details]
WHERE UNITPRICE > 10;

This is a subsect of the results and as you can see, all UnitPrice here is more than 10.

Result of WHERE clause where UnitPrice is More than 10
Result of WHERE clause where UnitPrice is More than 10


This one is selecting Where unitprice is greater than 10 and quantity of item is less than 10

SELECT * from 
[Order Details]
WHERE UNITPRICE > 10 AND Quantity < 10;

This is the subsect of the result. As you can see, all quantities is less than 10 in addition to unitprice being over 10.

Results of Unit Price More than 10 and Quantities less than 10
Results of Unit Price More than 10 and Quantities less than 10


This one is doing a string search to find all the customers whose name begins with ‘L’ or ‘R’

SELECT * FROM ORDERS
WHERE CUSTOMERID LIKE 'L%' or CUSTOMERID LIKE 'R%';

Here’s a subsect of the query results. As you can see, all CustomerID’s in this result set has a starting Letter of R & L

Customer Id that Begins with R & L
Customer Id that Begins with R & L


An aggregate data cannot be used as part of a “WHERE” filter because aggregations happen AFTER the where clause is executed. Instead of using WHERE clause with aggregation, we use HAVING clause (Explained more below)

This query is going to fail because we are using an aggregate value as input for a WHERE clause.

Select 
City, count(city) as city_count
from Customers
where city_count > 1
group by City;

As expected, the query failed with the result below.

Failed WHERE statement using Aggregate Filter
Failed WHERE statement using Aggregate Filter


Keep reading to learn how to make this query work under the “HAVING CLAUSE” section.

GROUP BY

WHAT: Group by is used to group a column by it’s distinct values.

Traditionally, when a group by clause is used, all columns in the SELECT statement has to be part of the group by clause or be part of an aggregate clause. I said “traditionally” because in some RDMS systems, this is not enforced (Example being MySQL and SQLite).

The consequence of not using all column values in your SELECT clause within group by and/or aggregate clause is that the data returned will be indeterminate. Let me explain further…

If you don’t use all columns in SELECT clause in your aggregate or group by clause, then a random value will be selected for the columns that is not aggregated and has no grouping. This means that the values returned for not-grouped/not-aggregated columns will be indeterminate (randomly chosen). SQL tries to be determinate in it’s results sets.

In RDMS system like snowflake, you won’t be able to get away with this. Snowflake forces you to use all columns in SELECT clause as part of the group by clause or an aggregate clause.

Grouping data is like putting them into distinct buckets. A GROUP BY Can be used with or without an aggregate function. Also, aggregate functions are not allowed in the GROUP BY clause.

Let’s look at a few examples including how to use a group by with & without an aggregate function.

First let’s see what the data looks like without grouping

select 
customerid, country, city
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro','Paris','London')

Here’s the results

Results of Data WITHOUT Grouping
Results of Data WITHOUT Grouping


Now, when we apply grouping to the data above, you can see the data is condensed. The Country has been used to “bucket” the data, followed by city. Now, there is one row for each country and City.

This is an example of using GROUP BY WITHOUT an aggregate function

select 
customerid, country, city
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city

Here’s the results

Group by basically says…give me the unique values in this column

Results of Data AFTER GROUP BY
Results of Data AFTER GROUP BY


Now let’s take a look at an example of a group by statement WITH an aggregate function

Here, I am using count() function to count the number of times a city appears in each country for this dataset.

select 
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city

Here’s the results. It shows how many times each city appears for each country.

How many cities in each country
How many cities in each country


So, you can use GROUP BY statement with or without an aggregate function, but you cannot use the aggregate function as part of the group by clause.

For example, If we run the above query again but use city_count in the group by statement, it will throw an error.

The query

select 
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city, city_count

The error

Error trying to use aggregate function with group by clause
Error trying to use aggregate function with group by clause


You saw that we used the aggregate function count() above, let’s go deeper into aggregate functions.

AGGREGATION

WHAT: Aggregation is used to summarize data. Some aggregation syntax include COUNT, MIN, MAX, AVG, SUM. Aggregation simply means taking in multiple values and returning a single value.

Let’s look at some aggregate functions. Aggregate functions typically ignores nulls values except for count().

Here’s a sample of the raw data before we do any aggregations.

select * from [Order details];

Subsect of the results.

Subsect of Table before Aggregation
Subsect of Table before Aggregation


Let's start with COUNT() aggregate function.

Let’s get a count of number of items in this table where the discount = 0

But first, if we run

select count(*) from [Order details];

“*” is a wildcard indicating “everything”. So, basically we are getting a count of everything on this table. This is how many rows/records exists in this table.

AGGREGATION: Count of Everything
AGGREGATION: Count of Everything


Now count of items with no discount

select count(*) from [Order details]
where Discount=0;

As we can see, most items in this table does not have a discount

AGGREGATION: Count of Discount=0
AGGREGATION: Count of Discount=0


Let’s checkout another aggregate value - AVERAGE.

Let’s look at the average Price of an item

SELECT round(AVG(UnitPrice), 2) as [Avergae Price]
FROM [Order Details]

The average price of an item is $28.85

AGGREGATION: Average Price
AGGREGATION: Average Price


And the average Discount of items that do have a discount

round(AVG(Discount) * 100, 2)
FROM [Order Details]
where Discount>0

For items that do have a discount, they collectively have an average discount of 14.44%

AGGREGATION: Average Price of Discounted Items
AGGREGATION: Average Price of Discounted Items


Let’s move on to maximum and minimum. These functions do exactly what they sound like. They give you the MAX & MIN value of a column.

Select max(UnitPrice), min(UnitPrice)
FROM [Order Details];

Here are the results.

AGGREGATION: Max & Min
AGGREGATION: Max & Min


And finally, lets look at the sum of total inventory available which is represented by quantity.

SELECT SUM(Quantity)
FROM [ORDER DETAILS]

Results of SUM aggregate function.

AGGREGATION: Sum()
AGGREGATION: Sum()


Like I mentioned earlier, you cannot use an AGGREGATE function in a WHERE clause filter, but you can use it with a HAVING clause filter. So, let’s move on and checkout how to use the HAVING clause

HAVING

WHAT: HAVING Clause is the equivalent of WHERE statement, but for aggregate data. HAVING clause is used to filter aggregate data. 

WHY: WHERE Clause occurs BEFORE data aggregation, therefore it can’t be used to filter aggregate data. That’s where HAVING clause comes into play. HAVING Occurs AFTER data aggregation.

Here’s an example to solidify the concept.

To make the WHERE clause query above work, we replace “WHERE” with “HAVING”

This query basically gives a count of how many times each city appears in the table more than once.

Select 
City, count(city) as city_count
from Customers
group by City
having city_count > 1; #instead of `where city_count > 1`

Result. Same query as the one above, but now it works.

Replace WHERE with HAVING results
Replace WHERE with HAVING results


We have looked at HAVING, WHERE, FROM, and AGGREGATION clauses. All those are normal SQL statements you might encounter daily. Let’s look at some SQL Clauses that may not be part of your daily SQL routine starting with windows functions.

WINDOW FUNCTIONS

WHAT: Window Functions are like the cool kids of SQL. It can be used to partition data, rank the data, and do other cool stuff. 

There are many different types of windows functions like

  • Value window functions,
  • Aggregate window functions. and
  • Ranking window functions.

In a different post, I go more in depth into window functions. For today, we will just use a simple example to illustrate window functions.

WHY Does it make sense to have windows functions at this level of the SQL operation. To explain this, you have to understand how window functions work.

At the most fundamental level, window functions operate over a small window of data called “partitions”. For example, assuming you have data that look like this.

Department Employee Salary
Tech A 10000
Education B 15000
Agriculture C 28000
Education D 12000
Tech E 8000
Tech F 60000

In the data above, if we used a window function and did a partition on “Department”, then we would have 3 window “data partitions” to work with/operate over. This is the core of window functions. So, if we were using the row_number window function, this is what the data will look like.

Department Employee Salary row_number
Tech A 10000 1
Tech E 8000 2
Tech F 60000 3
Education B 15000 1
Education D 12000 2
Agriculture C 28000 1

As you can see with row_number function, within every partition, the row number restarts. If no partition is defined, then the entire data set is a single partition. Again, more in-depth tutorial on window functions later.

It makes sense that window functions will be executed at this level of operation because SQL wants to do all the operations on all the data before breaking the data apart into smaller partitions.

It wants to select and join the table(s) for the operation, filter the data down using a WHERE statement, make grouping decision based on the entire dataset, aggregate data where necessary and filter the aggregated data if needed BEFORE partitioning the data into smaller chunks.

Here’s another example of using row_number and using aggregate function (average) as a window function.

In this code, I am using the average function as an aggregate function within a window function.

This average function is calculating the average unitprice of an item within each order_id,

And using row_number function to assign row numbers to each record/row in each partition (OrderId).

select *,
round(avg(UnitPrice) over(partition by OrderID),2) as avg_unit_price_per_Order_id,
row_number() over(PARTITION by OrderID) as row_number
 from [order details]

Here’s the results. As you can see. It shows us the average unit price per item for each other OrderID as well as assigning row number to each item in each OrderID. For each unit of OrderID (Partition key “window”), it restarts the row_number and re-calculates the average order Id.

Window Functions: AVG & Row Number Results
Window Functions: AVG & Row Number Results


Hopefully this gives you a window into window functions 😆

SELECT

WHAT: Select is exactly what it sounds like. We use it to select the columns we want from the table produced by the operations above. 

WHY: It makes sense that we will do all the grouping, aggregations, table selections, window functions, before we select which columns we want for the final output.

Here’s an example of a select statement.

select 
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1

And here are the results

SELECT columns from table results.
SELECT columns from table results.


You could also do

select 
*, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1

To select all columns in customers table, and additionally select the city_count aggregate data

In SQL, “*” is a wildcard indicating everything. As in…select all columns from this table.

DISTINCT

WHAT: Distinct allows us to select unique values from a table and disregard the duplicates.

While group by is used to group aka “bucket” same data together into their unique sections depending on which column is chosen, distinct doesn’t bucket the data.

Distinct just chooses the unique records from the data and disregards the duplicates. Distinct can be used in conjunction with Group by.

Distinct gets rids of duplicate data, that’s the main thing you need to remember

It makes sense that it’s down her in the order of operations. After selecting the columns you want, then you filter the column of choice to remove duplicate data and only return unique values.

Let’s look at a few examples.

Here’s a code sample WITHOUT distinct.

select 
Country, City
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris',
                'London', 'Toulouse', 'Versailles', 'Campinas', 'Cowes' )

As you can see, there are duplicate values.

Duplicate Data WITHOUT DISTINCT function.
Duplicate Data WITHOUT DISTINCT function.


Here is the same code, but this time with DISTINCT CLAUSE

select 
Distinct (Country), City
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris',
                'London', 'Toulouse', 'Versailles', 'Campinas', 'Cowes' )

As you can see, All the duplicates has been removed and is now a single row per record type.

NO Duplicate Data WITH DISTINCT function.
NO Duplicate Data WITH DISTINCT function.


Now that we have an understanding of DISTINCT, Let’s look at the next item in our order of operations. UNION and UNION ALL

UNION/UNION ALL

WHAT IS IT? Union vs Union All. Union ALL joins everything including the duplicate records, Union joins everything and removes duplicates.

Union is when you take 1 table and combine it with another table row-wise.

What does “row-wise” combination mean?

For example, if table A has 100 rows and table B has 105 rows. When you union these tables, the new A+B Table will have 205 rows.

For union to work, both tables need to have the same number of columns

WHY HERE: It makes sense that this Operation is down here because you want to combine the 2 tables after you have made operations in each table.

For example, here’s how union works.

Table A

Employee Salary
Angie 80000
Max 65000

Table B

Employee Salary
Tiger 15000
Lion 24000
Max 65000

Table C

Employee Salary Location
Omen 90000 Seattle
Eggs 26000 New York

If you want to UNION Table A + Table B, here’s the results

Employee Salary
Angie 80000
Max 65000
Tiger 15000
Lion 24000

If you want to UNION ALL table A + Table B, here’s the results (No duplicates removed)

Employee Salary
Angie 80000
Max 65000
Tiger 15000
Lion 24000
Max 65000

Traditionally, If you want to “Union” or “Union all” table A or Table B to Table C, it will throw an error because they do not have the same number of columns.

But again, this depends on the RDMS system. Some of them will allow you to UNION ALL for 2 tables WITHOUT same column number. It will just return null for the extra columns. At the same time, it won’t allow you to UNION with 2 tables of varying column number.

I hope this gives you a brief overview of how UNION clause works.

At some point in your SQL Query, you may want to order your resulting table/data. That’s where Order By Clause comes into play.

ORDER BY

WHAT IS IT? Order by is used to sort the order of a column or multiple columns. You can sort ascending (which is the default) or descending.

WHY HERE: It makes sense that you would want to combine both tables first before you sort and order the columns. Also makes sense to perform all operations before ordering how the final table will be presented.

Here’s an example.

Without using an ORDER BY clause,

select 
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1

the city_count is ordered randomly

Random City_Count Ordering WITHOUT ORDER BY Clause
Random City_Count Ordering WITHOUT ORDER BY Clause


But…with ORDER BY clause,

select 
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
order by city_count desc;

the city_count is ordered descending

Ordered City_Count WITH ORDER BY Clause
Ordered City_Count WITH ORDER BY Clause


And Last but not least, after we have selected our tables and columns, aggregated and grouped our data, removed duplicates and ordered our table, we can also limit how much of the resulting data we retrieve. That’s where LIMIT and OFFSET comes in.

LIMIT & OFFSET

LIMIT Clause basically tells the processor how many results to return. If a table has 1000 rows, but you run a query and put a limit of 50, then only 50 rows will be returned instead of 1000 rows.

OFFSET Clause determines the starting row to start returning values.

Normally, with Limit, the number of rows returned starts from the beginning which is 1. But, you can instruct the query to return results starting in row 5.

Let’s look at some examples.

If I run this query with NO LIMIT & OFFSET …

select 
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
order by city_count desc;

This is the resulting table and Execution window output.

Resulting table without limit or offset
Query Execution window shows 5 rows returned
Resulting table without limit or offset
Resulting table without limit or offset
Query Execution window shows 5 rows returned
Query Execution window shows 5 rows returned

And If I run the same query and LIMIT the result to 4…

select 
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 'Paris','London')
group by Country, city
having city_count > 1
order by city_count desc
LIMIT 4;

Here’s the resulting table and execution window output .

Result of LIMIT=4
Query Window shows only 4 rows returned instead of 5
Result of LIMIT=4
Result of LIMIT=4
Query Window - 4 Rows Not 5
Query Window - 4 Rows Not 5


And If I add OFFSET to the mix…

select 
customerid , Country, city, count(city) as city_count
from customers
where Country in ('Argentina', 'Brazil', 'France', 'UK')
and CITY IN ('Buenos Aires','Sao Paulo', 'Rio de Janeiro', 
'Paris','London')
group by Country, city
having city_count > 1
order by city_count desc
LIMIT 4 OFFSET 2;

Here are the results. As you can see, OFFSET takes priority over Limit.

So, technically, the order of operations here is OFFSET, then LIMIT.

Here, we can clearly see that the first 2 results (rows) were ignored because I set the OFFSET to 2.

Then…I set the limit to 4, but there is not enough data to return 4 values, so only 3 were returned.

Combination of OFFSET and LIMIT. OFFSET takes priority to LIMIT
Query window shows 3 rows returned because offset eliminated first 2 rows
OFFSET & LIMIT results
OFFSET & LIMIT results
OFFSET & LIMIT query execution window
OFFSET & LIMIT query execution window


-----------------------------------------
-----------------------------------------

I hope you enjoyed this blog. Feel free to get access to my FREE Technical Resources where I give away stuff worth 10X more than this blog for FREE.

And you will get emails whenever I publish new content.

Or maybe you don’t want access to my juicy free technical resources but you still wanna be in the know, feel free to join my free technical newsletter.

Follow me on Youtube

Follow me on Linkedin

Follow me on Github

Follow me on Twitter

If you want to contact me, speak to me here

Want me to create a similar technical blog for your brand? Contact me here.

If you want to run these queries on your own computer,

  1. First download SQLite or use SQLite Online
  2. Second download the Northwind DB from my website for FREE when you sign up for my FREE Tech Resources
  3. Third, Open the Northwind DB with SQLite
  4. Copy and paste the queries above
  5. And Lastly, Enjoy your Query Exploration


Categories: : Data Analysis, Data Engineering, Database, SQL