Page Nav

HIDE

Breaking News:

latest

Ads Place

Commonly Asked Interview Question: Where vs Having vs Qualify

https://ift.tt/RaYfh5p Back To Basics | SQL fundamentals for beginners Image by author, created on  canva Have you ever been assigned a...

https://ift.tt/RaYfh5p

Back To Basics | SQL fundamentals for beginners

Image by author, created on canva

Have you ever been assigned a task to analyse a new dataset and you didn’t know where to start from? I have been to the same spot when I first started working with data as a Business Intelligence Consultant. It can sometimes be overwhelming to start working with data that has millions of records you are unfamiliar with.

So, it’s always a best practice to start with a smaller chunk of data and lead forward (Yes! Data Analytics 101). Pick up one subset like department, product, or location to start with, and then move to a bigger subset of data.

Believe me, knowing how to efficiently retrieve enterprise data is a crucial skill to hone since it amplifies the value of data even more; not just for the day-to-day business tasks but from the interview point of view as well, since this is the most common interview question I have faced so far.

Here is a quick guide of the most commonly used data filtering clauses available in most SQL flavours. WHERE, HAVING and QUALIFY are all used for filtering the query result-set based on a conditional expression but for different usecases.

I’m using classicmodels MySQL sample database for demonstration; it holds business data of a car retailer. Below is the ER Diagram for a quick understanding,

Image
Image by author

WHERE

  • WHERE is most commonly used clause to filter out the result-set based on a conditional expression.
  • It can be used with/without the GROUP BY clause. GROUP BY clause always follow the WHERE clause when used together.

Here is the sample data from PRODUCTS table,

SELECT * FROM CLASSICMODELS.PRODUCTS LIMIT 10;
Image by author

Say, we want the total quantity currently in stock information for PRODUCTLINE Classic Cars, Vintage Cars and Motorcycles, grouped by each productline.

#where clause example
SELECT
PRODUCTLINE,
SUM(QUANTITYINSTOCK) AS TOTAL_QUANTITY
FROM
CLASSICMODELS.PRODUCTS
WHERE PRODUCTLINE IN ('Classic Cars', 'Vintage Cars', 'Motorcycles')
GROUP BY PRODUCTLINE;

The query above will return the following output:

Image by author

HAVING

  • HAVING is similar to the WHERE clause except it is used when filtering condition that utilises any aggregate functions and involves GROUP BY clause as a filtering condition/expression.
  • GROUP BY precedes the HAVING clause. (The HAVING clause is always used after the GROUP BY clause)
  • When HAVING clause is used in a query, only groups that satisfy

Continuing to the previous example, we further want to filter our result-set to PRODUCTLINE which has total quantity more than 100000,

#having clause example
SELECT
PRODUCTLINE,
SUM(QUANTITYINSTOCK) AS TOTAL_QUANTITY
FROM
CLASSICMODELS.PRODUCTS
WHERE PRODUCTLINE IN ('Classic Cars', 'Vintage Cars', 'Motorcycles')
GROUP BY PRODUCTLINE
HAVING TOTAL_QUANTITY > 100000;

This query will return following output,

Image by author

Some key differences between WHERE and HAVING clause,

  • Syntax-wise WHERE clause comes before GROUP BY clause that means it will filter the rows based on conditional expression before performing the aggregation calculations.
    Whereas, HAVING clause comes after the GROUP BY clause in syntax, so it will filter the rows after performing aggregate calculations. From performance optimisation point of view, HAVING clause is slower than WHERE, should be avoided wherever possible.
  • When both are used together, the WHERE clause will first filter the individual rows, the rows are then grouped, perform aggregate calculations, and at the end the HAVING clause will be used to filter the groups.
  • WHERE clause can be used with all DML commands, whereas HAVING can only be used with SELECT statement.

QUALIFY

  • QUALIFY is used in scenarios where the filtering expression uses any analytical/windows functions based on user-specified conditions.

Continuing to the PRODUCTLINE example, earlier we queried total quantity of products currently in stock for product lines Classic Cars, Vintage Cars and Motorcycles. Let’s now get the top 3 products by the product line that have the highest QUANTITYINSTOCK. Since MySQL does not support QUALIFY clause, try running below query in other SQL environments where it is supported.

#qualify clause example
SELECT
PRODUCTLINE,
PRODUCTNAME,
QUANTITYINSTOCK,
ROW_NUMBER() OVER (PARTITION BY PRODUCTLINE ORDER BY QUANTITYINSTOCK DESC) AS ROW_NUM
FROM
CLASSICMODELS.PRODUCTS
WHERE PRODUCTLINE IN ('Classic Cars', 'Vintage Cars', 'Motorcycles')
QUALIFY ROW_NUM <= 3;

The above query will return,

Image by author

Same result can be derived either by using a nested query,

#qualify clause work-around using nested query
SELECT
PRODUCTLINE,
PRODUCTNAME,
QUANTITYINSTOCK,
ROW_NUM
FROM
(
SELECT
PRODUCTLINE,
PRODUCTNAME,
QUANTITYINSTOCK,
ROW_NUMBER() OVER (PARTITION BY PRODUCTLINE ORDER BY QUANTITYINSTOCK DESC) AS ROW_NUM
FROM
CLASSICMODELS.PRODUCTS
WHERE PRODUCTLINE IN ('Classic Cars','Vintage Cars','Motorcycles')
)SUBQ
WHERE SUBQ.ROW_NUM <= 3;

or my personal favourite; using CTE (Common Table Expression),

#qualify clause work-around using cte
WITH STOCK_RANKING
AS
(
SELECT
PRODUCTLINE,
PRODUCTNAME,
QUANTITYINSTOCK,
ROW_NUMBER() OVER (PARTITION BY PRODUCTLINE ORDER BY QUANTITYINSTOCK DESC) AS ROW_NUM
FROM
CLASSICMODELS.PRODUCTS
WHERE PRODUCTLINE IN ('Classic Cars','Vintage Cars','Motorcycles')
)

SELECT
PRODUCTLINE,
PRODUCTNAME,
QUANTITYINSTOCK,
ROW_NUM
FROM
STOCK_RANKING
WHERE ROW_NUM <= 3;

Conclusion

I personally try and avoid using HAVING clause for the reasons stated above. I would normally create a temporary or intermediate table where the subset of data is fetched, which in turn be used in another query; for the sake of readability and reusability. All in all, it’s always worth knowing these clauses and use it, if it is the performance optimised solution for the your business case.

Here are some useful resources,

Happy Learning!


Commonly Asked Interview Question: Where vs Having vs Qualify was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.



from Towards Data Science - Medium https://ift.tt/GDvl8I7
via RiYo Analytics

No comments

Latest Articles