Typically, there are several concepts that interviewers are testing in data science interviews, but since they may only have time to ask 1 or 2 questions, they will try to group the concepts into a single question. Therefore, it is important to know what these concepts are so that you can look them up in an interview.

So what are they really testing? What an interviewer really looks for are interviewed with a deep understanding of metric design and implementation of real-world scenarios that would be present in the data. The key phrase here is “real world scenario”, which means that there will likely be several extreme cases and scenarios that you will need to think about to solve the problem. There are 3 common concepts they test to test your understanding of how to implement code that solves real world scenarios.

Since they only have time to ask 1-2 questions in an interview before the time is up, you will often see all 3 concepts wrapped up in one question. I see this question, or a version of this question, (platform.stratascratch.com/coding-question?id=10300&python=) in almost every interview I have been or have been granted. Follow me and see if you can answer this question.

The 3 concepts you need to know are CASE, JOIN, and / CTE subqueries. Let’s go over a real interview question that covers these 3 concepts and talk about them in depth. The link to the question is here ((platform.stratascratch.com/coding-question?id=10300&python=) if you want to follow it.

CASE STATEMENTS aggregates

You will likely get some kind of categorization question where you need to categorize the data based on the values ​​you see in the table. This is very common in practice and you will probably always be categorizing and cleaning data. So a CASE statement is the simplest technique to try.

Add in the addition of aggregates like sum () and count () and they’ll be testing to see if you really know what is returned in a case when, not just the implementation. Based on the case statements, you can always add aggregate functions as a count or a sum.

Here is an example of a CASE statement with a simple aggregation in the SELECT clause for the question.

As you see in the CASe statement below, we are categorizing users based on whether they are paying customers or not. Then we apply a sum (), as it is a quick way to count the number of paying customers versus non-paying customers in a simple query. If we didn’t have the CASE statement, we would need two queries to find both numbers.

SELECT date, sum (CASE

WHEN pay_customer = ‘yes’ THEN downloads

END) HOW to pay,

sum (CASE

WHEN pay_customer = ‘no’ THEN downloads

END) AS non_paying

FROM ms_user_dimension to

Unions

The second concept is joining tables. Can you join the tables? This is the lowest bar you need to jump to be an analyst, let alone a data scientist. This bar is basically on the ground, so you can really step over it.

So in interviews, do you usually do a LEFT JOIN, CROSS JOIN, INNER JOIN? Most of their work will be using a LEFT JOIN, so they will be testing it for practicality. You will almost never use a cross join. It will use an inner join quite a bit, but the left join is a bit more complicated so they will use it just as an additional filter.

Self joins are common because it isn’t always obvious that you would be using it. But they are common in practice.

In the following example, we are joining tables to the CASE statement. We are joining two tables to our main table using a LEFT JOIN.

SELECT date, sum (CASE

WHEN pay_customer = ‘yes’ THEN downloads

END) HOW to pay,

sum (CASE

WHEN pay_customer = ‘no’ THEN downloads

END) AS non_paying

FROM ms_user_dimension to

LEFT JOIN ms_acc_dimension b ON a.acc_id = b.acc_id

LEFT JOIN ms_download_facts c ON a.user_id = c.user_id

GROUP BY date

ORDER BY date

Subquery / CTE

The last common concept is a subquery / CTE, basically a concept that you are working on and then you need to do more work on. This is a test to see if you can break your problem down into logical steps. Some solutions require more than one step to solve, so they’re testing to see if you can write code that follows a logical flow. Not necessarily complicated or complex, but multi-step and pragmatic. This is especially useful in practice because you will be writing code of more than hundreds of lines and you need to be able to create solutions that follow a good flow.

In the example below, I take the query we wrote above and put it in a subquery so that we can query its data. In this way we can apply an additional filter in the HAVING clause and keep the complete solution in a query.

SELECT date, no_pay,

payment

FROM

(SELECT date, sum (CASE

WHEN pay_customer = ‘yes’ THEN downloads

END) HOW to pay,

sum (CASE

WHEN pay_customer = ‘no’ THEN downloads

END) AS non_paying

FROM ms_user_dimension to

LEFT JOIN ms_acc_dimension b ON a.acc_id = b.acc_id

LEFT JOIN ms_download_facts c ON a.user_id = c.user_id

GROUP BY date

ORDER BY date) t

GROUP BY t. Date,

t. pay,

t.non_paying

HAVING (not paying – paying)> 0

ORDER BY t.date ASC

Leave a Reply

Your email address will not be published. Required fields are marked *