Query Exercise: Finding Sister Locations to Help Each Other

Query Exercises
12 Comments

For this week’s query exercise, let’s start with a brief query to get a quick preview of what we’re dealing with:

That query has a few problems, but hold that thought for a moment. (You’re going to have to solve those problems, but I just wanted to show you the sample data at first to give you a rough idea of what we’re dealing with.)

Our business users are thinking about doing some targeted advertising, and they have two questions.

First, what are the top 10 Locations populated with users who seem to be really helpful, meaning, they write really good answers?

Second, what are the top 10 Locations where people seem to need the most help? Meaning, they ask a lot of questions, but they do not seem to be answering those of their neighbors?

Your first guess would probably be to adapt the query above and simply sort in different directions, but buckle up. The data presents a lot of challenges that we need to consider.

  • There are null and empty locations. Those should probably be excluded.
  • In the Users table, the Location column’s datatype is NVARCHAR(100). People can type in whatever they want, which leads to a lot of oddball one-off locations. We should filter for a minimum number of people involved, especially since we’re considering paid advertising. Let’s filter for only locations with at least 100 people in them.
  • For the sake of this exercise, let’s pretend that each Location value is geographically unique. You’ll notice in the screenshot that there are entries for “India” and for “Bangalore”. You and I know that everyone who lives in Bangalore also lives in India, but for the sake of this question, let’s pretend that’s not the case. In reality, obviously we need to clean up our Location data, but I’ll save that for another challenge.
  • High values for Reputation do not necessarily mean that people write really good answers. You can earn Reputation points a lot of ways, including writing good questions.
  • You don’t have to join the first and second queries together. Just write one query for the top 10 helpful locations, and another query for the top 10 locations that need the most help. They can be separate queries, and I’m not worried about performance for the sake of this exercise.

Part of your solution is going to involve looking at the Posts table, which stores questions and answers (and other things.) A few columns to consider:

  • Posts.PostTypeId = when 1, the Post is a Question. When 2, it’s an Answer.
  • Posts.OwnerUserId = the Users.Id who owns the question or answer.
  • Posts.Score = the quality of the question or answer.

For this exercise, I’m not expecting a specific “right” or “wrong” answer – instead, for this one, you’re probably going to have a good time sharing your answer in the comments, and comparing your answer to that of others. Feel free to put your queries in a Github Gist, and include that link in your comments. I’ll circle back in a week and write about different ways to solve it. Have fun!

Previous Post
Find Recent Superstars: Answers & Discussion
Next Post
Updated First Responder Kit and Consultant Toolkit for February 2024

12 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.