Query Exercise: Looking for Email Addresses

Query Exercises
12 Comments
Brent Ozar reading
“They put their password WHERE?!?”

Personally identifiable information (PII) is freakin’ everywhere.

When companies first start looking to identify and lock down their data, they think it’s going to be as easy as identifying common columns like EmailAddress, DateOfBirth, SocialSecurityNumber, and so forth. They think, “We’ll just encrypt those columns and we’ll be fine.”

Ho ho, my sweet summer child. No, that’s not nearly enough.

Your users are completely diabolical. If you give a user a text box, they will find amazingly creative ways to stuff your database chock full of PII that you didn’t ask for, and you certainly don’t want. One of my favorite client examples was a help desk ticketing system where the support team routinely logged things like usernames, passwords, and other PII mixed in with their case notes. (Example: “Person needs help resetting their account, but can’t find their account. Tried searching under Jane Doe, Doe Jane, 123 Main Street, even tried their DOB of 1/2/1999, but I can’t find them, even though their credit card is getting billed under 5432-1234-9876-4567.”)

So when you start the journey of making your database safer by identifying and handling PII, it’s up to you to proactively dig through your database looking for this stuff. This kind of research work is often handed to database administrators because they’re trusted to look everywhere in the database, in every row and column. Plus, doing this kind of search is more efficient inside the database itself rather than on the app side – since it involves looking at every row and column.

This week, you’re that DBA, assigned that project.

Take any Stack Overflow database and examine the Users table. To keep things simple, I’m only going to ask you to look for email addresses.

This week’s challenge is a two-parter:

  1. Write a one-time query to check the Users table for email addresses. Do we have a problem with users putting email addresses where they don’t belong, and if so, what columns do we need to dig deeper into?
  2. Assuming we have a problem, write a query to find exactly which rows we need to address. Perhaps we’ll send them emails (ha ha ho ho) to log in and correct their profile, but we’re not actually going to modify their data.

Put your queries in a Github Gist and include those link in your comments. Check out the solutions from other folks, compare and contrast your work, and next week I’ll circle back with my answers & thoughts. Have fun!

Previous Post
Free Webcast Tomorrow: Fundamentals of Query Tuning
Next Post
T-SQL Has Regex in SQL Server 2025. Don’t Get Too Excited.

12 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.