1. Home
  2. Blog
  3. Page 153

Erika’s view of the internet


Upon viewing The Sheep Market, Erika remarked:

“People on the internet will work for nothing, and they buy everything.”

I got a good laugh out of that. It’s pretty much true, and The Sheep Market is a great piece of evidence. Aaron Koblin used Amazon’s Mechanical Turk to get 10,000 drawings of sheep at a cost of $.02 each. He’s now reselling them in 20-packs in sticker form for $20, netting him a tidy profit. Brilliant and funny idea.

Projects are like startups: they need active communities


Tara Hunt posted her BayCHI presentation notes, and she touches just a bit on why startup companies (heck, any companies) need to focus on building active user communities and listening closely to them.

The community phenomenon is something that everyday project planners need to recognize and acknowledge. Internal IT projects benefit by building this kind of an excited user base. Project managers who try to push bloated, overwrought projects down the throats of their users will … well, it won’t be pretty. The end result is a “successful” project deployment with a tiny amount of users.

Instead, take the approach Tara encourages startup companies to take. Listen closely to your perceived user base, and tailor the project to their needs as the project moves along. Respond quickly to their requests, and adapt your plan. Your own early adopters will make your project more successful, get the word out faster, and turn your project into a vital part of their everyday routine.

Plus, the more you make your users feel valuable and the more they feel appreciated, there’s a better chance they’ll tolerate your mistakes along the way.

I like encouraging people to think of their projects as little businesses. They need to show a positive ROI so that the business understands where the money’s going, what the tangible return is, and maybe best of all, they need to see your excited customer community. This is where internal projects are a bit different than standalone businesses: an internal project with a big fan base can “lose money” for years as long as the execs understand how much it’s contributing to the success of their customers.

IT project managers – treat your projects like startup companies. Read marketing tips from these Silicom Valley veterans, and apply their startup techniques to your own projects. It’s all about building your business – or your project – into something successful.

Why I laugh in the morning


You wonder why I’m so damn chipper on the mornings of Mondays, Wednesdays and Fridays? Blame Penny Arcade, my favorite web comic. It’s chock full of in-crowd jokes, but the commentary by the authors is almost – but not quite – accessible to everybody. Take a line from today’s commentary where Tycho is talking about an author that he holds in less than high esteem:

Awesome! It’s a Goddamn retard rodeo over there. It’s fun to think about the author tippity-tappin’ this one in while he eats light bulbs in the last car of the circus train.

Be still, my laughing heart.

If you scan hard enough through their history, there’s a few non-computer-related comics like this one, but most of the time you’re not going to “get it”. My apologies in advance.

For more a more accessible, more oftenly updated, and yet just as obscene comic, check out Questionable Content.

Amazon’s new virtual computing system

Amazon Web Services (AWS)

Amazon just announced EC2, their rent-a-datacenter plan. You build a custom virtual machine using their tools, and upload it to them. Then you can turn on that virtual machine anytime you want for a whopping ten cents per hour.

It gets even better – if you have scalability needs, like if your site suddenly becomes hugely popular, then you can turn on more machines. You just get billed ten cents per hour per machine.

There’s other costs for storage and bandwidth, but the bottom line is that the cost for a startup ebusiness just got dirt cheap. Implement your idea in a virtual machine, and you’ll be able to handle obscene growth without forking over huge money right away. You don’t have to have a lot of money tied up in a datacenter, don’t have to hire a huge infrastructure staff, etc.

This makes it even easier for your average geek to build a business without the overhead involved with finding and funding staff. It’s awesome. It’s amazing. And it works on a Mac. Gotta love that.

How would you like your datacenter? Scrambled, or over easy?


I’ve had so many hilarious things going on lately, I don’t even know where to begin.

Last weekend, IBM Director started emailing us that a couple of servers were overheating. In a matter of minutes, the heat problems were spreading across different racks, and the IT crew all started driving into the office simultaneously. Our two redundant AC units for the datacenter both failed within minutes of each other.

We shut down non-essential servers as fast as we could, but the room temps were still climbing over 120. Infrared thermometers were showing the rack equipment over 150. We ended up shutting down everything but our really, really mission-critical stuff, and we were thinking about failing that over to our alternate datacenter. Hours later, the ACs came back online.

No lost data. Hooah! Gotta love happy endings. Still, the whole event was a heck of a hard way to learn some ugly lessons, and I’m looking forward to the lessons-learned meeting on Monday.

Mostly, I’m just relieved that we didn’t have to disassemble server racks, carry huge servers out into the office area, plug them in, wire them up, and run a distributed datacenter out of a bunch of cubicles. That would have been so embarrassing. I can live with long hours in the datacenter trying to rebuild machines with fried drives, I can deal with failing things over to other datacenters, but I don’t like users seeing the datacenter in disarray.

Macbook in the driver’s seat

SQL Server

Macbook in the driver’s seat, originally uploaded by BrentOzar.

My coworkers were a little surprised yesterday when I alreay had my Macbook Pro as my primary machine, and my Dell had already been relegated to riding shotgun. The only reason I even still have it on my desk is because I have to get our UniCenter guys to install the UniCenter management tools in my Mac’s virtual Windows XP machine.

Still no major complaints. Granted, this whole “switch” thing would only be possible with Parallels and the ability to run both Mac OSX and Windows simultaneously. It’s not really a switch as much as it is an addition: I haven’t switched from Microsoft to Apple, but from Microsoft to Microsoft/Apple.

This is WAY better than my Linux experiences, that’s for sure.

Coding Horror t-shirts now available


coding-horror-t-shirt-black.jpgI try not to wear too many computer-related t-shirts because they emphasize my fashion ineptitude, but when Jeff Atwood’s Coding Horror blog came out with their logo t-shirts, I could not resist. The logo is great, the blog is funny and spot-on, and let’s face it, every geek should have at least one good inside joke t-shirt.

Besides, if I didn’t buy one of these, I’d probably be buying Apple t-shirts, and I don’t want to go down that road just yet. The mere ownership of a Macbook Pro does not yet entitle me to all the coolness perks.

I’m getting a Macbook Pro!

SQL Server

MacbookPro.pngYES! YES! YES!

Or as they say in the World Cup, GOOOOOOOOOOOOOOOAL!

My boss knows about my lust after the grooviness known as the Apple Macbook Pro, and he just scored me the coolest project. I get to be the pilot guy for supporting Macs internally. I’m way psyched.

Now for the wild part – figuring out how to do my work on it. I’m a SQL Server database administrator, and I already know the Microsoft SQL Management Studio doesn’t work on Macs, but I’ll be using Parallels Desktop to run that. Beyond that one application, though, I’d rather get everything running natively on Mac OSX wherever possible – after all, that’s the entire point of supporting Macs internally. If the user’s just running Windows apps, then there’s no support difference.

And besides, Mac-native stuff would just be cooler, muhahaha.

Ernie goes to the vet


Ernie in bedErnie and I took a trip this morning to Animal General Hospital, and Dr. Campbell says she’s got a rash. Might be scabies, but probably not, and he didn’t want to do the required skin scrape test on her right now with the skin being so sensitive.

She got a shot in the bottom to prevent the itching short-term, plus a couple of prescriptions for anti-itch and antibiotics.

I’m watching the World Cup finals between Italy and France. The rest of the world is on to something – soccer is a lot of fun to watch. The action is nonstop, the athletes are really skilled, and there’s less time for commercials. Woohoo!

Microsoft System Center Operations Manager 2007 Review


I spent some time yesterday going through an evaluation of the upcoming release of Microsoft Operations Manager 2007, and figured I’d share my evaluation notes with you, dear reader.  I’ve used MOM before, and I wanted to see what was new.

I’m looking at SCOM 2007 because I’d run into a showstopper with UniCenter: it can’t monitor my 64-bit machines. Since we’re going into production with 64-bit SQL Server for the data warehouses, I need a short-term (6-12 month) solution that will handle the 64-bit monitoring until UniCenter has 64-bit agents ready.

Server Monitoring Features I Need

My main criteria in order of importance:

64-bit OS and Application Monitoring

I’m looking for both operating system monitoring and application statistics monitoring. It’s not enough to just see that a 64-bit system is up – I need to know application events, like when SQL Server is running out of log space for a particular database. UniCenter is able to gather this data, and so should my short-term solution.

Easy To Deploy & Maintain

This system has to reduce our workload, not increase it. My goals for deployment are:

  • Monitoring system deployment in 1 day
  • Install agents on each server in 1 hour
  • Add & edit users and alerts in 5 minutes
  • Have everyone on the Windows group able to do those functions in less than one hour of training

There are plenty of monitoring products out there that achieve the rest of my goals, but miss on this ease-of-use one. I’m intimately familiar with ServersAlive, for example, but if I’m the only guy in-house who can maintain it, and it takes me one day a week, then we’re worse off than when we started.

Simple Monitoring Dashboard

Email alerting is good, but when a whole bunch of things go wrong at once and then get fixed, like during a network outage, I want to be able to see what’s still broken at a glance. A good example of this is UniCenter’s display up in the CORE. Bonus points if that console is accessible via a PDA.

Server Performance Trending

Right now, I’m manually gathering statistics on my machines via Windows Performance Monitor one week per month. I assemble that data into Excel and run calculations on it to stay on top of my machine health. Unfortunately, that takes about a day per month to collect, report and analyze. I’d like to automate that process. (UniCenter does this.)

With these goals in mind, I evaluated Microsoft Operations Manager 2007 Release Candidate 1. The product is mostly done, but things like documentation and polish are still hit-or-miss.

SCOM 2007 Review: Positives

SCOM 2007 is 64-bit capable.

It monitors 64-bit operating systems and applications seamlessly. The agents install and work fine without a hitch. OM2007 automatically detects and installs the appropriate 32bit or 64bit agents, including SQL Server 2005 64-bit.

SCOM autoinstalls the right agents.

To add a new server, right-click in the Administration console and click Discovery Wizard. By default, it will scan your AD forest, but that’s not a good idea here since we’re just evaluating, so instead, click Advanced Discovery, click Next, click Browse For Computers, and click the Browse button to search the AD forest. It will ask for an admin account to install the agents, and click Other Account to use your own account if you’re an admin on that box (or a domain admin).

It then checks your servers to make sure they’re not already managed, and then asks you which management mode you’d like to use (agent or agentless). If you choose to use an agent, it will ask which account the agent should use – just use the system account.

SCOM2007 detects the operating system (Win2000 vs 2003 vs 2003 64-bit) and installs the appropriate agents.

Even better, it automatically detects supported applications like IIS, Exchange, and SQL Server, and installs the right agents for those as well. The whole process is very transparent – I didn’t even have to tell it which servers were running SQL, and which ones were not.

Bottom line: I can add a new server within 1 minute, which is way better than I’d expected. Note that after the agent is installed, it takes several minutes before health data is reported.

It’s easy to add and edit email notifications.

Adding new users takes less than a minute and requires no special skills. To add a new user, go into the console and click Administration, Notifications, Recipients, then right-click on Recipients and click New Notification Recipient.

Adding new subscriptions (alerts) is also easy. Right-click in the Administration console and click New Subscription Notification. A wizard steps you through all of the choices like which applications to monitor (Windows, IIS, SQL, etc), what levels of alerts (health, maintenance, performance, etc), and how often to send reminders.

It has a great monitoring dashboard.

The console can be installed on laptops easily, and can run over the VPN. It gives a great heads-up view of the servers.

It has in-depth application health reporting.

It has deep visibility into SQL Server health statistics, and even the default settings gave me useful information. For example, it alerted me when a database shrank itself, which is a major no-no for enterprise databases.

It has a maintenance mode.

I can put servers in maintenance mode before doing work on them. That way, when I’m rebooting a machine to fix something, no false alarm emails go out. I actively used this feature in ServersAlive at my last company, and I’ve asked for the same ability in UniCenter too. When a sysadmin group is really living by these alert emails, every reduced false alarm means a lot.

System Center Operations Manager 2007 Negatives

It still has annoying bugs.

For example, I can’t always edit my email subscription – when I right-click on it and click Properties, the entire System Center console crashes and quits. Another example is that personalized views aren’t always saved correctly, and the sort order is reset when you go from one screen to another. Maybe these bugs will be fixed in the final release, but again, maybe not.

You have to know your application’s vital stats.

Like UniCenter, SCOM2007 can send a dizzying array of notifications. For a successful implementation, the sysadmin of each box needs to understand what they’re monitoring. When I first told it to send me everything, despite only monitoring a handful of servers, I got about 150-200 emails per day – only about 10 of which were useful. I’m now in the process of whittling down the notifications.

The console requires horsepower.

It runs about the same speed as UniCenter’s tools on my P4 laptop, which is to say slow.

The dashboard is not web-accessible.

In order to manage SCOM2007, you have to install the console or remote desktop into a machine running the console. Long-term, I would prefer a web-accessible portal like UniCenter’s, and in a perfect world, it wouldn’t require Java either, so I could run it from my handheld. But now we’re just talking crazy.

What I Didn’t Evaluate

Server Uptime Reporting

It’s integrated with SQL 2005 Reporting Services and does some data warehousing as well. This was way outside of my scope of needs, though: I don’t anticipate long-term use of this as an uptime tracking tool or statistics analysis system. I’d rather get back to UniCenter for performance monitoring when its agents become available.

Agentless Auditing

OM2007 has the ability to monitor systems without installing the OM2007 agent, and it can even audit event logs without an agent. For the purposes of this evaluation, I wanted to install the agents because I wanted in-depth reporting on the applications (SQL Server and IIS).

My Recommendation

Operations Manager 2007 achieves the requirements so well and requires so little effort that I would recommend we deploy it for a short-term 64-bit monitoring solution.

Ernie goes to church


.flickr-photo { border: solid 2px #000000; }
.flickr-yourcomment { }
.flickr-frame { text-align: left; padding: 3px; }
.flickr-caption { font-size: 0.8em; margin-top: 0px; }

Ernie goes to church, originally uploaded by BrentOzar.

On Lincoln Road, no less.

IBM UpdateXpress Server worthless


This past week, we had an IBM guy come in for a quick-start session on implementing IBM Director. Maintaining dozens or hundreds of servers can be a chore, and Director is supposed to make it easier by automating hardware checks, firmware updates, and even phoning home when there’s a problem.

Part of this package is IBM UpdateXpress Server, a central repository for all of the IBM firmware updates. Admins are supposed to set up Director, point it to their UpdateXpress repository, and then match up to see which servers need which updates – and deploy them easily from inside Director. This beats doing a sneaker shuffle around the datacenter, carrying CDs and floppy disks from machine to machine.

The reality didn’t match up with the promise. UpdateXpress Server is horridly out of date. It fetches internet updates from a different web server than the public hits, and that server is way behind what the public server has. For example, our x460’s have had several critical bios updates in the last three months, and none of those updates were available to the UpdateXpress server.

That immediately set off alarms with the IBM guy, who couldn’t believe what he was seeing. We repeated the same update tests with other servers (x346’s and x366’s) only to get the same results. None of the firmware updates that had come out in the last 6-9 months were available inside UpdateXpress Server.

To make matters worse, IBM Director only allows Windows software packages to be deployed – not floppy images. When a bios update comes out only on floppies, like the CPLD updates for the x460’s, Director can’t do anything about it. It’s still time to do the sneaker shuffle. Since these updates often come out simultaneously, some in Windows and some in floppies, then I’m really not saving any time at all.

My experiences have been echoed by others who had bad experiences with UpdateXpress Server. I only wish I’d seen these on IBM’s forum before we bought the Jump Start package.

We were all disappointed with the results, and I’m posting this just to warn anybody else who’s considering the IBM Director jump start program. IBM’s servers are phenomenal, but the firmware update system is just as bad as the hardware is good.

Lusting after the new Apple Macbook


macbook20060516.pngOh yeah, I gotta have me one of these.

Here’s the deal: as of about a year ago, Apple started moving their entire lineup over to Intel processors instead of Motorola ones, which means Apple computers can run Windows. During the boot process, the user can pick which operating system they’d like to run – Apple OSX or Windows. That’s great for those of us who still have to use Microsoft programs at work, like me. I make my living working with Microsoft SQL Server, and I can’t imagine trying to administer a database without the new SQL Server Management Studio. It doesn’t run run on Macs, doesn’t run on Linux, etc – ya gotta run Windows.

So now, I could either boot into Windows, or I could use Parallels, a slick piece of VMware-like software that lets you run Windows inside a Mac.

This was all good, but until last week, the only Intel-based Mac laptop was $2k – a little too pricey for me. Now there’s the new $1100 MacBook, and that’s a heck of a cheap price point. They make a black model for $1500, but I’d rather have the white one, actually.

The only question now: how can I get work to pay for it? Heh….alright, who in the company needs a new (used) Dell D810 laptop?

End ofa long, hard day

SQL Server

End ofa long, hard day, originally uploaded by BrentOzar.

I spent the second half of the day building and installing a couple of pizza boxes from IBM. We’re cleaning up our network test lab, mounting flat panels on the walls for easier builds, and just generally making it a nicer place to spend time.

BarCamp coming to South Beach


Yeah, it’s been a while since my last blog post, sorry about that. Life is a blast here, having a ton of fun with things at work, but I’m going to try to catch up on the blog over the weekend.

First things first: BarCamp South Beach! Woohoo! I can’t believe it. My first kinda-Valley-ish experience. I’m excited about it. I haven’t been involved in the geek meatspace since I stopped going to HoustonWireless last year, although I still keep in touch with a few of those guys.

But anyway, back to BarCamp. If you’re in the Miami area, you should check out the planned BarCamp experience and consider signing up. It’s demo-centric, meaning you’re expected to share as well as learn. I’m going to be frantically brainstorming about what I could show off. I doubt the attendees would be much interested in the finer points of database systems that cost $25k per CPU, but I might see if I can put together some demos of how we use Microsoft SQL Server on handheld computers. At the very least, I can help out by bringing projectors or servers from work.

Technorati tags: , ,

Reverse-engineering databases


I’m a big fan of the relatively inexpensive Case Studio as an alternative to the big database design tools. Case Studio does a very competent job at reverse engineering databases, modeling, UML, etc.

The reverse engineering just got a whole lot better with the recent InferRI add-in that can infer relationships between tables even when no relationship is defined in the database. It looks at what fields are the primary keys in each table, and then looks for similarly named fields in other tables. This makes reverse-engineering databases a lot easier for database administrators who don’t like to force the database to enforce relationships, or who take over databases that have never been managed by a DBA before. This thing just saved me a couple hours of work.

The only drawback is that the field names have to be identical. If the Customer table has a primary key of CustomerID, and the Orders table has a field called PurchasingCustomerID, CaseStudio won’t infer the join between the two tables.

Pretty damn good for a first version of the add-in, though.

SQL performance tuning for VPN users


The latest newsletter from the excellent had a tip about doing performance tuning on high-powered machines. If you need to test large queries running inside the server room (meaning, the app or web server runs a big query against the db server), you may not get the speed results you’re expecting if you try to run that same query on your desktop. SQL-Server-Performance attributes this to low resources on the desktop, but there can also be another reason: network bandwidth and latency.

Last week, I was doing performance tuning on a SQL Server that supports users across the United States. One of the applications exhibited very high Duration numbers in the profiles that I ran, and I couldn’t figure out why. Even when I tuned the tables by adding appropriate indexes, the load on the server didn’t improve the way I’d expected. Finally, I found out that this particular application was running on machines in another state – not application servers inside our data centers like we’d typically do. The query results were being piped over their slow VPN connection down to their workstations, and so the Duration statistics were artificially high. Sure, it looked like the queries were taking forever, but in reality, it was just a slow network connection, and there was nothing I could do on the database side to speed that up.

The morale of that first story: never performance tune by looking at a single statistic alone, like Duration, Reads, Writes, or CPU. Use all of them together to gauge what queries are really having an impact.

Technorati tags: ,

Raising the dead…servers


.flickr-photo { border: solid 2px #000000; }
.flickr-yourcomment { }
.flickr-frame { text-align: left; padding: 3px; }
.flickr-caption { font-size: 0.8em; margin-top: 0px; }

Raising the dead…servers, originally uploaded by BrentOzar.

I’ve blogged about more of the downsides than the upsides of my new Cingular 8125, and it’s time for some opposing views.

Stuck in the datacenter working on a dead server, this thing proves its worth. I’m able to take full advantage of some spare time while waiting for stuff to work. I’ve caught up on my emails. I’ve caught up on my voicemail, thanks to our slick VoIP phones that send our voicemails to email. I’m even blogging! Hooah!

Now I just need a pair of Bluetooth headphones so I can whistle while I work.

Why open source will triumph: one big community


South_Beach_Boardwalk.jpgFor the past few days, I’ve complained a couple of times about the lack of testing on my new Cingular 8125. My troubles continue with the MMS integration. The device has a great digital camera, and it has great email support, but do the two work together? Not so much.

1. Take a picture. (I took this particular one yesterday afternoon while walking Ernie down the boardwalk, and I’m only now getting the damn thing into my blog because this process is such a pain in the ass.)
2. Click Send at the bottom of the screen.
3. You get options for each of your email accounts – but not Goodlink, and I can almost understand that because it’s a third-party email client. However, Cingular touts its support, so Cingular should really support it. Bottom line, you can’t send camera phone attachments with Goodlink. But I digress – back to this particular bug. Choose MMS, a picture message.
4. The MMS window pops up, and the first thing is the “To” address.
5. Start typing a person’s name, an email address, or a phone number.

Nothing happens.

When I start typing an email address or a phone number, it should pick from my list of contacts and complete the address for me. Or at the very least, it should pop up a clumsy additional window and let me pick an address from my address book. No such luck. It’s as if my address book on the device didn’t even exist. Come on, man, this device fits in the palm of my hand – why are all of the parts so damned independent?

This brings me to testing. The bug reporting format I used above was copy/pasted from the Flock wiki entry on submitting Flock bugs. If I have a problem with the browser, I can simply submit a bug report like I did this morning. I caught a minor grammatical error and submitted a bug for it. I can then watch the Flock crew react to it, hopefully fix it, and publish the fixed code. I can download the fixed browser and see my contribution in action. The very fact that they’re open to my opinion gets me even more excited about using their product.

Now, contrast that with my Cingular 8125. I shudder to think at what would happen if I tried to report a bug to Cingular. I know they have to deal with a lot more users, and they have to insulate their developers better from their end users, but there should still be an open forum at some point where I can go to find problems other people are having with this device and its firmware. Oh, sure, there are forums alright, but they’re independent third-party phone forums that have sprung up out of desperation because they can’t get what they want from the big companies. But that doesn’t make me more excited about Cingular – it makes me more excited about the forums. Cingular has a chance to build their brand here, build a community of involved users, and they’re missing the boat.

Flock, on the other hand, Gets It. They understand that by building a community of involved users, they encourage their product to grow and spread wings.

SQL Server training for developers: primary keys & indexes

Indexing, SQL Server

I had to do some developer training last week and I wrote up a paper on the basics of primary keys and indexes. Sure, there’s tons of similar stuff around the net, but this is MINE, baby.

Our Table: Phone Numbers

For our evil training purposes, let’s say we work for the phone company, and we need a database table with phone numbers. We need to track:

  • Phone number (required)
  • Billing contact last name (required)
  • Billing contact first name (required)
  • Business name (optional)
  • Business category (restaurant, dog groomer, auto dealer, etc)
  • Address 1
  • Address 2
  • City
  • State
  • Zip
  • Service start date

(Sometimes a person or a business will have multiple phone numbers, but for the sake of this training, let’s keep it a simple flat table.)

We will never have two records in here with the same phone number. We have to tell our database about that by making the phone number our primary key. When we make the phone number the primary key, we’re telling SQL Server that there can be no duplicate phone numbers.

That means every time a record is inserted or updated in this table, SQL Server has to check to make sure nobody exists with that same phone number.

As of the year 2000, there were about 360,000 people in Miami. Throw in businesses, and let’s say our table has 500,000 records in it.

That means by default, every time we insert one eensy little record, SQL Server has to read half a million records just to make sure nobody else has the same phone number!

Every 1 write = 500,000 reads.

Well, that won’t work, will it? So let’s organize our table in the order of phone number. That way, when SQL Server inserts or updates records, it can quickly jump to the exact area of that phone number and determine whether or not there’s any existing duplicates.

This is called setting up a primary CLUSTERED key. It’s called clustered because – well, I have no idea why it’s called clustered, but the bottom line is that if you could look at the actual hard drive the data was stored on, it would be stored in order of phone number.

This is different from a primary NONCLUSTERED key – nonclustered means it has nothing to do with the way the data is stored on the physical hard drive.

So to recap:

  • Primary Clustered Key means 1 write = a few reads
  • Primary Nonclustered Key means 1 write = 500,000 reads

Sounds like a black and white decision, right? It usually is. There are ways to make primary nonclustered keys fast, but it needs to be a very careful decision not taken lightly, because there are a heck of a lot of ways to make them dead slow.

But let’s get back to our phone number table. Now we have the table organized by phone number, and if we want to find people by phone number, it’ll be very fast. While our computer systems will usually need to grab people’s data by phone number, our customers and end users often need to get numbers by other ways. That’s where indexes come in.

The White Pages: A Lookup Index

Our customers constantly need to find people’s phone numbers by their name. They don’t know the phone number, but they know the last name and first name. We would create an index called the White Pages:

  • Billing contact last name
  • Billing contact first name
  • Phone number

That index would save people a ton of time. Think about how you use the white pages:

  • You scan through pages looking at just the letters at the top until you get close
  • When you get close, you open up the full book and jump to the right letters
  • You can quickly find the right single one record

Now think about how you would do it without the White Pages. Think if you only had a book with 500,000 records in it, organized by phone number. You would have to scan through all 500,000 records and check the last name and first name fields.

The database works the same way, except it’s even worse! If a developer wrote a SQL query looking for the phone number, it would look like this:

SELECT PhoneNumber FROM Directory WHERE LastName = ‘Smith’ AND FirstName = ‘John’

That doesn’t say select the top one – it says select ALL of them. When you, as a human being, go through that list of 500,000 phone numbers, you would stop when you thought you found the right John Smith. The database server can’t do that – if it finds John Smith at row #15, it doesn’t matter, because there might be a few John Smiths. Whenever you do a table scan and you don’t specify how many records you need, it absolutely, positively has to scan all 500,000 records no matter what.

If the database has an index by last name and first name, though, the database server can quickly jump to Smith, John and start reading. The instant it hits Smith, Johnathan, it knows it can stop, because there’s no more John Smiths.

Covering Fields: Helping Indexes Work For You

But that’s not always enough. Sometimes we have more than one John Smith, and the customer needs to know which John Smith to call. After all, if your name was John Smith, and the phone book didn’t include your address, you’d get pretty tired of answering the phone and saying, “No, you want the John Smith on Red Road. He’s 305-838-3333.”

So we would add the Address 1 field in there too.

  • Billing contact last name
  • Billing contact first name
  • Address 1
  • Phone number

Do we absolutely need the address in our index for every query? No, but we include it for convenience because when we DO need it, we need it bad. And if we DON’T need it, it doesn’t really hurt us much.

This is called a covering index because it covers other fields that are useful.

Adding the address field to our index does make it larger. A phone book without addresses would be a little thinner, and we could pack more on a page. We probably don’t want to include the Address 2 field, because the Address 1 field is enough to get what we need. The database administrator has to make judgement calls as to which fields to use on a covering index, and which ones to skip.

When building covering indexes, the covering fields go at the end of the index. Obviously, this index would suck:

  • Billing contact last name
  • Address 1
  • Billing contact first name
  • Phone number

We don’t want all of the Smiths ordered by their address, and then a jumbled mess of first names. That wouldn’t be as fast and easy to use. That’s why the covering fields go at the end, and the names go first – because we use those.

Selectivity: Why the Last Name Goes First

If you wanted to search for Brent Ozar in the phone book, you look in the O’s for Ozar first, and then you’ll find Ozar, Brent. This is more efficient than organizing the phone book by first name then last name because there are more unique last names than first names. There are probably more Brents in Miami than Ozars.

This is called selectivity. The last name field is more selective than the first name field because it has more unique values.

For lookup tables – meaning, when users need to look up a specific record – when you’ve narrowed down the list of fields that you’re going to use in an index, generally you put the most selective field first.

Indexes should almost never be set up with a non-selective field first, like Gender. Imagine a phone book organized by Gender, Last Name, First Name: it would only be useful when you wanted a complete list of all women in Miami. Not that that’s a bad thing – but no matter how much of a suave guy you think you are, you don’t really need ALL of the women in Miami. This is why non-selective indexes aren’t all that useful on lookup tables.

This rule is really important for lookup tables, but what if you aren’t trying to look up a single specific record? What if you’re interested in a range of records? Well, let’s look at…

The Yellow Pages: Another Index

When we need to find a dog groomer, we don’t want to go shuffling through the white pages looking for anything that sounds like a dog groomer. We want a list of organized by business category:

  • Business Category
  • Business Name
  • Address 1
  • Phone Number

Then we’ll look at the list of businesses, see which name sounds the coolest and which address is closest to ours, and we’ll call a few of them. We’ll work with several of the records.

Here, we’re searching for a range of records, not just a single one.

Notice that we didn’t put the most selective field first in the index. The field “Business Name” is more selective than “Business Category”. But we put Business Category first because we need to work with a range of records.

When you’re building indexes, you not only need to know what fields are important, but you have to know how the user is fetching records. If they need several records in a row next to each other, then it may be more helpful to arrange the records like that by carefully choosing the order of the fields in the index.

When in doubt, experiment. Create two or more indexes with the same fields, but in different orders. Then check your query execution plan to see which index gets used.

One Phone Book per City: Partitioned Tables

When we’re dealing with a lot of data, sometimes we partition the same data into multiple tables. The phone company separates its customer list into cities, for example, so that there are separate tables for Miami and for Fort Lauderdale.

When customers want to look up a phone number, they grab the Miami phone book. They don’t want to look in the Fort Lauderdale phone book for someone who lives in Miami.

But how do they know which book to use?

They look at the front of the book, where it says “Miami, Florida”. This is a constraint: this book is constrained to Miami numbers only.

Constraints Define Partitions

Constraints help us because it means we don’t have to put fields like State in our index. When our users are looking at a single entry in the phone book, they don’t have to wonder which State the entry is for. The book is constrained to State = ‘FL’.

Databases need constraints too. We might create a table called OrderHistory_FL, and put a State field in it. However, our poor little database doesn’t know that only FL orders are stored in this table, because the table name doesn’t mean anything. We have to specifically create a constraint on the state field.

When the database server builds query execution plans, it looks at the constraints on each table and uses that knowledge to help build a better execution plan. If it knows that it will only find states of ‘FL’ in a particular table, then it knows it doesn’t have to do any work when the query says WHERE STATE = ‘FL’. The database already knows that all records in that table match the constraint – just like if I tell you to find me all of the John Smiths in Miami, you don’t have to call each of the John Smiths in the phone book and ask them if they’re actually in Miami.