Tag Archive: sql server training

SQL Server Index Tuning Tip: Identify Overlaps

Performance Tuning 101 - Add More Spoilers

Performance Tuning 101 - Add More Spoilers

If you’ve got performance troubles with an application that stores data in SQL Server, and especially if it’s a home-grown application (not a store-bought app), you can get dramatic performance improvements simply by focusing on some basic indexing techniques.  These tips and tricks pay off more than pouring money into hardware that might look good sitting in the datacenter, but doesn’t really make the application significantly faster.

When I go into a shop to speed up an application I’ve never seen before, two of my favorite quick-hits are from the index performance tuning queries from SQLServerPedia:

  • Find unused indexes – these are indexes the SQL Server engine says it’s not using.  Unused indexes incur a speed penalty because SQL Server still has to add/update the indexes as records change, so they make writes slower.
  • Find missing indexes – these are indexes SQL Server wishes it had available.

I’m not going to cover those in detail this week because I’ve already recorded tutorial videos over at SQLServerPedia for those, but I do want to focus on something these queries won’t pick up.  Sometimes a table has two nearly-identical indexes, and they’re both being used for reads.  Take these two:

CREATE NONCLUSTERED INDEX [IX_RunID_SiteID_DataSource_OutputType_PeriodType] ON [dbo].[MyTable]
([RunID] ASC,
[SiteID] ASC,
[DataSource] ASC,
[OutputType] ASC,
[PeriodType] ASC
)
 
CREATE NONCLUSTERED INDEX [IX_RunID_SiteID_DataSource_OutputType_PeriodType_QuotaItemDriverID] ON [dbo].[MyTable]
([RunID] ASC,
[SiteID] ASC,
[DataSource] ASC,
[OutputType] ASC,
[PeriodType] ASC,
[QuotaItemDriverID] ASC,
)
Performance Tuning 201 - Even Wood 2x4s Can Be Spoilers

Performance Tuning 201 - Even Wood 2x4s Can Be Spoilers

They’re two different indexes, and they’re both getting used – but does that mean we need them both?

They’re very nearly identical – but the second index has one extra field.  When the SQL Server engine gets a query that needs RunID, SiteID, DataSource, OutputType, and PeriodType – but not QuotaItemDriverID – then it will use the first index.  When it gets a query that needs all six fields, then it’ll use the second index.

In cases like this, I prefer to drop that first index and let the slightly bigger index pick up the slack.  Reading a slightly larger index will take slightly more time: if a query didn’t need that QuotaItemDriverID field, it still has to pull it off the disk in order to perform the query.  However, dropping the index pays off during inserts/updates/deletes, because it’s one less index SQL Server has to manage.  It also makes the database smaller, thereby making database maintenance tasks smaller/faster.

If:

  • I have two indexes with the exact same fields in the same order, but
  • One has 1-2 extra fields, and
  • There aren’t include fields, or the include fields are the same

Then I’ll drop the shorter index with extreme prejudice.

When Indexes Have Include Fields

If they have “include” fields, then I’ll merge the include fields between the two indexes to make one index to serve both needs.  Say we have these two indexes:

CREATE NONCLUSTERED INDEX [IX_RunID_SiteID_DataSource_OutputType_PeriodType_Includes] ON [dbo].[MyTable]
([RunID] ASC,
[SiteID] ASC,
[DataSource] ASC,
[OutputType] ASC,
[PeriodType] ASC
) INCLUDE ( [YTDRevenue], [MTDRevenue] )
 
CREATE NONCLUSTERED INDEX [IX_RunID_SiteID_DataSource_OutputType_PeriodType_QuotaItemDriverID_Includes] ON [dbo].[MyTable]([RunID] ASC,
[SiteID] ASC,
[DataSource] ASC,
[OutputType] ASC,
[PeriodType] ASC,
[QuotaItemDriverID] ASC,
) INCLUDE ( [SalespersonID], [MTDRevenue] )

The first index includes the YTDRevenue field, but the second index doesn’t.  If I just drop the first index, then queries that needed that field won’t get the full speed benefits from the second index.  To merge the two indexes, I need to drop both indexes and recreate the second one with the YTDRevenue field included, like this:

CREATE NONCLUSTERED INDEX [IX_RunID_SiteID_DataSource_OutputType_PeriodType_QuotaItemDriverID_Includes] ON [dbo].[MyTable]([RunID] ASC,
[SiteID] ASC,
[DataSource] ASC,
[OutputType] ASC,
[PeriodType] ASC,
[QuotaItemDriverID] ASC,
) INCLUDE ( [SalespersonID], [MTDRevenue], [YTDRevenue] )

In this example, I tacked the YTDRevenue field on to the end of the include field list.  The order of the included fields doesn’t matter, since SQL Server doesn’t sort by those.

Performance Tuning 301 - Beauty Just Adds Weight

Performance Tuning 301 - Beauty Just Adds Weight

Things to Watch Out For

In my examples, I kept things simple by omitting all of the extra indexing options like partitioning and sorting in TempDB.  When doing index tuning in real life, though, you’ll want to check those options to make sure they’re consistent from index to index.

Field order matters in indexes; if two indexes have the same fields but in different order, that doesn’t mean you can drop one of them.

Ideally, after making index changes, we would restart the SQL Server instance to reset the DMV counters that monitor index use.  In reality, though, that’s not so easy to pull off, so we need to log our changes to understand what the changes have been.  After making index changes, log the changes somewhere. I keep the output of the index performance tuning DMV queries in Excel spreadsheets because it’s easier to email those back and forth from machine to machine, especially when I’m consulting. The next time you do index performance tuning on the same database, you can use the historical spreadsheets to determine whether or not your changes worked the way you’d planned.

SQL Server 2008 Query Performance Tuning Distilled

SQL Server 2008 Query Performance Tuning Distilled

Learn More About SQL Server Index Tuning

I really like Grant Fritchey’s book SQL Server 2008 Query Performance Tuning Distilled, and I wrote a book review about it.  I can’t recommend it highly enough, and I’d start there.

If you don’t have the patience to wait for a book, here’s a few more blog posts about performance tuning:

Performance Tuning with Perfmon – how to set up Perfmon, what SQL Server Perfmon counters to track what the indicators mean.

Data Mining Your SQL Server Perfmon Counters – want to take your Performance Monitor statistics to a new level? I wrote an article on SQLServerPedia explaining how to use Microsoft’s free Table Analysis Tools for the Cloud plugins to dive deeply into your data.

Primary Keys and Indexes – I explain the concepts behind keys and indexes using phone books as an example. Indexes have huge impacts on performance, and if you master these you can make your server go a whole lot faster without spending more money.

SQL Server 2005 Setup Checklist – some simple configuration tweaks can get you 20-30% performance increases right from the start without spending any extra money.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SAN Multipathing Part 2: What Multipathing Does

In Part 1 of my multipathing series, I talked about what paths are, and today I’m going to be talking about multipathing.  SAN multipathing software has two goals, in this order:

  1. Protection
  2. Performance

Using SAN Multipathing for Failover Protection

What Could Go Wrong?

What Could Go Wrong?

Your server absolutely, positively has to be able to access its drives at all times.  When servers can’t access their hard drives, horrendous things happen.  When hard drives were directly attached to servers, this wasn’t a big risk, but storage area networks bring in a lot of risky factors.  Cables get unplugged or get bent beyond repair.  Switches fail.  Network configurations don’t go according to plan.

(Side note: I think this was one of the biggest reasons SAN administrators didn’t want to go to iSCSI.  They saw how our network cables looked, and they didn’t want their precious fiberoptic cables getting that same treatment.)

Multipathing software mitigates this risk by enabling the SAN admin to set up multiple routes between a server and its drives.  The multipathing software handles all IO requests, passes them through the best possible path, and takes care of business if one of the paths dies.

In the event of a problem like an unplugged cable, the multipathing software will sense that IO has taken too long, then reset the connections and pass the request over an alternate path.  The application (like SQL Server) won’t know anything went wrong, but the IO request will take longer than usual to perform.  Sometimes in SQL Server, this shows up as an application-level alert that IO has taken more than 15 seconds to complete.

To make this work, SAN administrators build in redundancy at every possible layer of the SAN infrastructure – multiple HBAs, multiple switch networks, multiple connections from the controllers, and so forth.  But most of the time, all this extra connectivity sits around idle.  It’s designed to be used for protection, but not necessarily performance: it’s active/passive gear where only one thing is active at a given time.   The secondary goal of multipathing is performance, but it’s a far, far second.  SAN administrators are so conservative, they make database administrators look like gambling addicts.  They’re perfectly comfortable leaving half or more of the infrastructure completely unused.

Do We Really Need More Bandwidth?

Depending on the SAN infrastructure, the theoretical speed limits are around:

  • 1GB Fibre Channel or iSCSI – around 125 MBs/second (this is the most commonly deployed iSCSI speed)
  • 2GB Fibre Channel – around 250 MBs/second
  • 4GB Fibre Channel – around 500 MBs/second (this is the most commonly deployed FC SAN speed)
  • 10GB iSCSI – around 1250 MBs/second

These limits were fine ten or fifteen years ago when hard drives weren’t all that fast, but here’s some sample read speeds from today’s desktop-class SATA drives:

  • One drive – around 130 MBs/second (from TomsHardware reviews)
  • RAID 5 array of five drives – around 300 MBs/second (from my home lab)

Forget 15k drives or solid state drives – even just with today’s SATA drives, 4GB Fibre Channel can get saturated fairly quickly during large sequential read operations, like SQL Server backups or huge table scans on data warehouses.  Sadly, I see so many cases where the IT staff bought a SAN with dozens or hundreds of hard drives, hooked it up to a server with just two 4GB fiberoptic connections, and they can’t understand why their storage isn’t much faster than it was with local disks.  Even if they get savvy to the basics of multipathing and try connecting more 4GB HBAs, their storage speed doesn’t necessarily increase.

Enter Active/Active Multipathing

Active/active multipathing is the ability to configure a server with multiple paths to the storage and simultaneously use all of them to get more storage bandwidth.  This type of multipathing software is usually sold by the SAN vendor, not a third party, because it’s a lot more complicated than it looks at first glance.  Talk to your SAN vendor and ask how much their active/active multipathing software costs, and what it’s compatible with.  EMC’s PowerPath even works with gear from multiple vendors.

But before you plunk down a lot of hard-earned cash – well, it’s not that hard-earned for storage administrators, but I’m talking to database administrators here – you need to ask one very important question: what exactly does this software mean by active/active?  In your feeble mind, you probably believe that you can have one array, accessed by one server, and spread the load evenly over two or more Host Bus Adapters.  Not so fast – some vendors define active/active as:

  • Only one path can be active per array at a given time. If you have four HBAs, you’ll need four arrays in the SAN, and SQL Server will need to spread the data across all four arrays.  This means designing your database filegroups and files specifically for the number of HBAs in use on your server.
  • All paths work for sending data, but only one can receive. I’ve seen this in iSCSI active/active multipathing solutions.  For SQL Server, this means you can insert/update/delete/bulk-load data at breakneck speeds, but your selects still crawl.
  • Active/active works, but failover sticks. Say you have two paths to your data, and one of the paths goes bad for some reason.  All traffic fails over to the alternate path.  When the bad path comes back up (like the cable is plugged back in, the power comes back on, the port is replaced, etc) traffic doesn’t automatically balance back out.  It stays on the single path.  The only way to find this out is with expensive SAN-monitoring software or by browsing through SAN configuration screens periodically.

For virtual servers, I’ve got bad more news: the only true active/active SAN multipathing today is in VMware vSphere 4.0 with EMC PowerPath.  Stephen Fosketts explains the storage changes in vSphere.  If you’re on VMware v3.5 or prior, on Windows Hyper-V, or on vSphere 4.0′s lower licensing tiers, you’re stuck with one HBA of throughput per server per LUN (array).  This is one reason why you might not want to virtualize your high-end SQL Servers yet: they don’t get quite the same level of throughput that you can get on physical hardware.  Don’t let that scare you off virtualization, though – remember, you’re probably reading this article because you don’t have true active/active multipathing set up on your physical SQL Servers, either.

There’s a lot of catches here, and the SAN salespeople are always going to smile and nod and say, “Oh yeah, ours does that.  That’s good, right?”  It’s up to you: you have to ask questions and test, test, test.  Get a time-limited evaluation copy of their multipathing software and test your SAN performance with SQLIO, as I explain over at SQLServerPedia.  It’s the only way to know for sure that you’re getting the most out of your storage investment.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SAN Multipathing Part 1: What are Paths?

In the beginning, computer makers created servers with hard drive bays built in.  And it was good.

Server with Built-In Hard Drives

Server with Built-In Hard Drives

Built-in drive bays are easy to manage: when you need more storage, you just plug in another hard drive.  You don’t have to fork out a lot of money or make room – the server already has the drive bays built in.  It’s so easy, a developer could do it.

Built-in drives are reliable because they’re directly connected inside a server with just a few wires.  People aren’t going to walk past the server and nudge the cables loose.  As long as the server has power, the drives have power, so admins aren’t worried about drives suddenly going offline while the server is up.

They also have drawbacks: when you run out of drive bays, you run out of options.  If you have extra drive bays in the server next door, you can’t cable them together and use the extra bays.  And perhaps worst of all, when you buy a server, you have to be pretty confident that you’re buying one with the right number of drive bays – not too many, because bigger servers cost money, and not too few since it’s not easy to add more later.

Expansion Chassis

Direct Attached Storage

Later, computer makers designed Direct Attached Storage: shelves of drives that could be attached directly to a server.  One of these DAS units could hold a dozen or more drives, thereby giving the sysadmins more storage capacity for a server.  They are cheaper than buying a new server.

They introduced two reliability risks: the DAS’s power supply could fail, thereby taking the drives offline, or the connection between the DAS and the server could be tugged loose.  Hardware makers mitigated the risks by adding redundant power supplies on enterprise-class models, and they’ve started taking steps to reduce the risk of SAS/SATA connectivity problems.

Direct Attached Storage units usually dedicated to a single server, which means that even if you only need one or two additional drives, you still have to buy a whole 12-bay chassis.  The extra bays sit around unused, wasting space in the datacenter.

SAN - Front Side

SAN - Front Side

The Solution: Storage Area Networks

Storage Area Networks (or SANs) put a full-blown network between the servers and the drives.  SANs consist of a few parts:

  • Drives (hard drives or solid state drives)
  • Drive enclosures (shelves with space for a dozen or more drives)
  • Controllers (kinda like computers that connect to the drives, and have network ports)
  • Network switches (could be fiberoptic networks or conventional Ethernet)
  • Host Bus Adapters (the fancy-pants SAN name for network cards that plug into your server, thereby connecting your servers to your drives)

Suddenly, this picture starts to get kinda complicated.  There’s a lot of parts here connected to other parts with a lot of cables.  Servers really need to see their hard drives at all times, so what do engineers do?  They build in redundancy: each part has backups, and backup ways to connect to other parts.

In the picture above, we’re showing just two parts of the SAN.  The top big black box is a single controller, and there’s two drive enclosures underneath it.  In the SAN world, this is a really basic SAN, and SAN administrators would say it’s not really redundant.  The rest of us would be stunned at how much redundancy is already included, though – check out a picture of the back side of that controller and just one of the drive enclosures.

SAN Back

SAN Back

This picture shows two units: the top half is a controller, and the bottom half is one drive enclosure.  The connections are:

A – One pair of fiberoptic cables that connect to a SAN network switch.

B – One pair of fiberoptic cables that start a loop down to the drive enclosures.

C – Another pair of fiberoptic cables that represent the other side of that loop started by B.

D, E, F – pairs of fiberoptic cables that run up to B/C above and to the other drive units.

At any time, the SAN admin can walk to the back of the rack, pull one pair of the B/C/D/E/F cables out that carry communication between the controller and the drive enclosures, and business will keep right on truckin’.  In fact, if the SAN has been set up according to best practices, the admin can probably pull more than one cable without taking the system down.

SANs Have Multiple Paths to Access Data

A common misperception is that SAN admins sleep so well at night because their pillows are stuffed with money.  While SAN admins do make a fortune, the sad fact is that money-stuffed pillows are surprisingly lumpy and noisy.

Instead, the reason SAN admins sleep so well is because the SAN has so many paths between each server and its drives, and a single failure just won’t stop the SAN.  Furthermore, most production SANs have multiple controllers, multiple network switches, and beyond that, two or more completely separate networks (called fabrics).  If all hell breaks loose and one defective network switch goes down or broadcasts garbage, there’s a totally independent network that stays up.

If this was your home network, it would be like having a cable modem and a DSL modem, with two separate routers.  Your home computer would have two separate network cards, each with a different TCP/IP address, connected to the two different routers.  If any one component failed, you could still continue to watch your mission-critical “adult material” without interruption.

Or could you?

In the event of a real failure, like if you were watching Hulu over your cable modem and it started to go down, odds are your movie would start to stutter and cut out.  The traffic wouldn’t be automatically and instantaneously switched over to the DSL modem.  You would probably have to do something manually, or heaven forbid, reload the movie again.  That doesn’t cut it for, say, a SQL Server trying to access its data files over the SAN: it simply can’t go down.

This is where SAN multipathing comes in: it needs to know what paths are available, what paths are not working well, and proactively route traffic over the best possible path.  In the next part of my series tomorrow, I’ll talk about the basics of multipathing, and then talk about the differences between Fibre Channel, iSCSI, and virtualization multipathing.

Continue to SAN Multipathing Part 2

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SQL Server Inventory Checklist

Richard Jones, a DBA in the UK, asked in with a question, and rather than giving my own opinion, I thought it’d be better to ask it here to get everybody’s opinion in the comments:

The DB team that I am a member of are about to take control of a number of SQL Server boxes from the Apps Support team (don’t ask why they’ve got control of them in the first place, ok).  I’m trying to put together a checklist of things we need to get out of them during the handover and was wondering if you had any ideas of what I might have missed from my initial thoughts??  So far I have:

  • Usernames/schemas and passwords
  • Instance/Server names and whether it’s Dev/Test or Prod
  • Number of users each box has (roughly)
  • What App runs on each box and the criticality of it
  • Backup schedules/maintenance plans/scheduled jobs
    • Tape or disk
    • Full/incremental
    • Transaction log backups
    • Recovery mode
  • 3rd party contact details for support where applicable

What else would you add?

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Upcoming Events

All kinds of webcasts and presentations coming up in the next couple of weeks:

May 14 – Quest Pain of the Week Webcast:
Developing Something for Nothing

Join SQL Server expert Brent Ozar and SQL Server developer Jeremiah Peschka in this webcast, as they discuss the free tools available to you to build solid, robust applications for free. They will cover:

  • SQL Server Express Edition – the free version of SQL Server
  • SQL Server Management Studio Express – the free way to manage your Express Edition instances
  • Toad® Freeware – the free Toad resource to manage your SQL Servers (including Express) as well as your MySQL instances

Don’t miss the overview of the free tools you can’t afford to ignore in tough economic times! In this presentation, you’ll see how to improve your productivity with an intuitive, graphical tool that will satisfy your growing SQL Server development needs.  Register for the free webcast.

May 14 – Columbus, OH PASS Chapter
Using Cloud-Based BI to Interpret Perfmon & Profiler Results

After learning how to use Perfmon and Profiler to gather performance statistics about your SQL Server, it still takes a lot of time to interpret those results and figure out what’s going on.  Microsoft’s SQL Server Data Mining team has built a free cloud-based data mining tool for Excel that can help slice and dice mountains of data and help you make sense of it all.

Even if you’re not ready for BI in the cloud, you can use this same type of tool in combination with a local SQL Server Analysis Services instance.  Wait!  Don’t freak out – it’s much easier than you think, and you never have to leave the comforting environment of Excel.  Even if this doesn’t sound like fun to you, you might want to learn about it because mid-level managers in your company might want to use this technique to analyze sales or customer data.

Attendees will learn how to install & configure data mining in Excel, how to analyze Perfmon data to break the server’s load into categories, and how to use BI to write a performance report about your SQL Server.

This one will be presented remotely, and we’ll tweet out links as we get closer to the presentation.

May 19 – PASS Application Development SIG – Online
Rolling Your Own Replication

SQL Server’s built-in replication has made great strides in the last couple of versions, but what if it’s still not enough?  One team needed more flexibility, easier administration and higher scalability, so they built their own solution instead.  Brent Ozar will discuss how the system was designed, the pros and cons, and how you can build a similar solution for your own needs.  He’ll explain some of the lessons learned in scaling this out to thousands of remote SQL Servers.

The meeting starts at 1pm.  You can watch the webcast online, and for the audio, call 1-866-379-8990, passcode 6489756.  After joining, please mute your phone.

May 19 – IndyPASS – Indianapolis, IN
SAN Multipathing: You CAN Get There From Here

SANs are expensive pieces of hardware that offer a lot of performance and failure protection.  The key is multipathing, yet DBAs rarely get exposed to it.  Brent knows firsthand: he managed data warehouses and SAN storage, and was able to wring much more performance out of his SANs when he learned multipathing.  He’ll explain the basics of multipathing, how to test for failover protection, and how to configure your storage to get the most performance possible from your investment.

The meeting is at 3500 DePauw Blvd., Pyramid #3 – Lower Level, Indianapolis, IN 46268.   I’ll be doing this one online via my Ustream channel if all goes well.  More info via Twitter on the day of the presentation.

May 27 – Converging Paths of SQL and SharePoint
Microsoft Office, Victoria London

I’m not doing this one myself, but whenever Quest does a SharePoint/SQL event, it’s always very popular so I thought I’d mention it here.  This will be a full day-long seminar with sessions on:

  • The SQL Server –SharePoint connection
  • Creating a scalable and resilient architecture
  • DBAs as SharePoint Administrators
  • Understanding and managing SharePoint

The speakers include:

  • Andrew Fryer, SQL Server Evangelist, Microsoft
  • Viral Tapara, SharePoint Evangelist, Microsoft
  • Christian Bolton, SQL Server MVP and Director/Database Architect, Coeo Ltd.
  • Doug Davis, Director of Product Management, SharePoint, Quest Software
  • Iain Kick, SQL Server Systems Consultant, Quest Software

You can register for the day-long seminar here, and it’ll be available as a webcast too.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Getting the most recent record

Sounds simple, right?  Just grab the max.  But what if you want to use a single T-SQL operation to fetch the most recent record and some of its attributes?

StorminSpank asked:

@BrentO SQL Question. 4 columns, ID, Date, Info1, Info2. ID has multiple entries. Want Latest Date for Distinct ID, but also info1 and 2.

Let’s say our table schema is:

CREATE TABLE [dbo].[TestTable](
	[id] [int] NOT NULL,
	[create_date] [date] NOT NULL,
	[info1] [varchar](50) NOT NULL,
	[info2] [varchar](50) NOT NULL,
 CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
	[id] ASC, [create_date] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO

And let’s populate it with some data:

INSERT INTO dbo.TestTable (id, create_date, info1, info2)
VALUES (1, '1/1/09', 'Blue', 'Green')
INSERT INTO dbo.TestTable (id, create_date, info1, info2)
VALUES (1, '1/2/09', 'Red', 'Yellow')
INSERT INTO dbo.TestTable (id, create_date, info1, info2)
VALUES (1, '1/3/09', 'Orange', 'Purple')
INSERT INTO dbo.TestTable (id, create_date, info1, info2)
VALUES (2, '1/1/09', 'Yellow', 'Blue')
INSERT INTO dbo.TestTable (id, create_date, info1, info2)
VALUES (2, '1/5/09', 'Blue', 'Orange')
INSERT INTO dbo.TestTable (id, create_date, info1, info2)
VALUES (3, '1/2/09', 'Green', 'Purple')
INSERT INTO dbo.TestTable (id, create_date, info1, info2)
VALUES (3, '1/8/09', 'Red', 'Blue')

Now let’s get out the most recent record for each ID:

SELECT tt.*
  FROM dbo.TestTable tt
    LEFT OUTER JOIN dbo.TestTable ttNewer
    ON tt.id = ttNewer.id AND tt.create_date < ttNewer.create_date
  WHERE ttNewer.id IS NULL

That left outer join is looking for any newer TestTable records, but the where clause makes sure there aren’t any. If you switch it around to “IS NOT NULL”, you’d get the exact opposite – you’d get all of the older records.

And remember, ladies and gentlemen, do as I say and not as I do – never SELECT *.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Certifications are the icing on the cake

Wondering whether or not you should get certified on something?

Experience is cake, certifications are the icing.

It’s hard to sell icing without cake.  Yes, there are some people who like to eat icing by itself, but frankly, those people are freaks.  They’re not healthy.  Something is wrong with them.

If you take an easy-to-get certification into a shop and the manager immediately proclaims, “You’re hired!” then you should run.  The kinds of managers who instantly hire based on qualifications are not the kind of managers you really want to work for, because they’re desperate for head count.  They’re not desperate because the skills market is so tough – they’re desperate because they have a really bad work environment and they’ve already burned every bridge in town.

The job you really want is going to require experience.

If you’re hungry and you don’t have cake, those ads for icing might look mighty tasty, but hold on there for a second.

Icing by itself isn’t the best option for you.

Geeks like certifications because they’re relatively easy to get by studying at home alone on weekends.

Geeks are really good at doing things at home alone on weekends.

The problem is that you’re competing with other geeks who are also home alone on weekends, and they’re all out getting certs too.  If you want to stand out, you have to do something different.  Look at your resume and count the number of people that you can count on for great references.  If there’s not at least five, let me suggest that you learn to make something other than icing.

Let’s say you’re itching to improve your job options, and as a result, you’re thinking about pursuing a certification that will take six weekends of study in order to achieve.  To determine its worth, consider its opportunity cost: what else could you do in those same six weekends?

  • Network with potential employers
  • Get experience with someone who can vouch for you (and yes, you can do this in six weekends – try volunteering for IT work at a local non-profit)
  • Build a marketing campaign for yourself (blog, Twitter, local user groups)

All of these have something in common: other people.  People are what give out jobs, not software programs or tests.

Having just icing is better than starving to death.

If you use terms like “meatbag” when describing other people and you’re dead set against interacting with them, then yes, go get certified first.  It’ll be something on your resume other than an arrest record, and that’s good for something.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Using aliases in multi-table queries

When joining multiple tables together in a query, use aliases every time on every field.

Say we’ve got two tables, Customers and Salespersons.  The Customers table has a PreferredSalespersonID field that identifies who their normal sales rep is, and that lets us quickly grab the right salesperson when a customer calls in.  We want to display basic information about the customer and their salesperson:

SELECT CompanyName, Address, City, StateID, ZipCode, SalespersonName
FROM dbo.Customers
INNER JOIN dbo.Salespersons ON Customers.PreferredSalespersonID = Salespersons.ID
WHERE Customers.ID = 12345

That works great at first, so we put it in production and off we go.  Months later, someone decides to add an Address field to the Salespersons table.  Boom goes the dynamite – suddenly this query starts failing because SQL Server isn’t sure which City field we mean. We get errors like this:

Msg 209, Level 16, State 1, Line 3
Ambiguous column name 'Address'.

To prevent this problem, always alias every single field.  My personal preference is to use an alias as short as possible, but long enough to explain what the alias points to.  Never alias tables with single letters starting with A, B, C, etc – it’s painful to read those queries.  Instead, consider aliasing the query like this:

SELECT cust.CompanyName, cust.Address, cust.City, cust.StateID, cust.ZipCode, sls.SalespersonName
FROM dbo.Customers cust
INNER JOIN dbo.Salespersons sls ON cust.PreferredSalespersonID = sls.ID
WHERE cust.ID = 12345

If I used one-letter aliases like “c” and “s”, it would still work, but I try to design every query as if I’m going to have to come back to it tomorrow and add three more tables.  If I come back and join more tables in, then the “c” and “s” might be confusing if those new tables also start with C or S.

Finally, when picking aliases, check other stored procedures and views to see if there’s already an aliasing standard in the database.  If the Customers table is already being aliased with “cmr” in other T-SQL code, then reuse that same alias even if it doesn’t make perfect sense.  The more consistent the code, the easier it is to jump from one T-SQL script to another with ease.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Free SQL Server DBA Training Videos

It’s tough for us SQL Server DBAs to get training.

Developers can get mentored by a more senior developer because it’s rare to have just one developer at a company: odds are, there’s a more senior person just down the hallway or in the next cube over.  Junior developers can ask ‘em questions, learn by reading their code, or watch ‘em work.

DBAs, on the other hand, fly solo. There’s often only one of us in the entire company, so we can’t get mentored by a more senior person.  We’re stuck with reading books, going to user groups once a month, and maybe getting one training course approved for the budget.

There’s a solution – well, actually, there’s a bunch!

SQL Server Training Video Archives

Kevin Kline and I did a full day of training videos covering SQL Server performance tuning and troubleshooting using the DMVs.  The topics included:

  • How to manage your indexes with DMVs
  • How to find out what’s going on inside the server better than SP_WHO2
  • How to play doctor with the DMVs

You can watch the archives here.  Under each session header, there’s a link to watch the webcast.

SQL Server DBA Tutorial Podcasts

Podcasts are regularly-produced video shows that you can download over the internet.  Here’s my favorite SQL Server ones:

  • SQLServerPedia Podcasts- twice a week, I record a short (3-10 minute) presentation or demo about a SQL Server topic focusing on technical training.
  • The Voice of the DBA- Steve Jones talks about industry issues and topics that are personally interesting to DBAs.  These generally aren’t howto videos – they’re more like a topical talk show from a fellow DBA who does the same work you do.
  • SSWUG SelectViews- by far and away the most professionally polished SQL Server podcasts.  Stephen Wynkoop interviews industry personalities and talks shop.  The only downside is that you can’t subscribe to the video feeds and download them to your iPod/iPhone/Zune automatically – the only way to watch the video is through your web browser.

There’s a few others like SQL Down Under and The Midnight DBA, but they don’t come out as regularly – months go by without new episodes, so it’s not as easy to recommend those for regular training.  You can peruse the old episodes though.

Free Online SQL Server Training Events

Conferences and vendors offer free webcasts, plus they put their webcast archives online for later viewing.  Here’s the ones I’ve been checking out lately:

  • SSWUG’s SQL Server 2008 Community Launch – a free series of videos including Donald Farmer talking about SSAS cubes and Ted Malone talking about Data Dude.
  • Mix 2009 Video Archives – Microsoft’s Mix conference is all about producing online experiences.  As a DBA, I work with web people, and I find this kind of thing exciting.
  • SQLShare – purely technical start-to-finish tutorial videos that explain how to accomplish a task in SQL Server.  If you like these videos, you might also like SQLServerVideos.com, but I’m not wild about that particular site – it doesn’t quite have the polish of the other options.
  • PASS Summit On Demand – requires PASS membership, but that’s free now.  I’ve had hit-or-miss results with these – some of the links don’t seem to work, and some of them are screencast-only.

If you know about more free SQL Server training video resources around the web that are frequently updated, I’d love to hear about ‘em.

More SQL Server Database Training in Book Form

I’ve also compiled my reviews of the best SQL Server training books for every focus level – junior DBAs, developers, people trying to become senior SQL Server database administrators, you name it.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Should I Specialize?

A reader asked a great question, and I just had to share it in a more general form because it applies to so many of us production database administrators.

He’s currently a production SQL Server DBA, and his company’s bringing in a new application with a SQL Server back end.  He’s doing a good job as a production DBA, so the company offered him the chance to be the application administrator.  He’ll need to learn the application and specialize in it, and he won’t be doing as much – if any – SQL Server work in his new role.  Should he take it?

So many applications offer this same opportunity for database administrators: accounting software, sales force automation systems, document management software, I could go on and on.  For me, the choice boils down to a few basic questions.

Can you bet your career on this application?

Around 2001-2002, my company was preparing to retool our software from scratch, and and we had to decide between Java and .NET.  A few of us went to training classes for both platforms, did some basic development in each, and held meeting after meeting debating the pros and cons of each platform.  None of us had a really clear feeling about what the Way of the Future would be, but we started to realize that none of the options out there were the right answer.  It was entirely possible that ten years down the road we’d be retooling everything again due to unseen weaknesses in the platform we’d chosen.

No matter what language we chose, it would have been the third programming language I’d learned in my short programming career (not including things like batch files or HTML).  I realized the rest of my development career would be a continuous cycle of:

  1. Learning a language
  2. Getting good enough at it to build good applications (by building several bad applications first)
  3. Maintaining my initially bad applications, and rebuilding them with the new best practices for the language when things went wrong
  4. Deciding which language would solve the new business problems coming down the line
  5. Go back to step 1

Ugh.  I bailed out of programming and focused on database administration because ANSI SQL works not only in nearly every current programming language, but even on most database platforms.  I love to learn, but I don’t really love to learn new languages.  SQL Server has paid off for me: it’s still phenomenally popular, and I don’t see it going away in ten years.  The fact that SQL Data Services is switching from an abstract XML-style data storage over to standard table-style data storage is yet more evidence that things are looking good in the DBA arena.

SQL Server is a gateway drug: you can branch off from there into all kinds of hardcore stuff.  However, the more specialized you get, the more of a risk you’re taking that your new specialization will disappear over time, and the tougher you’ll find it to step back down to your old SQL habits.

If you become, say, an SAP BI administrator, you’ll make a lot of money today, but if that platform fades out, the knowledge you’ve gained won’t help you much in another job.  You’ll be forced to find another BI platform and learn a completely different way of doing things.  Going with a massively mainstream app like SAP is a relatively safe bet, but picking a smaller niche vendor is a riskier bet.

Is it easy to get training on the app?

Learning to be an Engineer

Learning to be an Engineer

Wanna specialize in web design?  Open the phone book, call your local community college, and they’ve probably got a series of courses you can take dirt cheap.  If you want to learn faster, you can call a place like New Horizons and do a couple of week-long boot camps.  Presto, you’re a designer.

If you learned the necessary skills to perform your job in less than a month, then so can anybody else.

Goodbye, job security.  As evidence of that fact, I give you 99designs.com: a site where people like me who can’t draw a stick man shell out some money, and graphic designers all over the world compete to build the best logo/site/design/etc.  Yesterday, I forked out about $250 for a logo for a site I’m working on, and already today I’ve got dozens of designs, some freakin’ amazing.  The contest goes on for an entire week, and during the whole time, I’m interacting with each of the designers giving them advice and tips on how I’d like their logos to be improved.

The tougher it is to get training, the more staying power your career will have.  It’s a balancing act, though: a lack of training options might also indicate that the market sees the technology as lacking staying power.

Do you like the team you’re going to work with?

You’re going to be using training wheels for a year while you get up to speed on the new technology.  Is your new team going to be supportive and tolerant of your mistakes and your downtime?  Are you going to enjoy working late nights with ‘em when you’re under a tight deadline?  After years of work as a DBA, you might be accustomed to getting big things done in a short amount of time, but you may not have that luxury with your new application role.

I’ve had to make these decisions more than once.  Sometimes I’ve branched off briefly, like when I learned SAN administration and VMware administration.  As it happens, I’ve only gone down those routes when I felt it would make me a better database administrator, and I’ve come back to my roots each time.  If things were just a little different, I might have stayed with either of those technologies – and I bet they’re both going to be great career choices too.  It’s a fun decision to have to make!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts