We have a lot of fun in the pre-shows for our Tech Tuesday Triage webcasts. Technically we start at 11:30 AM Central, but we usually go live at around 11 AM Central. We talk about tech topics, what we’re up to, and we take questions from the audience.
We’re trying a new experiment just for fun – giving things away. (We like experiments like that.) In the pre-show, we’re going to draw names from the attendee list, and one of the early attendees will get something cool each week. It might be one of our online training videos, a book, our favorite things like T-shirts or computer peripherals, or free entry into one of our training classes. It’ll be different each week, and it’ll be a surprise during the pre-show.
To be eligible, show up at least 15 minutes before a Tech Tuesday Triage webcast starts, and be good at reading the instructions on the screen. Good luck!
Triage Tuesday 30-minute Webcast starting at 11:30 AM Central
SQL Server is trying to help you – when you run a query, you see a missing index request. Before you run that CREATE INDEX script in production, consider a few things. How helpful will the index be? Is it similar to an existing index? Why is it recommending three indexes that are very similar? Jes will solve some common missing index mysteries in this 30-minute webcast. Register now.
Triage Tuesday 30-minute Webcast starting at 11:30 AM Central
CXPACKET waits don’t mean you should set MAXDOP = 1. Microsoft Certified Master Brent Ozar will boil it all down and simplify CXPACKET to show you the real problem – and what you should do about it. Register now.
Sponsored by Dell Software – starting at 12:30 PM Central
You need to monitor uptime and performance, but what counters should you focus on? Microsoft Certified Master Brent Ozar will share his favorite counters, explain what they mean, and give you thresholds to watch out for. Register now.
Triage Tuesday 30-minute Webcast starting at 11:30 AM Central
SQL Server’s great for a lot of uses, but not every application design pattern shows off SQL Server’s best side. Join Kendra in this free 30-minute webcast to learn about the top mistakes developers make in architecting solutions with SQL Server. Register now.
Sponsored by Idera – starting at 12:30 PM Central
Your SQL Server’s data lives on the SAN, and you’re not happy about that. All the Perfmon metrics you gather seem to point to a storage problem, but the SAN admin says the storage is sitting around bored, so it must be a SQL Server problem. Brent Ozar feels your pain – years ago, he was a DBA in the same position, so when his SAN admin quit, Brent took the job. In just 90 minutes, he’ll teach you what’s inside the SAN, why multipathing is so important, how to test storage throughput, and why TempDB should be on local SSDs. Register now.
Triage Tuesday 30-minute Webcast starting at 11:30 AM Central
You’ve got a query that’s running too slow, and you need to figure out why. The first step is admitting that you’ve got a problem, but the second step is getting the execution plan. Brent Ozar will show you how to get an estimated execution plan, why you probably shouldn’t, and several different ways to get the actual execution plan. He’ll finish with his favorite resources for learning how to read and improve execution plans. Register now.
Triage Tuesday 30-minute Webcast starting at 11:30 AM Central
Database backups are a critical part of a database administrator’s job. And, let’s face it, many SQL Server DBAs are also responsible for Oracle systems. In this presentation, Jeremiah Peschka covers the basics of Oracle database backups in a way that will make sense of Oracle backups for SQL Server DBAs. Register now.
Triage Tuesday 30-minute Webcast starting at 11:30 AM Central
Curious how you can give a compelling technical presentation? Join Kendra to learn five important tips on how to select the right topic for your talk, write an effective abstract, construct a coherent presentation, and make it to the podium to give your first presentation. Register now.
Sponsored by Idera – starting at 12:30 PM Central
You’ve got an in-house application that just isn’t fast enough, but why? You’ve tried running Profiler to trace queries, but it takes too much overhead, and you don’t end up with a really good understanding of what’s going on. In just one hour, you’ll learn three easy ways to catch the killer queries lurking in the shadows. Even better, they’re all completely free. Register now.
Triage Tuesday 30-minute Webcast starting at 11:30 AM Central
It’s hard to keep up with what your Report Server is doing, especially if your only tool is Report Manager. Now there’s an easier way, using the newest member of the sp_Blitz family: sp_BlitzRS! In this 30-minute webcast, you’ll learn how sp_BlitzRS can help you stay informed of how well your SSRS installation is running, who gets what subscriptions, which reports would benefit from preprocessing, and more! Register now.
I’ve worked with a lot of features in SQL Server. I know what I think is tricky and more difficult than it looks like at first. But experiences vary, right?
So I asked the Twitterverse, “What are the Top 3 Trickiest Features in SQL Server?” Here’s what I heard back.
#1: Replication
SQL Server Replication “wins” the top spot for being mentioned by the most people. Maybe it won because it’s touched the hearts of the most people since it works with Standard Edition. Maybe it’s just been in the product long enough to have tricked lots of us?
@Kendra_Little replication is definitely #1!
— Derik Hammer (@SQLHammer) July 8, 2014
#2: Availability Groups
Coming in second is SQL Server Availability Groups. These may have only been with us since SQL Server 2012, but their complexity has impressed quite a few people already.
@Kendra_Little replication, service broker, AG
— Jason Kyle (@JasonNKyle) July 8, 2014
#3: Database Administrators
The number three place goes to a feature I hadn’t thought of myself… database administrators themselves. I laughed out loud when I saw these tweets, but, well, there’s some truth in it. We are a tricksy bunch!
Other top tricky features that came up:
- Service Broker (guessing they worked at MySpace)
- SSIS (oh, the clicking! the clicking!)
- SQL Server Clustering
- Active Directory (ah, Kerberos authentication, you devil you)
- Resource Governor (someone actually used Resource Governor!?!?!)
- Extended Events
- SAN Administrators
- Enterprise Architects
My Personal Top 3
Turns out I’m not so different from the Twitter community. My personal top three trickiest features are: Availability Groups, Replication, and Service Broker. (I’m not really all that into queues in SQL Server, but I do like Event Notifications, which use the Broker framework.)
What are yours?
Let’s say you’ve got a vendor application that creates tables, loads them, queries them, and drops them all the time. You can’t change the code.
The tables involved desperately need an index in order to go faster, and they’re your biggest performance bottleneck.
Enter DDL triggers. Regular DML triggers fire when you insert/update/delete rows in a table, but DDL triggers fire when you create/alter/drop tables or databases. You may remember them from the last time Jeremiah and I played with them – way back in 2008!
The code below fires whenever a table is created, checks to see if the table name is like CustomerExport%, looks for the existence of a CustomerName field (which I probably should have done with XQuery, but I’m lazy), and if it exists, creates an index on the CustomerName field:
CREATE TRIGGER trgTableAdded ON DATABASE
AFTER CREATE_TABLE
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @TableName SYSNAME, @SchemaName SYSNAME, @StringToExecute NVARCHAR(4000);

    -- Pull the new table's name and schema out of the EVENTDATA XML.
    -- Note the [1] - the value() method requires a singleton:
    SELECT @TableName  = EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]', 'SYSNAME'),
           @SchemaName = EVENTDATA().value('(/EVENT_INSTANCE/SchemaName)[1]', 'SYSNAME');

    -- Only act on the vendor's export tables, and only if the field exists:
    IF @TableName LIKE 'CustomerExport%'
        AND EXISTS (SELECT *
                    FROM INFORMATION_SCHEMA.COLUMNS
                    WHERE COLUMN_NAME = 'CustomerName'
                      AND TABLE_SCHEMA = @SchemaName
                      AND TABLE_NAME = @TableName)
    BEGIN
        SET @StringToExecute = 'CREATE INDEX IX_CustomerName ON '
            + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName)
            + ' (CustomerName);';
        EXEC (@StringToExecute);
    END
END
GO
Voila – automatic indexing, and a lifetime of regret and careful debugging. See, there’s a boatload of ways this can break:
No datatype checks. In the above example, I’m not checking for datatype. If the vendor changes CustomerName to an NVARCHAR(MAX), my table creation will bomb with an error that CustomerName “is of a type that is invalid for use as a key column in an index.”
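If you wanted to guard against that, you could tighten the column check to key-eligible types only. Here’s a hedged standalone version of that check – the table name is hypothetical, and the 450-character cutoff assumes the worst case of NVARCHAR, where every character counts two bytes against the 900-byte index key limit:

DECLARE @SchemaName SYSNAME = N'dbo', @TableName SYSNAME = N'CustomerExport_Test';

IF EXISTS (SELECT *
           FROM INFORMATION_SCHEMA.COLUMNS
           WHERE COLUMN_NAME = 'CustomerName'
             AND TABLE_SCHEMA = @SchemaName
             AND TABLE_NAME = @TableName
             AND DATA_TYPE IN ('varchar', 'nvarchar', 'char', 'nchar')
             AND CHARACTER_MAXIMUM_LENGTH BETWEEN 1 AND 450) -- MAX types show up as -1
    PRINT 'Safe to index CustomerName.';
ELSE
    PRINT 'Skip it - missing, MAX, or too wide for an index key.';

Swap that EXISTS into the trigger’s IF, and the NVARCHAR(MAX) scenario gets skipped instead of bombing.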
Queries can fail. We had some queries fail (particularly SELECT INTOs) because the table schema changed before the rows started going in. Multi-statement transactions may also fail because the whole transaction will get rolled back if the trigger fails.
No duplicate index checks. If the vendor later decides to add an index as part of their code, I could be creating a duplicate index, and now slowing my processes down. Or they might just change the way they do exports, and no longer need the index at all – and I won’t be able to catch that since the tables and indexes appear and disappear so fast. I could actually set up another DDL trigger looking for table drops of these same names, and log index usage information to a table, then periodically analyze that information to see if I needed different indexing.
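Here’s a hedged sketch of that drop-side idea. One catch to plan around: an object’s rows vanish from sys.dm_db_index_usage_stats when it’s dropped, so the usage snapshots themselves are better gathered by a periodic job while the tables are still alive – the trigger just logs the drops so you can correlate the two later. The dbo.TableDropLog table is hypothetical:

-- Hypothetical log table - create once:
CREATE TABLE dbo.TableDropLog (
    SchemaName SYSNAME,
    TableName SYSNAME,
    DroppedAt DATETIME2 NOT NULL DEFAULT SYSDATETIME()
);
GO

CREATE TRIGGER trgTableDropped ON DATABASE
AFTER DROP_TABLE
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @TableName SYSNAME, @SchemaName SYSNAME;
    SELECT @TableName  = EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]', 'SYSNAME'),
           @SchemaName = EVENTDATA().value('(/EVENT_INSTANCE/SchemaName)[1]', 'SYSNAME');

    -- Only log the vendor's export tables:
    IF @TableName LIKE 'CustomerExport%'
        INSERT INTO dbo.TableDropLog (SchemaName, TableName)
        VALUES (@SchemaName, @TableName);
END
GO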
Lots of risk and work here. Let’s not even go into how incredibly hard it is to catch these issues when the tables and the query plans keep disappearing from memory – it’s not even enough to just grab the slow queries, but you also have to grab the tables and their contents so you can reproduce the queries to figure out the right index to build. Obviously this isn’t a solution everyone should go implement on their vendor apps, particularly mission-critical ones, but it’s an example of how far you can go to tune vendor code when other options have been taken off the table.
As teachers, we’re always working to maximize the skills that students can learn in a given amount of time.
Many students “learn by doing.” But what’s the best way to do this?
You may immediately think of lab exercises as an option. But labs are treacherous to implement in a classroom environment: ten minutes after you’ve begun, the person next to you is suffering from unexpected reboots, you’re not sure why the scripts aren’t working properly for you, and the fellow behind you is somehow already done. Even under the best of circumstances, labs don’t move at the same pace for everyone. By lunchtime, you’re bored and frustrated.
We’ve got a better way.
We’re building two very cool Challenges in our two-day Seattle course, Make SQL Apps Go Faster, which will be held just prior to the SQL PASS Summit in 2014.
Challenge 1: Learn to Diagnose a Server
We work with SQL Server in two different ways: we’re consultants and teachers. The two parts of our business enrich one another. Through consulting we constantly expand our experience and understanding of the real challenges people face. In the classroom we’re always refining our methods to help people learn faster. Now we’re bringing these two things even closer together.
In our upcoming two-day course in Seattle, “Make SQL Apps Go Faster”, we’re using challenges to help our students learn more and really engage with the class. For the first day, students will get dynamic management view information for a SQL Server with a problematic workload. It will contain information just like you can gather with our free tools in the real world:
- sp_Blitz® output
- A snapshot of wait statistics and information from sp_AskBrent®
- Index information from sp_BlitzIndex®
- Top plans in cache from sp_BlitzCache™
Your challenge: write down what you’d do to tackle this environment.
We won’t give away all the answers. We’ll train you on these topics using slightly different examples so that you still get to figure things out on your own! Then at the end of the day we’ll go over the challenge, talk through your solutions, and compare them to our suggestions and experience working with SQL Servers around the world.
Challenge 2: Digging into TSQL, Queries, and Execution Plans
On the second day of the training, you get to focus on query tuning challenges: you have 3 queries that you need to make faster. You’ll get the TSQL, schema, indexes, and execution plan information for all three queries, and your challenge is to figure out how YOU would make these faster.
On this day you’ll watch Kendra, Brent, and Jeremiah dig in deep to tune queries and indexes and delve into more advanced features of execution plans in SQL Server.
At the end of the day you’ll revisit the challenge. Would you make any choices differently? How do your ideas compare to the solutions we came up with? Can you make the queries even faster than we can!?!
Get involved in your training
Consulting and teaching have taught me a huge lesson: people learn better and faster when they’ve got something fun to work on. We have a blast teaching people about SQL Server — join us in Seattle and take the challenge.
It has come to my attention that some people bought training from us last week and they paid full price.
While that’s cool – we really do appreciate your generosity – you can do better.
During the month of July, use the coupon code Launch2015 to save 30% off any of our training, including:
- Make SQL Apps Go Faster – our pre-PASS class in Seattle, an incredible value for 2 days of our training – just $416.50 with the coupon
- How to Read Execution Plans – Jeremiah’s awesome new 5+ hour video class – a killer launch price at $209.30
- SQL Server Performance Troubleshooting 2015 – the expanded 4-day version of this year’s 3-day class
So in the 3 days left to use the coupon code, be smart: shop for our training videos & classes, but don’t pay full price. And use the savings to buy yourself a little something. You’re welcome.
You’ve been doing this database thing for a while, and you’re ready to get serious about it. What’s the next step?
Step 1: Define your specialty in one sentence.
If you say you do everything, you compete with everyone.
You want to be the only one they want. That means you’re:
- Actively sought-after
- Uniquely qualified
- A very high value for short bursts of time
- Respected for your opinion
- Worth more than your competitors (more on that later)
This sounds selfish, but remember – it’s not about you. It’s about your customers (whether they’re internal or external) and your ability to help them.
To pick your specialization, watch my webcast archive How to Get Senior In Your Title. I talk about the different types of DBAs and what they specialize in. Here’s one of the important slides from that session:
Most of you out in the crowd are going to say, “But I do all of these.” Sure you do – today. But we’re talking about where you want to be two years from now if you’re going to really stand out. Not only am I encouraging you to pick one of the columns, but I’m even encouraging you to focus on a specific horizontal row.
Examples of specialties include:
- “This server has to be reliable. We need AlwaysOn Availability Groups. I know just who to call.”
- “We need to manage thousands of servers with easier automation. I know the right person for the job.”
- “Our SQL Servers in VMware are just too slow, and nobody knows whose fault it is. I know who can tell.”
- “We need to offload our full text search, but we have no idea what to use. I know somebody who does.”
Notice that I’m phrasing these in a one-sentence pain point. You need to be known for resolving someone’s pain. This is the funny thing about business and consulting – you get paid the most to relieve urgent pain, not to provide keeping-the-lights-on maintenance.
The first step in your two-year plan is to write the one-sentence pain you want to resolve.
Step 2: Assess your current skills and your target skills.
Thinking about your one-sentence pain point:
- How many times have you relieved that pain?
- How many times have you failed to relieve it?
- When you hit an impasse, who did you escalate it to?
- Have you sketched out a process for diagnosing it? Has anyone?
- Have you documented the process for others to follow?
The more answers you have, and the more confident you are giving those answers aloud to someone else, the better your skills are. What, you expected a true/false multiple choice assessment test? Technology moves so fast that often the questions aren’t even right, let alone the answers.
Here’s a longer version of that assessment that I use for my own skills testing:
- I don’t know where the pain is coming from.
- I can identify the pain in clear terms.
- I know several possible root causes of the pain.
- I can identify exactly which one is at fault here.
- I know several ways to relieve that pain.
- I can identify exactly which one is right here.
- I’ve documented my triage process.
- I’ve hit situations where my process has been wrong, and I’ve learned from it.
From those levels, what level do you think you get paid for?
Surprise – it’s #1. You know plenty of people who are getting paid right now even though they have absolutely no idea where the pain is coming from. However, the higher your level, the easier it is to get paid more. (Don’t think that just because you’re on level 7, you’re making a bazillion dollars – there’s plenty of folks who aren’t great at negotiating their value, either.)
Figure out what level you’re at today, and get a rough idea of what level you want to be at in two years. Now let’s figure out how to get there.
Step 3: Build a 2-year learning plan to make that leap.
Divide the number of levels you want to jump by the amount of time you have. If you want to go up four levels, and you’ve got two years to do it, then you need to progress a level every 6 months.
This sounds really easy, but there’s a problem: you’re probably not repeatedly solving this pain point at your day job. You probably solve it every now and then, but not over and over in a way that helps you refine your technique.
That’s why a 2-year learning plan is really a 2-year sharing plan.
Nothing teaches you something like being forced to teach it to someone else. Heck, even building this blog post (and a presentation on it a few weeks ago) made me flesh out my own philosophies!
But to share, you have to get permission. Start by having this discussion with your manager:
Dear Manager – Recently, we ran into the problem of ____. To get relief, I did ____. Are you happy with that relief? If so, I’d like to talk about what I learned at my local SQL Server user group. I won’t mention our company name. Is that OK? Sincerely, Me
By having that discussion, you’re also making sure the manager is really satisfied with your pain relief efforts and that they saw value in your work. (After all, think of them as one of your first pain relief clients.)
Once you’ve got permission, here’s how you build the 2-year sharing plan: every level jump equals one user group presentation.
- Write the user group presentation agenda in 4-5 bullet points.
- Write a blog post about each bullet point. (The words in your blog post are what you’ll say out loud in your session – think about it as writing your script.)
- Build slides that help tell the story, but the slides are not the story. Don’t transcribe your blog posts word-for-word on the slide.
For example, if you need to hit the level “I know several ways to relieve that pain,” and your specialization is improving the performance of virtual SQL Servers, your user group session could be titled “5 Ways to Make Your Virtual SQL Server Faster.” You’d then write a blog post about each of the 5 ways. Presto, there’s your session resources.
At the end of your 2-year sharing plan, you’ve built up a solid repertoire of material, plus built up your own level of expertise. (You’ve also built up a little bit of a reputation – but more on that later.)
Step 4: Decide what lifestyle works best for you.
How much risk can you tolerate?
- Some. I could miss a couple of paychecks a year and manage my own benefits if I earned more.
- Lots. I’d be willing to go without income for a month or two per year if I could earn lots more.
- None. A very predictable salary and benefits are absolute requirements for me.
This determines whether you should be a full time employee, a long-term contractor that switches positions periodically, or a short-term consultant. In a nutshell, the differences are:
Consultants tell you what to do. They listen to your business problems, come up with solutions, and guide your staff on how to do it. They are typically short-term stints – a couple of days per month at a client, multiple clients at a time.
Contractors do what they’re told. They get a list of required solutions from the business and implement those solutions. They typically work long stints, showing up at the same client every day for months at a time, with only one live client relationship.
Full time employees do a mix of this. They come up with ideas, plus implement some of those ideas.
There’s no one answer that’s better for everyone. Heck, I’ve even changed my answer a few times over the last several years! It comes down to finding the right risk/reward balance for your own lifestyle needs, and then bringing the right customers in the door.
Step 5: Decide how you’ll market yourself.
Consultants sell advice, not manual labor, so they have many clients – which means doing a lot of sales.
Contractors sell labor, so they have fewer clients – which means less sales effort.
Full time employees (FTEs) only have one sales push every few years when they change jobs.
Our company is a good example of the work required to do marketing and sales when you want to scale beyond one or two people:
- We have tens of thousands of regular blog readers
- Thousands of them attend the weekly webcasts
- Hundreds of them email us per month asking for help
- A few dozen turn into serious sales opportunities
- Around a dozen will book consulting engagements with us
This funnel approach demonstrates inbound marketing – using lots of free material to get the word out about your services and inviting readers to contact you for personal help. It’s a lot of hard work – very hard work.
The other approach is outbound marketing – cold calls to strangers asking if they’ve got your specialized pain point, and then trying to convince them that you’re the right person to bring pain relief. (You can kinda guess how I feel about outbound marketing.) Sure, it sounds slimy – but the takeaway is that it’s hard work, and every bit as hard as doing inbound marketing.
But only one of those options polishes your skills.
Inbound marketing is a rare two-for-one in life – it’s both your 2-year sharing plan, and your 2-year marketing plan. You don’t have much spare time, so you need every bit of it to count. Choose inbound marketing, do your learning and sharing in public, and you’ll write your own ticket.
Presto – You’re two years away from success.
No matter what pain you want to solve, how you want to solve it, or how you want to get paid for it, this simple plan will have you on the road to success. Now get started on writing down that one-sentence pain point!
kCura Relativity is an e-discovery product that law firms use to find the interesting stuff that helps their case. To get you up to speed on what it does, here are some of the posts I’ve written about Relativity in the past:
- Performance Tuning kCura Relativity – explains what the product is for and how DBAs can help make it faster
- Tiering kCura Relativity Databases – how to manage hundreds or thousands of Relativity databases
- Using Partitioning to Make Relativity Faster – when you have a 1TB+ workspace, this technique makes backups and maintenance easier
Today, I’m going to talk about the database mechanics – where data lives, and which tables you need to care about.
Relativity’s EDDS* Databases
The EDDS database stores Relativity’s system-wide configuration. (Just plain EDDS, no numbers or letters after it.) All of the users and processes will hit this database at various points of their work.
The EDDSResource database is like Relativity’s TempDB. I’m a huge, huge fan of this approach – this lets DBAs tune the EDDSResource independently from TempDB.
Each of the EDDS12345 databases (with a bunch of different numbers) is a legal matter, or in Relativity terms, a workspace. Think lawsuit or case, basically. As your lawyers take on new cases, each one will get its own new EDDS database.
You may also have an EDDSPerformance database, which houses kCura Performance Dashboard – a product that gathers performance metrics about your environment.
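Since workspaces multiply fast, it helps to know exactly what you’re hosting. Here’s a hedged inventory query to run on each SQL Server – the name filter is just an assumption based on the naming convention above:

SELECT d.name AS database_name,
       SUM(mf.size) * 8 / 1024 AS size_mb   -- size is counted in 8KB pages
FROM sys.databases d
JOIN sys.master_files mf ON mf.database_id = d.database_id
WHERE d.name LIKE 'EDDS%'
  AND d.name NOT IN ('EDDS', 'EDDSResource', 'EDDSPerformance')
GROUP BY d.name
ORDER BY size_mb DESC;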
Distributed Environments: Spreading the Load Between SQL Servers
When you first get started, all of these databases could be on the same server. In the e-discovery business, though, growth happens really fast. Right from the get-go, you probably want to plan to separate them onto multiple SQL Server instances – in Relativity terms, a distributed environment with a couple of different SQL Servers:
We’ve got two SQL Servers, each with a couple of workspaces. (Obviously, Relativity scales WAY bigger than two workspaces on a single server, but I only wanna make these images so big, people.)
The EDDS database – the central config data – only lives on SQL1.
Both servers have their own EDDSResource database, and that’s for temporary scratch space.
But two standalone SQL Servers would be an insanely bad idea because if either server goes down, you’re screwed. Instead, you want to build a failover cluster of SQL Server instances, each instance living on a different physical box:
The databases live on shared storage, so if either box dies, the SQL Server instance can start up on the other box. Of course, this means you’ll have twice as many workspaces living on the same hardware, and that’s not a recipe for high performance – you can mitigate that by buying a separate passive node. I’m not going into the intricacies of failover clustering here – for that, see our clustering resources.
How the EDDS Databases Affect Cluster Design
The EDDS database will consume performance resources. As your distributed environment grows larger and larger, the load on the EDDS database will increase. If other workspaces are sharing that instance, they may be bullied around by the EDDS database. In very large environments, the EDDS database may grow to the point where it needs its own SQL Server instance – or rather, you just don’t want to put any workspaces on that instance.
Any one workspace is confined to one database server. If you have a massive case going on with tens of terabytes of data, the load isn’t spread across the servers. One database still lives on just one host. Technically, SQL Server 2012 AlwaysOn Availability Groups do let you spread load across multiple servers, and technically Relativity 8.2 supports SQL Server 2012 – but kCura doesn’t support AGs for failover or load spreading yet.
You can move databases between servers to balance load, but it requires some downtime and work on the application side. You can’t just back up a workspace on one server, restore it onto another SQL Server, and take off. kCura has administration tools to help with this task, but it’s up to you to figure out which databases should be on which servers. This is where the concept of tiered workspaces comes in.
How to Performance Tune the EDDS Database
Because this database holds configuration data, and because the queries that hit here aren’t usually user-created, you generally don’t want to touch this database.
A few weeks after deploying a new release of Relativity, I recommend checking SQL Server’s index usage DMVs and plan cache to see whether any new problems have popped up. There may be a new query that needs a new index, or a unique way of doing a query in your environment that hasn’t been seen out in the wild yet.
When issues like that pop up, start by opening a support case with kCura rather than making index changes in here. In your support case, include the query text from the plan cache (if applicable) and evidence from your index DMVs to support the index change you want to make. You could actually make index changes inside this database, but generally speaking, that’s not a good first step. Let the Relativity support folks make the call there because any index changes here can dramatically affect all workspaces.
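If you want a starting point for gathering that evidence, here’s a hedged sketch against the missing index DMVs – run it in the EDDS database, and treat the output as talking points for the support case, not a to-do list:

SELECT TOP 10
       d.statement AS table_name,
       d.equality_columns, d.inequality_columns, d.included_columns,
       gs.user_seeks, gs.avg_total_user_cost, gs.avg_user_impact
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats gs
    ON gs.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY gs.user_seeks * gs.avg_total_user_cost DESC;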
How to Tune the EDDSResource Databases
You don’t. It’s a temporary staging ground. You can skip this guy under most circumstances.
How to Tune the EDDSPerformance Database
Performance Dashboard is a relatively new product – at least compared to Relativity. Early versions of it desperately needed a few indexes, so using a tool like sp_BlitzIndex® pays off big time here. I highly recommend checking missing indexes in this database – but once a couple of low-hanging-fruit indexes are applied, this database won’t be a performance issue.
Before making changes here, again, start a support case with the changes you’d like to make. In most cases, this is just a simple known issue and easy to fix.
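To gather that evidence, point the free tool mentioned above at the database by name – this assumes you’ve got sp_BlitzIndex® installed on that server:

-- Check the Performance Dashboard database for index issues:
EXEC dbo.sp_BlitzIndex @DatabaseName = 'EDDSPerformance';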
How to Tune the Workspaces (EDDS* Databases)
Ah, now here’s the fun stuff.
Expanding on the process I discussed in my Performance Tuning kCura Relativity post, every database is a new case that goes through a lifecycle:
- The database is created. It’s technically restored from a “template”, a database that the users have set up as their standard starting point. Larger Relativity shops may have several templates to choose from.
- Some documents are loaded. We need to load data into the database server as quickly as possible.
- Users review the documents. They run searches looking for terms and attributes that might indicate evidence that would bolster their case. As they review documents, they make small edits to the metadata fields at the document level, like marking whether the document has been reviewed, who reviewed it, and whether or not it was interesting. We need to audit everything the users do (as well as things the system does, too.)
- We go back to step 2 a few times. More documents get loaded, and users run more searches. This cycle continues for some time, until the flow of new documents trickles to a halt, but searches still continue for a while.
- The case becomes dormant. Legal matters can drag on for years, but we may need to keep this database online the whole time. The rate of change drops dramatically – sometimes with no data changes for months or years – but the database has to be online, and it has to be backed up.
Some of the major tables in each EDDS* database include (and remember, lawyers, I’m simplifying this for the DBAs in the house):
- Document, File – things we loaded into Relativity to search through, like Excel files and Outlook PSTs
- AuditRecord_PrimaryPartition – a log of most Relativity events, like document loads or end user searches (when this is a problem, start by partitioning it out)
- Artifact – think of this like a system table for Relativity that lists every Document, plus other system objects
- CodeArtifact – prior to Relativity v8, this one table stored records for all choices for every Document. (Think multiple-choice fields, like what kind of file type it was.) This had scalability limits because it had multiple times more rows than the Document table, and query plans could get ugly. This was changed in Relativity v8, but I’m mentioning it here in case any of you out there are still on 7.5. (Get on this level.)
Index tuning isn’t necessary on most of these tables because the queries that hit these tables are all managed by Relativity itself. The kCura developers sit around the office trying to figure out how to make those queries go faster, and they come up with some pretty good ideas. (Well, they also surf my site when they’re bored. Did I mention that they’re attractive people?) You shouldn’t need to touch indexes or queries here, other than the same every-new-version check that I described about EDDS.
Except for the Document table.
Oh, boy, the Document table.
Why the Document Table is Fun to Tune
Relativity lets end users write whatever crazy searches they want against the Document table. Wanna find every email with “S” in the email address? You got it. Need to see every PowerPoint created in June last year? Can do. Interested in every file whose extension is MP4 but the data is actually a PowerPoint slide deck and has hidden slides? No problem. You can build these searches in a GUI without understanding anything whatsoever about how SQL Server works, and Relativity will build the T-SQL for you.
To make matters even more fun (HA! see what I did there, lawyers? “matters”, oh, I kill me), the end users can add new fields to the Document table any time they want. If they want to add a new decimal field called LooksSuspicious, it happens with no DBA intervention or change request. Relativity generates the ALTER TABLE commands on the fly, and then users can populate that field and run searches against it.
Index tuning becomes really challenging because we may never be done loading documents. To load documents, we want as few indexes as possible for faster insert speeds. To search for documents, we want lots of indexes so we don’t have to scan the Document table. As DBAs, we’d like to ask the users, “Are you done loading now? Because I can add indexes to make this go fast.” The answer with Relativity may always be, “No, I might load some more tomorrow.”
And what I find the most interesting is that every EDDS* database can be wildly different. Every team that’s involved with every legal matter may have totally different approaches to loading, searching, and managing their documents. That means you have to treat every EDDS* database as its own unique indexing challenge.
At any given time, you might have a hundred EDDS* databases, each for a different legal matter, each with their own Document table. Each has different numbers of fields and indexes in each case.
You can’t conquer each of these databases individually. You simply have to use my tiered workspace approach, define the small databases that will work just fine on their own, and go tackle the largest and most active databases with traditional index performance tuning methods.
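When you do sit down with one of those big, active workspaces, a hedged first look is comparing how often each index on the Document table gets read versus written. Run something like this inside the workspace database:

SELECT i.name AS index_name,
       ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) AS reads,
       ISNULL(us.user_updates, 0) AS writes
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats us
       ON us.object_id = i.object_id
      AND us.index_id = i.index_id
      AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.Document')
ORDER BY writes DESC;

Indexes with heavy writes and hardly any reads are slowing down document loads without paying rent on searches.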
We’re really excited to share our new video class and our 2015 training lineup, plus give you a discount to celebrate.
First, we’ve added a new video class – How to Read Execution Plans with Jeremiah. You’re comfortable writing queries, but some of them are slow – and you need more ways to tune than just adding indexes. You’ve heard that you should read execution plans, but you don’t know where to start. Learn more about reading execution plans.
Next up, our 2015 in-person training class lineup. Our students told us they loved our 2014 classes, but they wanted more of everything. We’ve lengthened the classes – we took the performance one from 3 days to 4, and added a couple of new 5-day classes:
Advanced Querying and Indexing: 5-day in-person class. Do you need to learn to write the fastest queries possible for SQL Server? In 2015, join us for five days of advanced TSQL query and index optimization. Join us in Chicago or Portland.
SQL Performance Troubleshooting: 4-day in-person class. You need to speed up a database server that you don’t fully understand – but that’s about to change in four days of learning and fun in Chicago, Denver, and Portland.
Senior DBA Class of 2015: 5-day in-person class. You’re a SQL Server DBA who is ready to advance to the next level in your career but aren’t sure how to fully master your environment and drive the right architectural changes. That’s about to change in one week of learning and adventure in Chicago and Denver.
Some of our students (especially the consultants) told us they wanted to really go in-depth and take two weeks of classes back-to-back. To make that easier, we lined up our classes and put them in some of our favorite cities, at the best times to spend a weekend between classes:
Denver in February (hey, it’s ski season!):
- February 2-6 – Senior DBA Class of 2015
- February 9-12 – SQL Server Performance Troubleshooting
Chicago in May (best time to visit our fair city):
- May 4-8 – Advanced Querying and Indexing
- May 11-14 – SQL Server Performance Troubleshooting
Portland in August (Oregon summers are beautiful):
- Aug 3-7 – Advanced Querying and Indexing
- Aug 10-13 – SQL Server Performance Troubleshooting
Chicago in September (not too hot, not too cold):
- Sept 14-18 – Senior DBA Class of 2015
- Sept 21-24 – SQL Server Performance Troubleshooting
You know how to design indexes, but you’re not sure how good your skills really are. In this quiz-packed session you’ll get a chance to test your skills! Kendra Little will walk you through a set of index design challenges. You’ll have time to answer each problem on your own, then find out whether the SQL Server engine thinks you made the right choice. At the end of the session you’ll get to tally your score (but you can keep it a secret if you like).
Take the quiz while you watch the video. If you’d like to dive straight into the questions, skip to 2:30.
Compare your answers to the group
Our webcast had 416 attendees, and about 80% of them answered each question. To find out the right answers (and why), watch the video above – but here are the stats on how webcast viewers answered each question.
This is NOT the answer key – don’t assume the majority of people got every question right! This just lets you compare what you thought to what others thought.
Q1) A NonClustered Index is…
Q2) What will this query probably do given this table definition?
Q3) What is most likely the biggest problem with this table?
Q4) INCLUDED columns are…
Q5) A unique constraint is…
A only) 11%
A and B) 11%
A, B, and C) 16%
A, B, C, and D) 14%
A and C) 41%
A, C and D) 8%
A and D) 4%
(There were some other combos, but they all got low percentages)
Q6) Which one of these statements is ALWAYS true?
Q7) How many indexes were written to?
Q8) Why wouldn’t SQL Server ask for an index?
When you weren’t looking, your databases went and grew up. Now your backup window has grown so large that you’re about ready to open it and jump.
Time to make a choice.
The Native Way: Tuning SQL Server Backups
You can theoretically pull this off by using a combination of tactics:
Back up as infrequently as the business will allow. Run your full backups once a week (or if you want to go wild and crazy, once per month) and differential backups periodically. As Jes explains in her backup and recovery class, differentials back up the data pages that have changed since the last full backup. When disaster strikes, you only need to recover the most recent full backup, the most recent differential backup, and all of the log backups after the differential. This can shave a lot of time off your restores – but only if you minimize the number of changed pages in the database. This means…
Change the database as little as possible. We can’t change what the users do, but we can change what we DBAs do. Stop doing daily index defrag/rebuild jobs – you’re just changing pages in the database, which means instantly inflating the size of your differential backups. In a scenario like this, you can only do index maintenance when you’re sure it is the only way to solve a performance problem, and it absolutely has to be your last resort.
Tune the data file read speeds. You need to read the pages off disk as fast as possible to back them up. Use tools like CrystalDiskMark and SQLIO to measure how fast you’re going, and then tune your storage to go faster.
Compress the data as much as possible. It’s not just about minimizing the size of your backup file – it’s about minimizing the amount of data we have to write to disk. Bonus points for using index compression inside the database so that it’s compressed once, not recompressed every time we do a backup, although that doesn’t really help with off-row data.
Tune the backup target write speeds. If you’re using a small pool of SATA drives in RAID 5 as a backup target, it’s probably not going to be able to keep up with a giant volume of streaming writes, even if those writes are compressed versions of the database. Problems will get even worse if multiple servers are backing up to the same RAID 5 pool simultaneously because the writes will turn random, which is the worst case scenario for RAID 5.
Tune the bottleneck between the reads and the writes. If you’re backing up over the network, use 10Gb Ethernet to avoid the pains of trying to push a lot of data through a tiny 1Gb straw.
Tune your backup software settings. If you’re using native backups, start with using multiple files and the built-in options, and graph your results. Third party compression products usually offer all kinds of knobs to tweak – you’ll need to use that same level of graphing diligence.
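To make those knobs concrete, here’s a hedged sketch of the native commands – the database name, file paths, stripe count, and BUFFERCOUNT/MAXTRANSFERSIZE values are illustrative placeholders to test and graph, not recommendations:

-- Weekly full, striped across multiple files so the writes parallelize:
BACKUP DATABASE BigDB
TO DISK = N'H:\Backup\BigDB_full_1.bak',
   DISK = N'I:\Backup\BigDB_full_2.bak',
   DISK = N'J:\Backup\BigDB_full_3.bak',
   DISK = N'K:\Backup\BigDB_full_4.bak'
WITH COMPRESSION, CHECKSUM,
     BUFFERCOUNT = 50, MAXTRANSFERSIZE = 4194304;

-- Daily differential - just the pages changed since that full:
BACKUP DATABASE BigDB
TO DISK = N'H:\Backup\BigDB_diff.bak'
WITH DIFFERENTIAL, COMPRESSION, CHECKSUM;

-- To recover: newest full (WITH NORECOVERY), newest differential (WITH
-- NORECOVERY), then every log backup taken after it, ending WITH RECOVERY.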
Whew. I got tired just typing all that stuff. And if you’re lucky, at the end of it, your backups will complete in an hour or two, but the server might be darned near unusable while you’re beating the daylights out of it. Then the fun balancing act starts, trying to figure out the right point where the system is still usable but the backups complete quickly.
Or Just Cheat with SAN Snapshots.
In my Virtualization, SAN, and Hardware video class, I explain how SAN snapshots are able to take a full database backup of any size in just a couple of seconds.
See, while it’s technically a backup, I don’t really consider it a backup until it’s off the primary storage device. Your SAN storage, expensive as it was, is still vulnerable to failure, and you need to get that data out as quickly as possible. The good news is that you can move that data out without dragging it through the SQL Server’s storage connections, CPU, and network ports. You can simply (simply?) hook a virtual tape library, actual tape library, or another storage device to the same storage network, and copy directly between the two.
Your data read speeds may degrade during that process, but it’s up to you – if you want to architect your storage so that it’s fast enough to do these full backups without any noticeable performance impact to end users, it’s possible by inserting enough quarters in the front.
You still have to pay attention, though, because your backup process will look like this:
- Daily full backups via SAN snapshots – all writes are quiesced for 1-10 seconds during this time
- Conventional log backups every X minutes – where X is dictated by the business
If you push a big index rebuild job through, you can still bloat the transaction log, and your log backups may take longer than X minutes to complete. This is where our RPO/RTO planning worksheet is so important – if your RPO is 1 minute, you simply may not be able to do index rebuild jobs.
SAN snapshots have one other drawback: depending on your storage make/model, snapshots may not be included in your licensing pricing. You may have to spend a lot more (typically tens of thousands of dollars) to unlock the feature. Ask your SAN admin if snapshots are right for your wallet.
Building Terabyte Servers Means Starting with Backups First
When I’m building a SQL Server to hold multiple terabytes of databases, this backup question is the very first one we have to address – even before we talk about the speed of end user queries.
Otherwise, we could end up designing a server with all local solid state drives, which is very inexpensive and satisfies end user performance goals – but we can’t back the data up fast enough.