Blog

We’re really excited to share these, plus give you a discount to celebrate.

First, we’ve added a new video class - How to Read Execution Plans with Jeremiah. You’re comfortable writing queries, but some of them are slow – and you need more ways to tune than just adding indexes. You’ve heard that you should read execution plans, but you don’t know where to start. Learn more about reading execution plans.

Next up, our 2015 in-person training class lineup. Our students told us they loved our 2014 classes, but they wanted more of everything. We’ve lengthened the classes – we took the performance one from 3 days to 4, and added a couple of new 5-day classes:

Advanced Querying and Indexing: 5-day in-person class. Do you need to learn to write the fastest queries possible for SQL Server? In 2015, join us for five days of advanced TSQL query and index optimization. Join us in Chicago or Portland.

SQL Performance Troubleshooting: 4-day in-person class. You need to speed up a database server that you don’t fully understand – but that’s about to change in four days of learning and fun in Chicago, Denver, and Portland.

Senior DBA Class of 2015: 5-day in-person class. You’re a SQL Server DBA who’s ready to advance to the next level in your career, but you aren’t sure how to fully master your environment and drive the right architectural changes. That’s about to change in one week of learning and adventure in Chicago and Denver.

Some of our students (especially the consultants) told us they wanted to really go in-depth and take two weeks of classes back-to-back. To make that easier, we lined up our classes and put them in some of our favorite cities, at the best times to spend a weekend between classes:

Denver in February (hey, it’s ski season!):

  • February 2-6 – Senior DBA Class of 2015
  • February 9-12 – SQL Server Performance Troubleshooting

Chicago in May (best time to visit our fair city):

  • May 4-8 – Advanced Querying and Indexing
  • May 11-14 – SQL Server Performance Troubleshooting

Portland in August (Oregon summers are beautiful):

  • Aug 3-7 – Advanced Querying and Indexing
  • Aug 10-13 – SQL Server Performance Troubleshooting

Chicago in September (not too hot, not too cold):

  • Sept 14-18 – Senior DBA Class of 2015
  • Sept 21-24 – SQL Server Performance Troubleshooting

Check out our full catalog of in-person training events and online training videos - and all of our videos & classes, not just the new classes, are 30% off with coupon Launch2015 in July. Come join us!


I’ve worked with a lot of features in SQL Server. I know what I think is tricky and more difficult than it looks like at first. But experiences vary, right?

So I asked the Twitterverse, “What are the Top 3 Trickiest Features in SQL Server?”  Here’s what I heard back.

#1: Replication

SQL Server Replication “wins” the top spot for being mentioned by the most people. Maybe it won because it’s touched the hearts of the most people since it works with Standard Edition. Maybe it’s just been in the product long enough to have tricked lots of us?

#2: Availability Groups

Coming in second is SQL Server Availability Groups. These may have only been with us since SQL Server 2012, but their complexity has impressed quite a few people already.

#3: DBAs

The number three place goes to a feature I hadn’t thought of myself… database administrators themselves. I laughed out loud when I saw these tweets, but, well, there’s some truth in it. We are a tricksy bunch!

Honorable Mentions

Other top tricky features that came up:

  • Service Broker (guessing they worked at MySpace)
  • Security
  • SSIS (oh, the clicking! the clicking!)
  • SQL Server Clustering
  • PIVOT
  • XML
  • CLR
  • Active Directory (ah, Kerberos authentication, you devil you)
  • Resource Governor (someone actually used Resource Governor!?!?!)
  • Extended Events
  • SAN Administrators
  • Enterprise Architects

My Personal Top 3

Turns out I’m not so different from the Twitter community. My personal top three trickiest features are: Availability Groups, Replication, and Service Broker. (I’m not really all that into queues in SQL Server, but I do like Event Notifications, which use the Broker framework.)

What are yours?


Let’s say you’ve got a vendor application that creates tables, loads them, queries them, and drops them all the time. You can’t change the code.

The tables involved desperately need an index in order to go faster, and they’re your biggest performance bottleneck.

Enter DDL triggers. Regular DML triggers fire when you insert/update/delete rows in a table, but DDL triggers fire when you create/alter/drop tables or databases. You may remember them from the last time Jeremiah and I played with them – way back in 2008!

The below code fires whenever a table is created, checks to see if the table name is like CustomerExport%, looks for the existence of a CustomerName field (which I probably should have done with XQuery, but I’m lazy), and if it exists, creates an index on the CustomerName field:

CREATE TRIGGER trgTableAdded ON DATABASE AFTER CREATE_TABLE AS
BEGIN
    SET NOCOUNT ON
    DECLARE @TableName SYSNAME, @SchemaName SYSNAME, @StringToExecute NVARCHAR(4000)
    /* Grab the schema and table name of the object that was just created from EVENTDATA() */
    SELECT @TableName = EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]','SYSNAME')
    SELECT @SchemaName = EVENTDATA().value('(/EVENT_INSTANCE/SchemaName)[1]','SYSNAME')
    /* Only act on the vendor's export tables, and only if they actually have a CustomerName column */
    IF @TableName LIKE 'CustomerExport%' 
        AND EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
            WHERE COLUMN_NAME = 'CustomerName' AND TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName)
    BEGIN
        /* Build and run the CREATE INDEX dynamically, since the table name is different every time */
        SET @StringToExecute = 'CREATE INDEX IX_CustomerName ON ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + ' (CustomerName);'
        EXEC(@StringToExecute);
    END
END
GO

And when Kendra found out what I’d done, she broke my arm.

Voila – automatic indexing, and a lifetime of regret and careful debugging. See, there’s a boatload of ways this can break:

No datatype checks. In the above example, I’m not checking for datatype. If the vendor changes CustomerName to an NVARCHAR(MAX), my table creation will bomb with an error that CustomerName “is of a type that is invalid for use as a key column in an index.”
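
One way to guard against that – a quick sketch, not part of the original trigger – is to only build the index when the column is a string type short enough to fit in an index key. The 450 cutoff below is the conservative nvarchar limit for a 900-byte index key:

    IF @TableName LIKE 'CustomerExport%'
        AND EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
            WHERE TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName
              AND COLUMN_NAME = 'CustomerName'
              AND DATA_TYPE IN ('char','varchar','nchar','nvarchar')
              /* -1 here means (MAX); 450 keeps an nvarchar key under the 900-byte index key limit */
              AND CHARACTER_MAXIMUM_LENGTH BETWEEN 1 AND 450)
    BEGIN
        SET @StringToExecute = 'CREATE INDEX IX_CustomerName ON ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + ' (CustomerName);'
        EXEC(@StringToExecute);
    END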

Queries can fail. We had some queries fail (particularly SELECT INTOs) because the table schema changed before the rows started going in. Multi-statement transactions may also fail because the whole transaction will get rolled back if the trigger fails.

No duplicate index checks. If the vendor later decides to add an index as part of their code, I could be creating a duplicate index, and now slowing my processes down. Or they might just change the way they do exports, and no longer need the index at all – and I won’t be able to catch that since the tables and indexes appear and disappear so fast. I could actually set up another DDL trigger looking for table drops of these same names, and log index usage information to a table, then periodically analyze that information to see if I needed different indexing.
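
The original post stops at the idea, so here’s a rough sketch of the logging half. I’ve sketched it as a scheduled snapshot rather than a DROP_TABLE trigger, because by the time an AFTER DROP_TABLE trigger fires, the table’s metadata is already gone and there’s nothing left to join to. Table and column names here are made up:

CREATE TABLE dbo.IndexUsageLog (
    LoggedAt    DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    TableName   NVARCHAR(300) NOT NULL,
    IndexName   SYSNAME NULL,
    UserSeeks   BIGINT NULL,
    UserScans   BIGINT NULL,
    UserLookups BIGINT NULL,
    UserUpdates BIGINT NULL
);
GO
/* Run this on a schedule (say, a SQL Agent job every few minutes) so the usage numbers
   survive after the vendor drops the CustomerExport tables. */
INSERT dbo.IndexUsageLog (TableName, IndexName, UserSeeks, UserScans, UserLookups, UserUpdates)
SELECT SCHEMA_NAME(t.schema_id) + '.' + t.name, i.name,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM sys.tables t
JOIN sys.indexes i ON i.object_id = t.object_id
LEFT JOIN sys.dm_db_index_usage_stats us
       ON us.object_id = i.object_id
      AND us.index_id = i.index_id
      AND us.database_id = DB_ID()
WHERE t.name LIKE 'CustomerExport%';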

Lots of risk and work here. Let’s not even go into how incredibly hard it is to catch these issues when the tables and the query plans keep disappearing from memory – it’s not even enough to just grab the slow queries, but you also have to grab the tables and their contents so you can reproduce the queries to figure out the right index to build. Obviously this isn’t a solution everyone should go implement on their vendor apps, particularly mission-critical ones, but it’s an example of how far you can go to tune vendor code when other options have been taken off the table.

A team challenge in our training in Chicago, 2013

Act quickly. During the month of July, use the coupon code Launch2015 to save 30%, which makes this course only $416.50.

As teachers, we’re always working to maximize the skills that students can learn in a given amount of time.

Many students “learn by doing.” But what’s the best way to do this?

You may immediately think of lab exercises as an option. But labs are treacherous to implement in a classroom environment: ten minutes after you’ve begun, the person next to you is suffering from unexpected reboots, you’re not sure why the scripts aren’t working properly for you, and the fellow behind you is somehow already done. Even under the best of circumstances, labs don’t move at the same pace for everyone. By lunchtime, you’re bored and frustrated.

We’ve got a better way.

We’re building two very cool Challenges in our two-day Seattle course, Make SQL Apps Go Faster, which will be held just prior to the SQL PASS Summit in 2014.

Challenge 1: Learn to Diagnose a Server

We work with SQL Server in two different ways: we’re consultants and teachers. The two parts of our business enrich one another. Through consulting we constantly expand our experience and understanding of the real challenges people face. In the classroom we’re always refining our methods to help people learn faster. Now we’re bringing these two things even closer together.

In our upcoming two-day course in Seattle, “Make SQL Apps Go Faster”, we’re using challenges to help our students learn more and really engage with the class. For the first day, students will get dynamic management view information for a SQL Server with a problematic workload. It will contain information just like you can gather with our free tools in the real world.

Your challenge: write down what you’d do to tackle this environment.

We won’t give away all the answers. We’ll train you on these topics using slightly different examples so that you still get to figure things out on your own! Then at the end of the day we’ll go over the challenge, talk through your solutions, and compare them to our suggestions and experience working with SQL Servers around the world.

Challenge 2: Digging into TSQL, Queries, and Execution Plans

On the second day of the training, you get to focus on query tuning challenges: you have 3 queries that you need to make faster. You’ll get the TSQL, schema, indexes, and execution plan information for all three queries, and your challenge is to figure out how YOU would make these faster.

Always get involved in your learning and question everything. (Isn’t that what my face says here?)

On this day you’ll watch Kendra, Brent, and Jeremiah dig in deep to tune queries and indexes and delve into more advanced features of execution plans in SQL Server.

At the end of the day you’ll revisit the challenge. Would you make any choices differently? How do your ideas compare to the solutions we came up with? Can you make the queries even faster than we can!?!

Get involved in your training

Consulting and teaching have taught me a huge lesson: people learn better and faster when they’ve got something fun to work on. We have a blast teaching people about SQL Server — join us in Seattle and take the challenge.


Oracle 12.1.0.2 was recently released with some fanfare. One of the most talked about features is Oracle Database In-Memory, but there’s more to this release than just columnar storage.

Big Investment in Data Warehousing

Almost all of the new features reflect an increased focus on data warehousing. This is similar to recent SQL Server releases and, indeed, most mainstream database releases. Different vendors are betting on BI and Oracle’s most recent release is no exception.

Full Database Caching

Caching data in memory can avoid costly disk-based I/O. By default, databases will cache frequently accessed data. There’s a setting in Oracle called the small table threshold – scans of tables larger than _small_table_threshold will use a portion of memory that should age out faster. This idea (in SQL Server it’s called page disfavoring) exists to prevent large table scans from clearing out database server memory.

What if you want those tables to stick around in memory? Modern servers can hold huge amounts of memory and with Oracle RAC, you can scale your buffer pool across multiple servers.

The full database caching feature changes this behavior. Once full database caching is enabled, Oracle will cache all data in memory as the data is read.

It’s important to note the guidance around this new feature – it should only be used if the logical size of the database is no bigger than 80% of available memory. If you’re using Oracle RAC, DBAs need to be doubly careful to make sure that the workload is effectively partitioned across all RAC nodes. Read more about the features and gotchas in the Oracle Database Performance Tuning Guide.
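
If you want to try it, here’s a sketch of the documented steps (run from SQL*Plus as SYSDBA; the database has to be mounted but not open when you flip the switch):

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE FORCE FULL DATABASE CACHING;
ALTER DATABASE OPEN;

-- Confirm the mode:
SELECT force_full_db_caching FROM v$database;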

Who needs disk when you have memory?

Big Table Caching

Speaking of larger tables, the new patch includes a feature called automatic big table caching. If you have more data than you have memory, this feature seems a lot safer than full database caching.

Rather than attempt to cache all data in memory, big table caching sets aside a portion of the buffer pool for caching data. The DBA sets up a caching target (DB_BIG_TABLE_CACHE_PERCENT_TARGET) and up to that percentage of memory will be used for caching really big tables.

Like the full database caching feature, this works with single instances of Oracle or Oracle RAC. There are, of course, considerations for using the feature. The considerations are documented in the VLDB and Partitioning Guide.
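
Turning it on is one parameter change – a sketch, with 40 as an arbitrary example value, and the monitoring view name taken from my reading of the 12.1.0.2 docs:

-- Set aside up to 40% of the buffer cache for big table scans
ALTER SYSTEM SET DB_BIG_TABLE_CACHE_PERCENT_TARGET = 40;

-- See what the big table cache section is up to
SELECT * FROM v$bt_scan_cache;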

In-Memory Column Store

I’ve already covered Oracle’s In-Memory Column Store. There are some additional features that have been announced that make the In-Memory Column Store feature a bit more interesting.

A new data flow operator has been added to Oracle: the VECTOR GROUP BY. This new operator enables efficient querying for In-Memory Column Store tables. Although no new SQL syntax has been added to invoke the VECTOR GROUP BY, it’s possible to use query hints to control optimizer behavior. Six new hints have been added to push the optimizer in the right direction.
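
For example (a sketch against the SH sample schema – treat the VECTOR_TRANSFORM / NO_VECTOR_TRANSFORM hint names as my reading of the docs and verify them on your release):

SELECT /*+ VECTOR_TRANSFORM */
       t.calendar_year, p.prod_category, SUM(s.amount_sold) AS total_sold
FROM   sh.sales s, sh.times t, sh.products p
WHERE  s.time_id = t.time_id
AND    s.prod_id = p.prod_id
GROUP  BY t.calendar_year, p.prod_category;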

Attribute Clustering

In the Oracle world, tables are almost always heaps. Clustering is done only rarely and in special circumstances.

Attribute clustering lets the data warehouse designer specify a table order based on one or more columns of the table. So far, this sounds like something that you could already do. It is also possible to use join attribute clustering to cluster on columns connected via a foreign key relationship. The data warehouse designer, or even the DBA, can modify table structure to cluster a fact table based on dimension hierarchy.

An additional feature called interleaved ordering uses some pretty advanced storage mechanisms to get better I/O out of an attribute clustered table. Interleaved clustering makes it more efficient to perform queries where you may not know the search predicates, or where search predicates may vary over multiple columns (sounds like a dimensional model to me).

There are a few upsides to attribute clustered tables – they require fewer (or no) indexes, they may require less I/O, and they improve data compression. As always, consult the Attribute Clustering documentation for all the details and fine print.
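
Here’s a hedged sketch of the linear flavor – the table and column names are made up, and clustering only kicks in for direct-path loads and data movement operations, not existing rows:

CREATE TABLE sales_ac (
    sale_id     NUMBER,
    cust_id     NUMBER,
    prod_id     NUMBER,
    amount_sold NUMBER
)
CLUSTERING BY LINEAR ORDER (cust_id, prod_id);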

A map of all the people who need data.

Zone Maps

Zone maps are new to Oracle 12.1.0.2 – they provide an additional level of metadata about the data stored in a table. Instead of operating at the row level, like an index, a zone map operates at the level of a range of blocks. In a way, an Oracle zone map is a lot like a SQL Server Columnstore dictionary – it allows I/O pruning to happen at a much coarser granularity than the row level.

When zone maps are combined with features like attribute clustering or partitioning, the database can take advantage of metadata at many levels within the zone map and perform incredibly efficient I/O pruning. Data can be pruned at the zone, partition, or subpartition level, and when attribute clustering is in use the pruning can happen at the dimension level too.

Zone maps are going to have the most benefit for Oracle databases with a heavy data warehousing workload. Big scans with common predicates, especially with low cardinality predicates, will use zone maps to eliminate a lot of I/O. Users taking advantage of attribute clustering and partitioning will also see improvements from this feature. You can read more in the Zone maps section of the data warehousing guide.
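
Creating a standalone zone map is a one-liner (a sketch – table and column names assumed; you can also get one automatically via the WITH MATERIALIZED ZONEMAP clause of attribute clustering):

CREATE MATERIALIZED ZONEMAP sales_zmap
ON sales_ac (cust_id, prod_id);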

More than Data Warehousing

This is a wealth of features. Seriously, that’s a crazy investment in data warehousing for any release of a product, much less a point release.

Every Developer’s Dream: JSON Support

Everybody hates XML. It’s cumbersome to query and people love making fun of it. Many databases have been adding support for JSON and Oracle is no exception.

Oracle’s JSON implementation supports search indexes for general querying (give me the third element of the array in property “sandwiches”). Multiple index types exist for the JSON data type, giving developers fine grained control over how they interact with the data stored in the database. Oracle full text search also supports full-text queries using a new json_textcontains operator. If you want to jump down the JSON rabbit hole, make sure you check out the JSON documentation and pay particular attention to the sections about indexing – there are a number of examples of what will and won’t work.
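
To give a flavor (a hedged sketch, names made up): JSON lands in an ordinary VARCHAR2 or CLOB column guarded by an IS JSON check constraint, and comes back out with functions like JSON_VALUE and JSON_QUERY:

CREATE TABLE orders_json (
    id        NUMBER PRIMARY KEY,
    order_doc CLOB CONSTRAINT order_doc_is_json CHECK (order_doc IS JSON)
);

INSERT INTO orders_json VALUES (1, '{"customer":"Acme","items":[{"sku":"A1","qty":2}]}');

SELECT json_value(order_doc, '$.customer') AS customer,
       json_query(order_doc, '$.items')    AS items
FROM   orders_json;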

Astute readers will notice that with the addition of JSON to Oracle 12c, SQL Server is now the only major database platform that doesn’t support JSON as a native datatype.

Cross Container Querying

Oracle 12c brought multi-tenant databases, sometimes referred to as contained databases or containers. If you wanted to query across databases, you’d typically have to go through some shenanigans. The CONTAINERS clause makes it easier for developers to write queries that can go across all containers, or even some containers.

Basically, instead of querying directly from one table, a developer would put the name of the table in a CONTAINERS function:

SELECT CON_NAME, *
FROM   CONTAINERS(employees) ;

And if you wanted to query only specific containers:

SELECT CON_NAME, *
FROM   CONTAINERS(employees)
WHERE  CON_ID IN (1,2,3,4) ;

While not useful for all systems, this does allow DBAs to perform administrative or back office queries across multiple database containers. The gotcha? All databases need to have the same schema.

Summary

Oracle 12.1.0.2 has a number of new features, some of which I didn’t mention since they relate entirely to managing the contained database features. If you want to learn more, check out the new feature guide - it’s full of links to the Oracle documentation on each new feature.

Uncle Brent Wants You To Use Launch2015 to Save 30%

It has come to my attention that some people bought training from us last week and they paid full price.

While that’s cool – we really do appreciate your generosity – you can do better.

During the month of July, use the coupon code Launch2015 to save 30% off any of our training, including our online training videos and in-person classes.

So in the 3 days left to use the coupon code, be smart: shop for our training videos & classes, but don’t pay full price. And use the savings to buy yourself a little something. You’re welcome.


You’ve been doing this database thing for a while, and you’re ready to get serious about it. What’s the next step?

Step 1: Define your specialty in one sentence.

If you say you do everything, you compete with everyone.

You want to be the only one they want. That means you’re:

  • Actively sought-after
  • Uniquely qualified
  • A very high value for short bursts of time
  • Respected for your opinion
  • Worth more than your competitors (more on that later)

This sounds selfish, but remember – it’s not about you. It’s about your customers (whether they’re internal or external) and your ability to help them.

To pick your specialization, watch my webcast archive How to Get Senior In Your Title. I talk about the different types of DBAs and what they specialize in. Here’s one of the important slides from that session:

Common types of DBAs

Most of you out in the crowd are going to say, “But I do all of these.” Sure you do – today. But we’re talking about where you want to be two years from now if you’re going to really stand out. Not only am I encouraging you to pick one of the columns, but I’m even encouraging you to focus on a specific horizontal row.

Examples of specialties include:

  • “This server has to be reliable. We need AlwaysOn Availability Groups. I know just who to call.”
  • “We need to manage thousands of servers with easier automation. I know the right person for the job.”
  • “Our SQL Servers in VMware are just too slow, and nobody knows whose fault it is. I know who can tell.”
  • “We need to offload our full text search, but we have no idea what to use. I know somebody who does.”

Notice that I’m phrasing these in a one-sentence pain point. You need to be known for resolving someone’s pain. This is the funny thing about business and consulting – you get paid the most to relieve urgent pain, not to provide keeping-the-lights-on maintenance.

The first step in your two-year plan is to write the one-sentence pain you want to resolve.

Step 2: Assess your current skills and your target skills.

Thinking about your one-sentence pain point:

  1. How many times have you relieved that pain?
  2. How many times have you failed to relieve it?
  3. When you hit an impasse, who did you escalate it to?
  4. Have you sketched out a process for diagnosing it? Has anyone?
  5. Have you documented the process for others to follow?

The more answers you have, and the more confident you are giving those answers aloud to someone else, the better your skills are. What, you expected a true/false multiple choice assessment test? Technology moves so fast that often the questions aren’t even right, let alone the answers.

Here’s a longer version of that assessment that I use for my own skills testing:

  1. I don’t know where the pain is coming from.
  2. I can identify the pain in clear terms.
  3. I know several possible root causes of the pain.
  4. I can identify exactly which one is at fault here.
  5. I know several ways to relieve that pain.
  6. I can identify exactly which one is right here.
  7. I’ve documented my triage process.
  8. I’ve hit situations where my process has been wrong, and I’ve learned from it.

From those levels, what level do you think you get paid for?

Surprise – it’s #1. You know plenty of people who are getting paid right now even though they have absolutely no idea where the pain is coming from. However, the higher your level, the easier it is to get paid more. (Don’t think that just because you’re on level 7, you’re making a bazillion dollars – there are plenty of folks who aren’t great at negotiating their value, either.)

Figure out what level you’re at today, and get a rough idea of what level you want to be at in two years. Now let’s figure out how to get there.

Step 3: Build a 2-year learning plan to make that leap.

Divide the number of levels you want to jump by the amount of time you have. If you want to go up four levels, and you’ve got two years to do it, then you need to progress a level every 6 months.

This sounds really easy, but there’s a problem: you’re probably not repeatedly solving this pain point at your day job. You probably solve it every now and then, but not over and over in a way that helps you refine your technique.

That’s why a 2-year learning plan is really a 2-year sharing plan.

Nothing teaches you something like being forced to teach it to someone else. Heck, even building this blog post (and a presentation on it a few weeks ago) made me flesh out my own philosophies!

But to share, you have to get permission. Start by having this discussion with your manager:

Dear Manager – Recently, we ran into the problem of ____. To get relief, I did ____. Are you happy with that relief? If so, I’d like to talk about what I learned at my local SQL Server user group. I won’t mention our company name. Is that OK? Sincerely, Me

By having that discussion, you’re also making sure the manager is really satisfied with your pain relief efforts and that they saw value in your work. (After all, think of them as one of your first pain relief clients.)

Once you’ve got permission, here’s how you build the 2-year sharing plan: every level jump equals one user group presentation.

  1. Write the user group presentation agenda in 4-5 bullet points.
  2. Write a blog post about each bullet point. (The words in your blog post are what you’ll say out loud in your session – think about it as writing your script.)
  3. Build slides that help tell the story, but the slides are not the story. Don’t transcribe your blog posts word-for-word on the slide.

For example, if you need to hit the level “I know several ways to relieve that pain,” and your specialization is improving the performance of virtual SQL Servers, your user group session could be titled “5 Ways to Make Your Virtual SQL Server Faster.” You’d then write a blog post about each of the 5 ways. Presto, there’s your session resources.

At the end of your 2-year sharing plan, you’ve built up a solid repertoire of material, plus built up your own level of expertise. (You’ve also built up a little bit of a reputation – but more on that later.)

Step 4: Decide what lifestyle works best for you.

How much risk can you tolerate?

  • Some. I could miss a couple of paychecks a year and manage my own benefits if I earned more.
  • Lots. I’d be willing to go without income for a month or two per year if I could earn lots more.
  • None. A very predictable salary and benefits are absolute requirements for me.

This determines whether you should be a full time employee, a long-term contractor that switches positions periodically, or a short-term consultant. In a nutshell, the differences are:

Consultants tell you what to do. They listen to your business problems, come up with solutions, and guide your staff on how to do it. They are typically short-term stints – a couple of days per month at a client, multiple clients at a time.

Contractors do what they’re told. They get a list of required solutions from the business and implement those solutions. They typically work together for long stints, showing up at the same client every day for months at a time, with only one live client relationship.

Full time employees do a mix of this. They come up with ideas, plus implement some of those ideas.

There’s no one answer that’s better for everyone. Heck, I’ve even changed my answer a few times over the last several years! It comes down to finding the right risk/reward balance for your own lifestyle needs, and then bringing the right customers in the door.

Step 5: Decide how you’ll market yourself.

Consultants sell advice, not manual labor, so they have many clients – which means doing a lot of sales.

Contractors sell labor, so they have fewer clients – which means less sales effort.

Full time employees (FTEs) only have one sales push every few years when they change jobs.

Our company is a good example of the work required to do marketing and sales when you want to scale beyond one or two people:

  • We have tens of thousands of regular blog readers
  • Thousands of them attend the weekly webcasts
  • Hundreds of them email us per month asking for help
  • A few dozen turn into serious sales opportunities
  • Around a dozen will book consulting engagements with us

This funnel approach demonstrates inbound marketing – using lots of free material to get the word out about your services and invite them to contact you for personal help. It’s a lot of hard work – very hard work.

The other approach is outbound marketing – cold calls to strangers asking if they’ve got your specialized pain point, and then trying to convince them that you’re the right person to bring pain relief. (You can kinda guess how I feel about outbound marketing.) Sure, it sounds slimy – but the takeaway is that it’s hard work, and every bit as hard as doing inbound marketing.

But only one of those options polishes your skills.

Inbound marketing is a rare two-for-one in life – it’s both your 2-year sharing plan, and your 2-year marketing plan. You don’t have much spare time, so you need every bit of it to count. Choose inbound marketing, do your learning and sharing in public, and you’ll write your own ticket.

Presto – You’re two years away from success.

No matter what pain you want to solve, how you want to solve it, or how you want to get paid for it, this simple plan will have you on the road to success. Now get started on writing down that one-sentence pain point!


kCura Relativity is a software product for law firms to find interesting stuff to help their case. To get you up to speed on what it does, here’s some of the posts I’ve written about Relativity in the past:

  1. Performance Tuning kCura Relativity – explains what the product is for and how DBAs can help make it faster
  2. Tiering kCura Relativity Databases – how to manage hundreds or thousands of Relativity databases
  3. Using Partitioning to Make Relativity Faster – when you have a 1TB+ workspace, this technique makes backups and maintenance easier

Today, I’m going to talk about the database mechanics – where data lives, and which tables you need to care about.

Relativity’s EDDS* Databases

The EDDS database stores Relativity’s system-wide configuration. (Just plain EDDS, no numbers or letters after it.) All of the users and processes will hit this database at various points of their work.

The EDDSResource database is like Relativity’s TempDB. I’m a huge, huge fan of this approach – this lets DBAs tune the EDDSResource independently from TempDB.

Each of the EDDS12345 databases (with a bunch of different numbers) is a legal matter, or in Relativity terms, a workspace. Think lawsuit or case, basically. As your lawyers take on new cases, each one will get its own new EDDS database.

You may also have an EDDSPerformance database, which houses kCura Performance Dashboard - a product that gathers performance metrics about your environment.

Distributed Environments: Spreading the Load Between SQL Servers

When you first get started, all of these databases could be on the same server. In the e-discovery business, though, growth happens really fast. Right from the get-go, you probably want to plan to separate them onto multiple SQL Server instances – in Relativity terms, a distributed environment with a couple of different SQL Servers:

The starting point for a Relativity distributed setup

We’ve got two SQL Servers, each with a couple of workspaces. (Obviously, Relativity scales WAY bigger than two workspaces on a single server, but I only wanna make these images so big, people.)

The EDDS database – the central config data – only lives on SQL1.

Both servers have their own EDDSResource database, and that’s for temporary scratch space.

But two standalone SQL Servers would be an insanely bad idea because if either server goes down, you’re screwed. Instead, you want to build a failover cluster of SQL Server instances, each instance living on a different physical box:

Two SQL Server instances for Relativity on a failover cluster

The databases live on shared storage, so if either box dies, the SQL Server instance can start up on the other box. Of course, this means you’ll have twice as many workspaces living on the same hardware, and that’s not a recipe for high performance – you can mitigate that by buying a separate passive node. I’m not going into the intricacies of failover clustering here – for that, see our clustering resources.

How the EDDS Databases Affect Cluster Design

The EDDS database will consume performance resources. As your distributed environment grows larger and larger, the load on the EDDS database will increase. If other workspaces are sharing that instance, they may be bullied around by the EDDS database. In very large environments, the EDDS database may grow to the point where it needs its own SQL Server instance – or rather, you just don’t want to put any workspaces on that instance.

Any one workspace is confined to one database server. If you have a massive case going on with tens of terabytes of data, the load isn’t spread across the servers. One database still lives on just one host. While technically SQL Server 2012 AlwaysOn Availability Groups do let you spread load across multiple servers, and technically Relativity 8.2 supports SQL Server 2012, they don’t support AGs for failover or spreading load yet.

You can move databases between servers to balance load, but it requires some downtime and work on the application side. You can’t just back up a workspace on one server, restore it onto another SQL Server, and take off. kCura has administration tools to help with this task, but it’s up to you to figure out which databases should be on which servers. This is where the concept of tiered workspaces comes in.

How to Performance Tune the EDDS Database

Because this database holds configuration data, and because the queries that hit here aren’t usually user-created, you generally don’t want to touch this database.

A few weeks after deploying a new release of Relativity, I recommend checking SQL Server’s index usage DMVs and plan cache to find out if there are any new problems that pop up. There may be a new query that needs a new index, or a unique way of doing a query in your environment that hasn’t been seen out in the wild yet.

When issues like that pop up, start by opening a support case with kCura rather than making index changes in here. In your support case, include the query text from the plan cache (if applicable) and evidence from your index DMVs to support the index change you want to make. You could actually make index changes inside this database, but generally speaking, that’s not a good first step. Let the Relativity support folks make the call there because any index changes here can dramatically affect all workspaces.
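
As a hedged starting point for gathering that evidence (not a kCura-sanctioned script, and the EDDS database name is assumed), something like this pulls the heaviest plan cache entries and the missing-index suggestions to attach to the support case:

-- Top plan cache entries by total reads since the last restart:
SELECT TOP 20 qs.total_logical_reads, qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_logical_reads DESC;

-- Missing-index suggestions for the EDDS database (treat these as evidence, not gospel):
SELECT d.statement AS table_name, d.equality_columns, d.inequality_columns,
       d.included_columns, s.user_seeks, s.avg_user_impact
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s ON s.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID('EDDS')
ORDER BY s.avg_user_impact * s.user_seeks DESC;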

How to Tune the EDDSResource Databases

You don’t. It’s a temporary staging ground. You can skip this guy under most circumstances.

How to Tune the EDDSPerformance Database

Performance Dashboard is a relatively new product – at least compared to Relativity. Early versions of it desperately needed a few indexes, so using tools like sp_BlitzIndex® pays off big time here. I would highly recommend checking missing indexes in this database, but then after a couple of low-hanging-fruit indexes are applied, this database won’t be a performance issue.

Before making changes here, again, start a support case with the changes you’d like to make. In most cases, this is just a simple known issue and easy to fix.

How to Tune the Workspaces (EDDS* Databases)

Ah, now here’s the fun stuff.

Expanding on the process I discussed in my Performance Tuning kCura Relativity post, every database is a new case that goes through a lifecycle:

  1. The database is created. It’s technically restored from a “template”, a database that the users have set up as their standard starting point. Larger Relativity shops may have several templates to choose from.
  2. Some documents are loaded. We need to load data into the database server as quickly as possible.
  3. Users review the documents. They run searches looking for terms and attributes that might indicate evidence that would bolster their case. As they review documents, they make small edits to the metadata fields at the document level, like marking whether the document has been reviewed, who reviewed it, and whether or not it was interesting. We need to audit everything the users do (as well as things the system does, too.)
  4. We go back to step 2 a few times. More documents get loaded, and users run more searches. This cycle continues for some time, until the amount of documents trickles to a halt, but searches still continue for a while.
  5. The case becomes dormant. Legal matters can drag on for years, but we may need to keep this database online the whole time. The amount of changes drops dramatically – sometimes with no data changes for months or years – but the database has to be online, and it has to be backed up.

Some of the major tables in each EDDS* database include (and remember, lawyers, I’m simplifying this for the DBAs in the house):

  • Document, File – things we loaded into Relativity to search through, like Excel files and Outlook PSTs
  • AuditRecord_PrimaryPartition – a log of most Relativity events, like document loads or end user searches (when this is a problem, start by partitioning it out)
  • Artifact – think of this like a system table for Relativity that lists every Document, plus other system objects
  • CodeArtifact – prior to Relativity v8, this one table stored records for all choices for every Document. (Think multiple-choice fields, like what kind of file type it was.) This had scalability limits because it had multiple times more rows than the Document table, and query plans could get ugly. This was changed in Relativity v8, but I’m mentioning it here in case any of you out there are still on 7.5. (Get on this level.)

Advice for Relativity 7.5 shops

Index tuning isn’t necessary on most of these tables because the queries that hit these tables are all managed by Relativity itself. The kCura developers sit around the office trying to figure out how to make those queries go faster, and they come up with some pretty good ideas. (Well, they also surf my site when they’re bored. Did I mention that they’re attractive people?) You shouldn’t need to touch indexes or queries here, other than the same every-new-version check that I described about EDDS.

Except for the Document table.

Oh, boy, the Document table.

Why the Document Table is Fun to Tune

Relativity lets end users write whatever crazy searches they want against the Document table. Wanna find every email with “S” in the email address? You got it. Need to see every PowerPoint created in June last year? Can do. Interested in every file whose extension is MP4 but the data is actually a PowerPoint slide deck and has hidden slides? No problem. You can build these searches in a GUI without understanding anything whatsoever about how SQL Server works, and Relativity will build the T-SQL for you.

To make matters even more fun (HA! see what I did there, lawyers? “matters”, oh, I kill me), the end users can add new fields to the Document table any time they want. If they want to add a new decimal field called LooksSuspicious, it happens with no DBA intervention or change request. Relativity generates the ALTER TABLE commands on the fly, and then users can populate that field and run searches against it.

Index tuning becomes really challenging because we may never be done loading documents. To load documents, we want as few indexes as possible for faster insert speeds. To search for documents, we want lots of indexes so we don’t have to scan the Document table. As DBAs, we’d like to ask the users, “Are you done loading now? Because I can add indexes to make this go fast.” The answer with Relativity may always be, “No, I might load some more tomorrow.”

And what I find the most interesting is that every EDDS* database can be wildly different. Every team that’s involved with every legal matter may have totally different approaches to loading, searching, and managing their documents. That means you have to treat every EDDS* database as its own unique indexing challenge.

At any given time, you might have a hundred EDDS* databases, each for a different legal matter, each with their own Document table. Each has different numbers of fields and indexes in each case.

You can’t conquer each of these databases individually. You simply have to use my tiered workspace approach, define the small databases that will work just fine on their own, and go tackle the largest and most active databases with traditional index performance tuning methods.


You know how to design indexes, but you’re not sure how good your skills really are. In this quiz-packed session you’ll get a chance to test your skills! Kendra Little will walk you through a set of index design challenges. You’ll have time to answer each problem on your own, then find out whether the SQL Server engine thinks you made the right choice. At the end of the session you’ll get to tally your score (but you can keep it a secret if you like).

Take the quiz while you watch the video. If you’d like to dive straight into the questions, skip to 2:30.

Compare your answers to the group

Our webcast had 416 attendees and 80% of people answered each question. To find out the right answers (and why) watch the video above, but here’s the stats on how webcast viewers answered each question.

This is NOT the answer key – don't assume the majority of people got every question right! This just lets you compare what you thought to what others thought.

Q1) A NonClustered Index is…

A) 4.39%
B) 19.26%
C) 73.65%
D) 2.70%

Q2) What will this query probably do given this table definition?

A) 13.92%
B) 70.55%
C) 9.06%
D) 6.47%

Q3) What is most likely the biggest problem with this table?

A) 4.35%
B) 69.93%
C) 17.39%
D) 8.33%

Q4) INCLUDED columns are…

A) 14.33%
B) 77.88%
C) 5.30%
D) 2.49%

Q5) A unique constraint is…

A only) 11%
A and B) 11%
A, B, and C) 16%
A, B, C, and D) 14%
A and C) 41%
A, C and D) 8%
A and D) 4%

(There were some other combos, but they all got low percentages)

Q6) Which one of these statements is ALWAYS true?

A) 16.77%
B) 36.13%
C) 14.52%
D) 32.58%

Q7) How many indexes were written to?

0) 2.33%
1) 17.00%
2) 46.67%
3) 34.00%

Q8) Why wouldn’t SQL Server Ask for an Index?

A) 22.73%
B) 45.45%
C) 26.22%
D) 5.59%

Execution plans are fantastic – they make it easier to visualize how a query is running and find tuning opportunities. SQL Server DBAs starting with Oracle may be a bit frustrated by the lack of a clear way to get to execution plans. This covers three ways to get to execution plans in Oracle – estimated plans, actual plans from an active session, and actual plans from the plan cache.

Getting Estimated Oracle Execution Plans

One way, likely more familiar to MySQL and PostgreSQL DBAs, is to use the EXPLAIN PLAN FOR command to get an execution plan out of Oracle. To do this, a DBA would simply add the command EXPLAIN PLAN FOR to the beginning of a SQL statement:

EXPLAIN PLAN FOR
SELECT o.order_id, v.product_name
FROM   oe.orders o,
       (  SELECT order_id, product_name
          FROM   oe.order_items o, oe.product_information p
          WHERE  p.product_id = o.product_id
                 AND list_price < 50
                 AND min_price < 40  ) v
WHERE  o.order_id = v.order_id;

Oracle will immediately finish and return Explained. SQL Server DBAs will likely exclaim “WTF, Oracle?” and stomp off in frustration in search of a GUI that will let them connect to Oracle and show them a graphical execution plan.

Oracle gives DBAs the ability to save off plans into tables, and that’s exactly what the EXPLAIN PLAN FOR syntax does – it saves the plan into a table named plan_table. You can view the execution plan using the following command:

SELECT plan_table_output
FROM   table(dbms_xplan.display());

This produces lovely text output:


Plan hash value: 1906736282

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    40 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS         |                     |     1 |    40 |     6   (0)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN|                     |     4 |   128 |     6   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |     1 |    28 |     5   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |                     |   105 |   420 |     1   (0)| 00:00:01 |
|   5 |     INDEX FULL SCAN   | ORDER_PK            |   105 |   420 |     1   (0)| 00:00:01 |
|*  6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |     1 |     8 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("MIN_PRICE"<40 AND "LIST_PRICE"<50)
   6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID")

19 rows selected.

There are three parts here - the plan hash value, the row source tree, and predicate information. Plans can be read similarly to SQL Server execution plans; just start at the deepest level and work your way out. In this case, start at Id 6 and work your way up to 3, then 2, then 1, then 0. An asterisk next to the Id means that there’s more detailed information about that step in the Predicate Information section.

You can get more detailed output from Oracle by supplying parameters to the DBMS_XPLAN.DISPLAY() function:

select plan_table_output
from table(dbms_xplan.display('plan_table', null, 'all'));

More documentation on DBMS_XPLAN can be found in the Oracle documentation.

Getting Actual Execution Plans From Oracle

Run the query first:

SELECT o.order_id, v.product_name
FROM   oe.orders o,
       (  SELECT order_id, product_name
          FROM   oe.order_items o, oe.product_information p
          WHERE  p.product_id = o.product_id
          AND    list_price < 50
          AND    min_price < 40  ) v
WHERE  o.order_id = v.order_id;

And then follow up by executing:

select * from table(dbms_xplan.display_cursor());

You should see the following execution plan:


SQL_ID  gpbdw897v4mbg, child number 0
-------------------------------------
SELECT o.order_id, v.product_name FROM   oe.orders o,        (  SELECT
order_id, product_name           FROM   oe.order_items o,
oe.product_information p           WHERE  p.product_id = o.product_id
        AND    list_price < 50           AND    min_price < 40  ) v
WHERE  o.order_id = v.order_id

Plan hash value: 1906736282

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |       |       |     6 (100)|          |
|   1 |  NESTED LOOPS         |                     |     1 |    40 |     6   (0)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN|                     |     4 |   128 |     6   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |     1 |    28 |     5   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |                     |   105 |   420 |     1   (0)| 00:00:01 |
|   5 |     INDEX FULL SCAN   | ORDER_PK            |   105 |   420 |     1   (0)| 00:00:01 |
|*  6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |     1 |     8 |     0   (0)|          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50))
   6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID")


28 rows selected.

Hey, that looks exactly the same! The default parameters for dbms_xplan.display_cursor won’t show us all the juicy details that we want. Adding in FORMAT => '+ALLSTATS' will do exactly what we want. Before you go running that execution plan query again, you have two choices: you can either execute the full select query again, or you can copy down the SQL_ID and use it in the dbms_xplan.display_cursor function like this:

select *
from table(dbms_xplan.display_cursor(sql_id => 'gpbdw897v4mbg',
                                     format => '+ALLSTATS'));

The output gets a bit more detailed and ends up looking like this:


SQL_ID  5p20zwvbgb93j, child number 0
-------------------------------------
    SELECT o.order_id, v.product_name     FROM   oe.orders o,
 (  SELECT order_id, product_name               FROM   oe.order_items
o, oe.product_information p               WHERE  p.product_id =
o.product_id               AND    list_price < 50               AND
min_price < 40  ) v     WHERE  o.order_id = v.order_id

Plan hash value: 1906736282

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |      1 |        |    269 |00:00:00.08 |    1337 |       |       |          |
|   1 |  NESTED LOOPS         |                     |      1 |      1 |    269 |00:00:00.08 |    1337 |       |       |          |
|   2 |   MERGE JOIN CARTESIAN|                     |      1 |      4 |   9135 |00:00:00.03 |      34 |       |       |          |
|*  3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |      1 |      1 |     87 |00:00:00.01 |      33 |       |       |          |
|   4 |    BUFFER SORT        |                     |     87 |    105 |   9135 |00:00:00.01 |       1 |  4096 |  4096 |     1/0/0|
|   5 |     INDEX FULL SCAN   | ORDER_PK            |      1 |    105 |    105 |00:00:00.01 |       1 |       |       |          |
|*  6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |   9135 |      1 |    269 |00:00:00.02 |    1303 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50))
   6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID")


28 rows selected.

Now we can see the estimated (E-Rows) and actual (A-Rows) rows returned from each operation, as well as timing and memory information. Parallel queries will return even more information (take a look at the example DBMS_XPLAN commands and output to see what that would look like).

Viewing The Oracle Execution Plan Cache

You might hear Oracle people talk about this as the library cache or the cursor cache. In the SQL Server world, we’d call it the execution plan cache. Whatever you want to call it, it’s contained in a system view: V$SQL_PLAN.

We could find our query in V$SQL_PLAN and get the execution plan doing something like this:

select plan_table_output
from   v$sql s,
       table(dbms_xplan.display_cursor(s.sql_id, s.child_number,
'basic')) t
where  s.sql_text like 'SELECT o.order_id, v.product_name%';

This query will pull back basic information about the different statements that have been executed that match our search condition. That output looks different because I used the basic parameter for plan display instead of the default of typical, but here’s an example:


EXPLAINED SQL STATEMENT:
------------------------
SELECT o.order_id, v.product_name FROM   oe.orders o,        (  SELECT
order_id, product_name       FROM   oe.order_items o,
oe.product_information p       WHERE  p.product_id = o.product_id
    AND    list_price < 50       AND    min_price < 40  ) v
WHERE  o.order_id = v.order_id

Plan hash value: 1906736282

-----------------------------------------------------
| Id  | Operation             | Name                |
-----------------------------------------------------
|   0 | SELECT STATEMENT      |                     |
|   1 |  NESTED LOOPS         |                     |
|   2 |   MERGE JOIN CARTESIAN|                     |
|   3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |
|   4 |    BUFFER SORT        |                     |
|   5 |     INDEX FULL SCAN   | ORDER_PK            |
|   6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |
-----------------------------------------------------

If you want to get the full plan from any of these statements, you can either include the sql_id in the query to find all plans for a statement, or you can use the plan hash value to retrieve it from the v$sql_plan view and use it in the dbms_xplan.display_cursor function.

To retrieve the matching plans, the query above uses Oracle's v$sql system view to find matching cursors (everything's a cursor, yay) in memory. Once we have a list of matching cursors in memory, dbms_xplan.display_cursor is used to pull back information about the actual plan from the cursor cache.

Wrapping Up

There you have it - three ways to view execution plans in Oracle:

  1. Using EXPLAIN PLAN FOR to get an estimated execution plan.
  2. Using dbms_xplan.display_cursor() after running a query to get the actual execution plan.
  3. Querying v$sql_plan to find plans in the plan cache.

SQL Server experts and novices alike can get the kind of query details from Oracle that they’re used to seeing from SQL Server.
