
Choosing a Presentation Tool

If your New Year’s resolution is to start presenting at user groups and conferences, congratulations! You’re about to embark on a fulfilling journey that will enrich the lives of thousands of people. It’s a blast. Now let’s put some thought into the tool you’re going to use.

You’re going to be working on these same sessions for years – growing them, expanding them, building them into all-day training. Don’t pick a flash-in-the-pan presentation technology that might not be here in a couple of years. Pick the most reliable technology you can find.

If you’re lucky, you can turn your presentations into an entire company. When that happens, you want your employees to be able to give your presentations to clients when possible. Don’t use a hard-to-understand technology – use the simplest, most straightforward way to get your point across.

If you’re unlucky, your laptop will fail right before you walk onstage. Save all your presentations on a USB drive (mine’s on my keychain) so that when disaster strikes, you can turn to the nearest attendee and say, “Mind if I borrow your laptop to present? You’ve got (technology x) on it, right?” Attendees love to say yes, but you’re not going to have the time or bandwidth to download & install new software, and you shouldn’t be installing anything on a stranger’s laptop anyway. I present about Microsoft technologies, so my audience usually has PowerPoint installed. I’ve presented from attendee laptops more than once.

My presentation gear at work. Photo by Michael Kappel.

Use a technology that allows you to move around the stage while advancing slides. When you’re just getting started, you’ll probably stand behind the podium, gripping it tightly, fearing that you’ll fall over if you let go. After a few sessions, you’ll gain the confidence to move around the stage and use positioning just like actors do. You’ll want a technology that lets you use a remote control. I use the $60 Logitech R800 because it’s also got a built-in countdown timer. I can glance down to see how much time I’ve got left, and it vibrates when I start running out of time.

Use a technology that allows for easily exportable, not-easily-editable content. I export my sessions to PDF and give ‘em away to attendees. If you give away the native format, some attendees will take your session, edit out your name, and re-present it as their own. Of course, if you’re okay with that (and I am for some of my sessions), then take the other tack – use a technology that your attendees will all have, and will be able to quickly edit and re-present.

The export needs to stand on its own, including in a printed version. If your technology relies on animations to get the point across, and the export doesn’t include the animations, it won’t work. Personally, I do all my animations by having a separate slide for each step. That way even if you’re reading along in a printed handout, you can follow what I’m doing. At all-day training sessions, I’m amazed at how many attendees love following along in a printed copy and writing notes on it. Many attendees don’t have laptops/tablets that can last all day on battery for note-taking, and many conferences don’t have power outlets for every attendee during all-day training sessions.

A couple/few years into your journey, you’re going to be so proud of your chosen technology. You’ll have polished that knife to a sharp edge. At that point, it’s time to step back and pick up a new knife. Try a new tool, and start sharpening that too. The more knives you have, the better chef you’ll be, because different knives work better for different foods.

Every knife – every technology – will fail on you. The slides, the demo code, the services, the remote, the laptop, the projector, the microphone, all of it. If you’ve polished multiple knives, you’ll be completely comfortable when one tool fails. I’ll never forget the time when I was presenting slides via projector about disaster recovery for databases, and midway through my session, the entire conference center’s power went out. I grabbed the whiteboard markers, eager to sketch out the concepts for the rest of the session. Those moments make your reputation as a presenter.

Having said all that, here are the tools I use:

Text editor – I storyboard my sessions in a text editor first, using one line per slide. I write down the things I need to teach during the presentation, then I start adding and arranging, turning it into a story. When I’m done arranging the story, then I decide which portions need demos, diagrams, pictures, etc. If there’s an overwhelming theme, I try to pick just that one method of delivery. For example, if your session is 80% demos, dump the slides and just put comments in the code. Zoom in onscreen to show those comments in large text. If I can get the entire session delivered in just one tool, it makes the session easier for attendees to digest. If I do have to use two tools (like slides & demos, or slides & whiteboard) then I want to minimize the number of transitions back & forth.

Microsoft PowerPoint – I’m not a big fan of it, but it’s the de facto standard in the MS database world. Many MS conferences require me to use a PowerPoint template, plus I have to upload my slides to them for approval, and they’ll make edits and send it back. This just comes down to knowing your audience and picking a tool all the attendees will have. At our company, we’ve started breaking up our slides into mini-decks that our employees can reuse and present to clients. For example, I might have a 1-hour session on tuning databases, and then rip out two 15-minute sections of that to turn into mini-decks. When a client has a question and there’s a minideck to answer it, the employee can whip it out and give the client the best answer possible.

Demo code – think of this as a standalone presentation tool. If you can do a whole session in here (including the title, about-me slide, and resource slide), do it.

Whiteboard – I’ve always casually used this knife to handle attendee Q&A live, but I’m starting to polish it. I’m picking between iPad teaching tools to find one that lets me zoom in & out, record the session, write with a keyboard, etc. I want to get to the point where I can deliver a 1-hour session entirely via iPad whiteboard projected onscreen, and get good feedback from the attendees that it was the best way to learn a particular concept.

Want more tips like this? Check out our past presenting posts.

Saving Session State (video)

Session state frequently ends up on a busy SQL Server. What seemed like a good idea in development turns into a problem in production. While there are valid business reasons for persisting session state to permanent storage, there are equally valid reasons to avoid using SQL Server as that storage. We’ll investigate why session state poses problems for SQL Server and cover an alternate solution that allows for persistent session state. This talk is for developers and DBAs who want a better way to safely track ASP.NET session state.


LandsofAmerica.com – Elastic Data Warehouse – A Case Study

Company Overview

LandsofAmerica.com is the largest rural listing service in the nation. The network specializes in land for sale, including farms, ranches, mountain property, lake houses, river homes, beachfront homes, country homes, and residential homes in smaller towns across the country. These properties have many diverse uses, including recreational and agricultural activities like hunting, fishing, camping, backpacking, horseback riding, four wheeling, grazing cattle, gardening, vineyards, cropland, raising horses, and other livestock.

Business Challenges

LandsofAmerica.com (LoA) has been collecting an immense amount of data about visitor usage and search patterns for several years, using Microsoft SQL Server as their data storage solution. LoA is happy using SQL Server for OLTP workloads, but with nearly 1 billion rows of data, previous attempts at combining OLTP and analytical queries on the same SQL Server instances caused poor reporting performance. LoA’s goal was to provide this data to several thousand clients, so the reporting workload had to perform well without dragging down the production systems.

The real-time requirements for data analysis are relaxed: a 24-hour lag between data collection and analysis delivery is acceptable to the business users. Many of the analysis questions take the form of “What is the average price range users are searching for in these 7 counties over the last 6 months?” This presented a problem: LoA’s production SQL Server couldn’t answer these questions and still serve OLTP traffic, so how could the business build a reporting product on this data without sacrificing core application performance?

LoA’s team was faced with two choices: they could purchase a second server used solely for analytical querying or they could evaluate other options. During discussions with the company’s development team, I reviewed solutions that would let LoA use their historical data, help the business make better decisions, and move the data processing load outside of SQL Server.

Working closely with LoA, I designed and implemented a solution using Apache Hive hosted in Amazon Elastic MapReduce (EMR). EMR delivers managed Hadoop and Hive services, low-cost storage, and flexible computing resources. LoA was up and running with EMR and Hive in just a few weeks.

Use Case Description

LandsofAmerica.com already leverages components of Amazon Web Services in conjunction with Microsoft SQL Server. Extending their usage to Elastic MapReduce and Hive was an easy addition. Long-term data storage is offloaded from SQL Server to Amazon S3 to lower storage costs compared to traditional storage options. S3 stores the detailed source records exported from SQL Server, and data stored in S3 is accessed through Hive.

Hive is used as a separate data processing system. User search and activity is aggregated along several key measures through Hive. The aggregated data is stored in S3 before being imported into an on-premise SQL Server for interactive querying and reporting.
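
Here’s a minimal sketch of what that pipeline can look like in HiveQL – the table, column, and bucket names are hypothetical, not LoA’s actual schema:

-- Detailed activity records exported from SQL Server sit in S3 as flat files;
-- an external table lets Hive query them in place.
CREATE EXTERNAL TABLE search_activity (
    search_id    BIGINT,
    visitor_id   BIGINT,
    county_id    INT,
    min_price    FLOAT,
    max_price    FLOAT,
    search_date  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://example-bucket/search-activity/' ;

-- Aggregate along a few key measures; the small result set is written back
-- to S3 and then imported into the on-premise SQL Server for reporting.
INSERT OVERWRITE DIRECTORY 's3://example-bucket/aggregates/county_price/'
SELECT  county_id,
        AVG(min_price),
        AVG(max_price),
        COUNT(*)
FROM    search_activity
WHERE   search_date >= '2012-06-01'
GROUP BY county_id ;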

Impact

Multi-dimensional search data provides many opportunities for complex analysis. This analysis is typically both CPU and disk intensive – it’s difficult to provide effective indexing techniques for large analytic queries. Through Hive’s ability to conduct large-scale analysis, LandsofAmerica.com is able to uncover trends that would otherwise remain hidden in their data. Other options exist to perform analysis, but carry a significant hardware and licensing cost, like the Microsoft SQL Server Fast Track Data Warehouse. By utilizing commodity cloud computing resources and Apache Hive, LandsofAmerica.com is able to gain insight across their collected data without a significant investment of capital – resources are consumed on-demand and paid for on-demand.

By using Hive as the definitive store of historical data, LandsofAmerica.com is able to reduce their local storage requirements. Older historical data can be removed from Microsoft SQL Server as it is loaded into Hive.

Interested in learning more about Hadoop? Check out our Introduction to Hadoop training class.

FAQ About Our SQL Server Developer Training Course

On May 9-10, we’re doing a 2-day training course for developers who rely on SQL Server.  Here are the most frequently asked questions we’ve gotten so far.

Q: Will you be doing this training in a different city this year?

Nope – this is it.  In 2014, we’re going to offer a 2-day developer course and a 2-day DBA course, and we’re going to offer them together in the same week.  We’ll run that in two different cities – Chicago, and a city to be determined.

Q: That’s four days. What are you going to do on the fifth day that week?

A free event open to the public sounds like a good idea, doesn’t it?

Q: How many people are coming?

There’s a couple dozen developers signed up so far, plus a couple of DBAs too.  (Hey, everybody wants to figure out how to make queries go faster.)  And of course there’s all four of us, too.

Q: Do I get the scripts and handouts?

Yes, you’ll get some of our favorite troubleshooting and demo scripts – cool stuff that we save for private paid sessions.  We use these exact same scripts during our SQL Critical Care® sessions to help clients make their SQL Servers faster and more reliable.

Q: My schedule is unpredictable. I want to save a spot but I might not make it.

That’s not a question.  Okay, here’s the answer anyway – you can hold a spot in the class with a 20% nonrefundable deposit.  On the order page, click the “Show other payment options” button, and choose to pay via check.  It’ll give you payment information including the address for the check.  Send the 20% now, and send the remaining 80% before the event starts.  (We won’t be accepting checks onsite.)

Q: How will the Q&A work?

In addition to the onsite Q&A, we’ll run two two-hour Q&A webcasts after the training.  This gives you the time to go back to the office, dig into what you learned, and ask followup questions based on your specific environment. We’ll run a morning session and an afternoon session to help more people attend.

Q: What’s with the “Enter promotional code” link on the order page?

It’s where you can put in stuff to get a discount on the price.

Q: What’s one of the discount codes?

I could tell you, but then I’d have to kill you.  But they probably line up with funny things that have happened to us in the past, like when we talked about our experience writing a book or we dressed up in costumes for sessions. And they would probably expire soon, like the end of this week.

Q: How do I get my boss to send me?

Check out our business justification PDF.  It’s written with the pointy-haired guys in mind.

Q: Okay, I want to register. Where do I click?

An Introduction to SQL Server IO for Developers (video)

Developers frequently make a big mistake: they tune out on discussions of storage performance and IO. Don’t fall into this trap! Understanding how storage impacts SQL Server performance helps you tune the right parts of your code and focus on what really matters. In this 30-minute video, Microsoft Certified Master Kendra Little demonstrates why developers need to understand the IO impact of writes and reads in SQL Server. She’ll show you how simple configuration changes can sometimes save hundreds of hours of developer time.

Interested in the tools or references in the video? Check out the links at the bottom of this post.


Log Shipping FAQ

I’ve been working with and talking about SQL Server Log Shipping a lot lately! If you haven’t yet, go ahead and watch my webcasts – Part 1: Preparing for Disaster and Part 2: When Disaster Strikes.

One way to ship logs…

I’ve gotten a lot of great questions about log shipping through these webcasts, so I’ve put together an FAQ.

  • What editions of SQL Server is log shipping available in?
    • 2012 – Enterprise, Business Intelligence, Standard, and Web
    • 2008R2 – Datacenter, Enterprise, Standard, Web, and Workgroup
    • 2008 – Enterprise, Standard, Web, and Workgroup
    • 2005 – Enterprise, Standard, and Workgroup
  • Does the secondary need to be licensed?
    • I am not the licensing police, and I am not Microsoft – check with your licensing representative to clarify your exact situation. Generally, you can have one warm standby server. However, the second someone starts using it for reporting, testing, or anything else, you need to license it like any other server.
  • Log shipping is compatible with backup compression. What edition of SQL Server do I need to take advantage of compression?
    • 2012 – Enterprise, Business Intelligence, or Standard
    • 2008R2 – Datacenter, Enterprise, or Standard
    • 2008 – Enterprise
    • 2005 – Not available
  • When log shipping is set up, Agent jobs are created to alert me if a backup, copy, or restore fails. How do I get notified?
    • You need to go into the Agent job, pull up Notifications, and choose your method – email an operator, or write to the event log, for example. (There’s a quick T-SQL sketch of this right after the list.)
  • Are my logins shipped from the primary to the secondary?
    • No, they are not. You’ll need to set up a separate method to sync the logins.
  • Does this replace, or can it be combined with, our existing daily full and log backups?
    • TL;DR – no.
    • You’ll still want to take regular full and/or differential backups. Log shipping only takes one full backup – at the beginning – and that’s only if you specify that it does so. It can also be initialized from an existing full backup.
    • Taking two log backups in separate jobs will break the log chain, however. If you implement log shipping, it will replace your current transaction log backup job.
  • What’s the difference between the secondary being in “Restoring” vs. “Standby”?
    • Restoring means the database is not accessible. Standby means it is read-only. You make this decision when you set up the log shipping.
    • If the database is in Standby mode, users can query it – except when a log backup is being restored. You need to decide whether the restore job should disconnect users, or whether restores should wait until users disconnect on their own.
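
Here’s that notification setup in T-SQL – a minimal sketch, assuming Database Mail is configured and an operator named ‘DBA Team’ already exists. The job name follows the default LSAlert_ naming pattern; swap in your own job names.

-- Tell the log shipping alert job to email an operator when it fails.
EXEC msdb.dbo.sp_update_job
    @job_name = N'LSAlert_YOURSERVER',
    @notify_level_email = 2,    -- 2 = notify on failure
    @notify_email_operator_name = N'DBA Team' ;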

What other questions do you have? 

Interested in learning more about backups and recovery?

Grab my Backup & Recovery Step By Step training for more information on why backups and restores are so important, and how to perform them! 

Why I’m Crazy Excited about the SQLIntersection Conference

Next month, I’ll be in Vegas at the SQLIntersection conference.  Yeah yeah yeah, I’m a presenter, but forget that for a second.  I want to give you a tour of the agenda and talk about why I’m so excited to be there as an attendee, too.

Here are the pre-conference workshops:

Sunday, Kimberly Tripp is talking indexes.  If you work with indexes as a DBA or a senior developer, I can’t think of a better index teacher today than Kimberly.  I’m lucky to have seen most of the great SQL presenters in person around the world, and when it comes to indexes, she’s at the top.  At $450 for one day, this is a screaming deal.

Monday, Kim’s got a workshop on understanding the plan cache.  She’s a complete expert at that too, at the top of the game, and it’s a great session for developers and DBAs who manage internally developed applications.  I’ve got a one-day workshop on SQL Server setup best practices, which seems odd because there’s a million best practices documents out there already, right?  Well, things have changed a lot this year with SSDs, VMware, clusters, TempDB, and AlwaysOn Availability Groups.  If you’re building a new server this year, you need to revisit what you think you know.

On Tuesday, the main conference sessions start:

There are three track rooms, and in each room, we’ve got a host/MC.  The host will be facilitating discussion in the room, helping with Q&A, and talking shop in between sessions.  This is a completely new way of handling downtime between sessions, and I’m really excited about it.

The speakers are all killer, no filler.  These are people who know their subjects really well, and they do a fantastic job of explaining the topics.  It’s a complete all-star list, and I could write for pages about just this one list of sessions, but I’m going to focus on the people in my track just because I don’t wanna bore you to death.

Joe Sack ran the Microsoft Certified Master of SQL Server program when he worked at Microsoft, and now he’s with SQLskills.  He’s one of the funniest, friendliest guys I know, and if you’ve been reading his incredibly detailed blogs, you’ll already know he’s smart too.  His session topic is so relevant – if you’re not already troubleshooting SQL Server CPU issues, you will be soon.  Companies are getting tighter and tighter with SQL licensing now that we pay by the core, and they want to minimize CPU use as much as possible.

Paul White is the blogger that wows other bloggers.  His blog posts do an incredible job of teaching even the smartest SQL Server professionals, and I can’t wait to see his session on parallel execution.  His classic post on forcing parallel execution plans contains a ton of gems, even just starting with the list of things that cause your queries to go single-threaded.

Andrew Kelly has been distilling the best parts of performance monitoring for years, and he’s at the top of his game here too.  He doesn’t get the blog press that the other speakers get, but believe me, he knows his stuff forwards and backwards.  One of my favorite gauges of presenter quality is the number of notes that attendees take during the session, and I always see people scribbling like crazy while Andrew’s onstage.

The day finishes up with Kevin Kline and Sumeet Bansai talking about how you can use local SSDs to achieve high availability and high performance with AlwaysOn Availability Groups.  I’m not gonna lie – keynotes at conferences usually suck hard, but with Kevin onstage, that’s not going to happen here.  His recent post about doing good technical demos is a clue that he not only takes his work seriously, but he wants to explain it to others well, too.

And that’s just the first conference day.

Because SQL Intersection is a brand new conference, you might be thinking, “I dunno if I should go.  Maybe it’s just a flash in the pan and they don’t know what they’re doing.”  Au contraire, mon frere – this conference is set up to be very different than others, and in a good way.

April is a great time to be in Vegas – the weather’s wonderful.  Vegas has some of my favorite restaurants in the world, and I love seeing shows like Cirque du Soleil and Penn & Teller.

If you’d like to join me, Jeremiah, Kendra, and the sharpest, friendliest minds in the SQL Server business, register for SQL Intersection.  The Show Package (conference plus one pre-conference or post-conference session) even includes a free Surface RT tablet.  (I’m so not ready to see an army of attendees with Surface RTs.)

Use discount code OZAR and you’ll get $50 off, plus we’ll donate $10 to charity.  We’re looking for charity ideas, so leave a comment with your favorite charity.  We’ll pick our favorite suggestion(s) and report back with how much you gave to charity after the show.

Introduction to Hive Partitioning


You’re probably thinking about building a data warehouse (just about every company is, if they haven’t already). After reading SQL Server Partitioning: Not the Best Practice for Anyone and Potential Problems with Partitioning, you’re wondering why anyone would partition their data: it can be harder to tune queries, indexes take up more space, and SQL Server’s partitioning requires Enterprise Edition on top of that expensive SAN you’re adding to cope with the extra space. Anyone who is looking at implementing table partitioning in SQL Server would do well to take a look at using Hive for their partitioned database.

Partitioning Functions

Setting up partitioning functions in SQL Server is a pain. It’s left up to the implementor to decide whether the partition function should use RANGE RIGHT or RANGE LEFT and how partitions will be swapped in and out. Writing robust partitioning functions is stressful the first time around. What if we didn’t have to define a partition function? What if the database knew how to handle partitioning for us? Hive does just that.
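
For comparison, here’s roughly what the SQL Server side looks like – a minimal sketch with made-up object names and boundary dates, partitioning a sales table by month:

-- Each boundary value starts a new partition; RANGE RIGHT puts the boundary
-- date itself into the partition to its right.
CREATE PARTITION FUNCTION pf_sales_by_month (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2012-11-01', '2012-12-01', '2013-01-01') ;

CREATE PARTITION SCHEME ps_sales_by_month
    AS PARTITION pf_sales_by_month ALL TO ([PRIMARY]) ;

-- Unlike Hive, the partitioning column (order_date) is a regular column
-- stored in every row of the table.
CREATE TABLE dbo.sales (
    sales_order_id  BIGINT,
    order_amount    FLOAT,
    order_date      DATETIME NOT NULL,
    due_date        DATETIME,
    customer_id     BIGINT
) ON ps_sales_by_month (order_date) ;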

Rather than leave the table partitioning scheme up to the implementor, Hive makes it easy to specify an automatic partition scheme when the table is created:

CREATE TABLE sales (
    sales_order_id  BIGINT,
    order_amount    FLOAT,
    order_date      STRING,
    due_date        STRING,
    customer_id     BIGINT
)
PARTITIONED BY (country STRING, year INT, month INT, day INT) ;

As we load data it is written to the appropriate partition in the table. There’s no need to create partitions in advance or set up any kind of partition maintenance; Hive does the hard work for us. The hardest part is writing queries. It’s a rough life, eh?

You might have noticed that while the partitioning key columns are part of the table DDL, they’re only listed in the PARTITIONED BY clause. This is very different from SQL Server, where the partitioning key is a regular column stored in every row of the table. In Hive, as data is written to disk, each partition is automatically split out into a different folder, e.g. country=US/year=2012/month=12/day=22. During a read operation, Hive uses the folder structure to quickly locate the right partitions and also returns the partitioning columns as columns in the result set.
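
For example, a query that filters on the partition columns only has to read the matching folders – a quick sketch against the sales table above:

-- Hive only reads the folders under country=US/year=2012/month=12;
-- the partition columns come back like any other column.
SELECT  day,
        SUM(order_amount) AS total_sales
FROM    sales
WHERE   country = 'US'
        AND year = 2012
        AND month = 12
GROUP BY day ;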

This approach means that we save a considerable amount of space on disk and it can be very fast to perform partition elimination. The downside of this approach is that it’s necessary to tell Hive which partition we’re loading in a query. To add data to the partition for the United States on December 22, 2012 we have to write this query:

INSERT INTO TABLE sales
PARTITION (country = 'US', year = 2012, month = 12, day = 22)
SELECT  sales_order_id,
        order_amount,
        order_date,
        due_date,
        customer_id
FROM    source_view
WHERE   cntry = 'US'
        AND yr = 2012
        AND mo = 12
        AND d = 22 ;

This is a somewhat inflexible, but effective, approach. Hive makes it difficult to accidentally create tens of thousands of partitions by forcing users to list the specific partition being loaded. This approach is great once you’re using Hive in production but it can be tedious to initially load a large data warehouse when you can only write to one partition at a time. There is a better way.

Automatic Partitioning

With a few quick changes, it’s easy to configure Hive to support dynamic partition creation. Just as SQL Server has a SET command to change database options, Hive lets us change settings for a session using the SET command. Changing these settings permanently would require editing a configuration file (hive-site.xml) and restarting Hive – it’s not a difficult change, but it’s outside of our scope.

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

Once both of these settings are in place, it’s easy to change our query to dynamically load partitions. Instead of loading partitions one at a time, we can load an entire month or an entire country in one fell swoop:

INSERT INTO TABLE sales
PARTITION (country, year, month, day)
SELECT  sales_order_id,
        order_amount,
        order_date,
        due_date,
        customer_id,
        cntry,
        yr,
        mo,
        d
FROM    source_view
WHERE   cntry = 'US' ;

When inserting data into a partition, it’s necessary to include the partition columns as the last columns in the query. The column names in the source query don’t need to match the partition column names, but they really do need to be last – there’s no way to wire up Hive differently.

Be careful using dynamic partitions. Hive has some built-in limits on the number of partitions that can be dynamically created as well as limits on the total number of files that can exist within Hive. Creating many partitions at once will create a lot of files and creating a lot of files will use up memory in the Hadoop Name Node. All of these settings can be changed from their defaults, but those defaults exist to prevent a single INSERT from taking down your entire Hive cluster.
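
Those limits are ordinary Hive settings, so you can see (and, carefully, raise) them per session – the values below are illustrative, not recommendations:

-- Cap on dynamic partitions a single statement may create, per node and overall.
SET hive.exec.max.dynamic.partitions.pernode = 100;
SET hive.exec.max.dynamic.partitions = 1000;
-- Cap on the total number of files a single job may create.
SET hive.exec.max.created.files = 100000;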

If the number of partitions rises above a certain threshold (in part based on the number of underlying files), you can run into out of memory errors when MapReduce jobs are being generated. In these conditions, even simple SELECT statements can fail. Until the underlying problems are fixed, there are a few workarounds:

  1. Tune Java heap size (not for the faint of heart)
  2. Find ways to reduce the number of underlying files, for example by manipulating load processes to use a single reducer.
  3. Modify the partition scheme to use fewer partitions. With a rough threshold of 10,000 partitions, most partition schemes can be accommodated. Remember – each partition should contain about 64MB of data, minimum.

What About Partition Swapping?

Much like SQL Server, Hive makes it possible to swap out partitions. Partition swapping is an important feature that makes it easy to change large amounts of data with minimal impact on database performance. New aggregations can be prepared in the background and swapped in once they’re ready.

How do we perform a partition swap with Hive? A first guess might be to use the INSERT OVERWRITE PARTITION command to replace all data in a partition. This works, but it has the downside of deleting all of the data and then re-inserting it. Although Hive has no transaction log, we’ll still have to wait for the data to be queried and then written back to disk. Your second guess might be to load data into a different location, drop the original partition, and then point Hive at the new data like this:

ALTER TABLE sales 
    DROP IF EXISTS PARTITION 
    (country = 'US', year = 2012, month = 12, day = 22) ;

ALTER TABLE sales 
    ADD PARTITION (country = 'US', year = 2012, month = 12, day = 22) 
    LOCATION 'sales/partitions/us/2012/12/22' ;

It’s that easy: we’ve swapped out a partition in Hive and removed the old data in one step. Truthfully, there’s an even easier way using the SET LOCATION clause of ALTER TABLE.

ALTER TABLE sales
    PARTITION (country = 'US', year = 2012, month = 12, day = 22)
    SET LOCATION 'sales/partitions/us/2012/12/22' ;

Just like that, the new partition will be used. There’s one downside to this approach – the old data will still exist in Hadoop; only the metadata will be changed. If we want to clear out the old data, it’s going to be necessary to drop down to HDFS commands and delete the old files out of Hadoop itself.

Is Hive Partitioning Right For You?

If you’re thinking about partitioning a relational database, you should give serious consideration to using partitioned tables in Hive. One of the advantages of Hive is that storage and performance can be scaled horizontally by adding more servers to the cluster – if you need more space, just add a server; if you need more computing power, just add a server. Hive’s approach to data skips some of the costs of partitioning in SQL Server – there’s no Enterprise Edition license to buy, minimal query tuning involved (hint: you should almost always partition your data in Hive), and no expensive SAN to purchase. Keep in mind – you don’t need to use partitioning to get started with Hive, either. You can always partition after the fact.

For better or for worse – if you’re thinking about partitioning a data warehouse in SQL Server, you should think about using Hive instead. 

Interested in learning more about Hive and Hadoop? Check out our Introduction to Hadoop training class, or buy Introduction to Hadoop now!

Log Shipping Part 2: When Disaster Strikes (video)

Log shipping is a tried and true method in SQL Server for keeping a copy of your data on a secondary server. You have your primary and secondary server set up, and it’s working great. Are you monitoring it? Do you know what to do if you need to fail over to the secondary? Join Jes in this free 30 minute video to find out!

In case you missed it: Log Shipping Part 1: Preparing for Disaster!

Backups: More Important Than You Think

Grab my Backup & Recovery Step By Step training for more information on why backups and restores are so important, and how to perform them!

Databases Five Years from Today

Five years from now, in March 2018, what will be different about databases?  I’m a Microsoft SQL Server guy, so keep that in mind when reading my thoughts.

Looking Back at 2008-2013

The big story here for me was virtualization.  In mid-2008, Microsoft added support for SQL Server running under Hyper-V, and later they added support for VMware as well.  Today, almost all of my clients have at least one virtual SQL Server.  They typically start with development/QA/test instances, then disaster recovery, then production.

For the remaining physical servers, cheaper memory and solid state storage meant you could get insane performance out of commodity hardware.  Microsoft got caught without a good scale-out story, but cheap scale-up prices meant it didn’t matter.  I don’t know whether this was luck or strategy, but good on ‘em either way.  In 2008, I never would have believed that 2013 Brent would have a dozen clients with 2TB of memory per server.  Over the last couple of years, I’ve had a few companies think they needed to switch database platforms in order to get the performance they need, but the real answer has always been quick (and relatively cheap) changes to code, configuration, and hardware.

Management tools didn’t really change at all, and that’s a story in and of itself.  Microsoft made several attempts to change how DBAs and developers interact with databases – Visual Studio Data Dude, DACPACs, Utility Control Points.  When SQL 2008R2 came out, I wrote about why these features wouldn’t really have an impact, and today, I’m not surprised that they’re just not getting any serious adoption.  Microsoft threw in the towel on Data Dude and replaced it with SQL Server Data Tools, but didn’t include all of the functionality.  I don’t see a lot of developer confidence in Microsoft’s short attention span here, so tooling hasn’t been a big story.  (We did get an all-new SQL Server Management Studio under the hood, but Microsoft went to great pains to ensure it looked/worked basically the same as the old one, so…yeah.)

Business Intelligence (BI) got a lot of headlines, but here we’ve got another hyper-distraction story.  Microsoft threw so many different tools against the wall that the naming even became a joke – does PowerPivotPointPro use the XVelocitySuperMart in v3.14?  I don’t envy the BI pros who have to keep up with this jumbled mess of licenses, features, and names, but I do think Microsoft is heading in the right direction.  The combination of Excel, SharePoint, columnar storage, and hella fast laptops means Microsoft is in a good spot to give insight to managers.  It just wasn’t a huge revolution in 2008-2013 because the stories and products kept changing.

Looking Forward at 2013-2018

When our servers were physical, they had a built-in expiration date.  The hardware support would grind to an end, and we’d be under pressure to migrate them onto more reliable hardware.  We often included a SQL Server version upgrade in that same project.

Those days are over.  The combination of virtualization and SQL 2005/2008 will leave an interesting legacy challenge for DBAs.  Once your SQL Server is virtualized, it’s really easy to get it off old hardware – just VMotion or LiveMigrate it to another host.  You can do it even while it’s still powered on.  Does that old version need some more horsepower?  Shut it down, add a couple of virtual CPUs and more memory, and power it back on.  What used to be a big ugly maintenance project is now a matter of just a reboot.

This means you’ll be supporting SQL Server 2005 and 2008 forever.

SQL Server 2000 has thankfully (mostly) already been exterminated from serious production work.  Its lack of management tools and mainstream support means it’s painful to troubleshoot, so most of us have already migrated production work to 2005 and 2008.  Support for those newer versions doesn’t end for years, so settle in and get comfy.  Sure, SQL Server 2005 and 2008 have bugs, and they’re missing cool features like backup compression in Standard Edition, but for the most part, they just work.  Businesses will stick with ‘em for most applications because they don’t see enough compelling features in 2012.

In SQL Server 2012 and beyond, we’ve got:

Call me maybe crazy, but I don’t see really widespread adoption for any of these.  To do them right, we’ve gotta make changes to application code.  The changes won’t pay off for the majority of customers, so it’s risk without much reward.  Don’t get me wrong – when you need this kind of speed, then you need it, and the features are fantastic.  I do see widespread adoption coming in 2013-2018 for AlwaysOn, but only for high availability and disaster recovery, not the scale-out reads part.

The virtualization/SQL2005-is-good-enough combination also means we’re not going to see massive, widespread migrations from on-premise SQL Servers to cloud services like SQL Azure.  (We’re also not going to see people call it by its official product name, Microsoft Windows Azure SQL Database.)  Your app would require code changes to make that switch, and code changes are risky.

New development, on the other hand, means you can pick the features and compatibility you want.  In those environments…I still don’t see a lot of widespread SQL Azure adoption coming.  If I’m a developer building a new app from the ground up, I’m going to pick the cheapest, easiest database possible.  These days, that’s probably PostgreSQL.  Like SQL Server, it’s a platform that is available in a lot of different delivery mechanisms – on-premise on bare metal, on-premise virtualized, colo boxes, cloud-based services, etc.  It’s open source, and it has all kinds of cool features we don’t get in SQL Server.  I’m not changing horses mid-stream, but if I was starting again as a developer, I’d go that route.

What Do You See Changing?

When you look at your company’s needs, the job you do, and the tools you use, what do you see coming down the pike?  In 2018, what will be the defining thing that changed your work?
