
Wanna Speak at the PASS Summit This Year? Move Fast.

#SQLPass
3 Comments

The PASS Summit is November 15-18 in Seattle this year. This time around, it’s a hybrid conference, with a mix of in-person and remote speakers and attendees.

I’m really excited to get back to in-person events. I recently attended SQLBits in London, and I actually got emotional at the sight of so many community people that I’ve known over the years. It was just amazing to see people in person again.

Here’s a PDF explaining how to submit a session for the PASS Summit, and here’s the Call for Speakers.

The deadline is March 31. I know. It’s coming up quickly. But block off some time in your calendar to think about what you wanna talk about, and knock out at least one abstract.

The Summit’s new owner, Redgate Software, is making some changes that I really love:

  • For the first time, session speakers are getting paid for their work (not just a free conference ticket, but $250 in person, $125 online).
  • Pre-conference workshop speakers now get $200 per attendee (way, way up from past events).
  • Pre-conference workshops will be conducted in-person, but attendees can be either in-person or online. The sessions will be recorded as they happen, and available to attendees for one week.
  • You (yes, you) can submit a session to be the Community Keynote.

And Redgate’s being really transparent about what kinds of sessions they’re looking for, and how many will be accepted from in-person and remote speakers.

The call for speakers closes March 31st, and you’ll find out if yours made the cut on June 15-17. Hope to see you in Seattle!


SQLDay Poland Workshop: How I Use the First Responder Kit

Watch and learn

I’m presenting remotely at the upcoming SQLDay Poland on 9-11 May!

I’m teaching a one-day workshop, How I Use the First Responder Kit.

You’ve downloaded sp_Blitz, sp_BlitzIndex, and sp_BlitzCache from the First Responder Kit, and you’ve run them a few times. You know there’s a ton of advice in there, but…it’s kinda overwhelming. Where should you even start? What should you tackle first? Are there parameters that would help diagnose your specific situation?

In this one-day course, I’ll walk you through running them on your production server, and we’ll interpret your results together. Space is limited in these particular live classes because I’ll be giving you advice on your specific server.

Here’s how the day will go:

  • You’ll run sp_Blitz to do a server-wide health check
  • You’ll run sp_BlitzFirst to look at your server’s wait stats and storage throughput since it started up
  • You’ll run sp_BlitzCache to find the queries causing your top wait types, using my favorite parameters for deeper diagnosis and trending
  • You’ll run sp_BlitzIndex to diagnose the most urgent indexing issues specifically related to your top wait types
  • You’ll write up a report to hand off to the rest of your team about what you learned about your server today, and what actions everyone needs to take in order to see a performance boost

Throughout the day, as questions come up about the data you’re seeing, you can ask them and get advice from me (and the rest of the students!) You’ll learn not just from your own questions, but also from the data the other students are sharing about the challenges they’re facing.

This course is 100% demos: the only slides are the introductions at the start of the day, and the recap at the end of the day. The rest of the time, we’ll be working in SQL Server Management Studio. Roll up your sleeves and join me!
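To give you a taste of the flow, here’s a rough sketch of the day in T-SQL. These are the real First Responder Kit procedures, but the specific parameter choices below are just illustrative examples, not necessarily the exact ones we’ll use in class:

    /* Server-wide health check, including checks inside user databases: */
    EXEC sp_Blitz @CheckUserDatabaseObjects = 1;

    /* Wait stats and storage throughput since the server started up: */
    EXEC sp_BlitzFirst @SinceStartup = 1;

    /* Top plan-cache queries, sorted to chase your top wait type: */
    EXEC sp_BlitzCache @SortOrder = 'avg duration', @Top = 10;

    /* The most urgent index issues in one database: */
    EXEC sp_BlitzIndex @DatabaseName = N'YourDatabase', @Mode = 0;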

Register here.


Live Office Hours Next Week

Company News
1 Comment

Tuesday & Wednesday afternoon at 2PM Pacific, 5PM Eastern, I’ll be streaming Office Hours – ask me anything – live on my Twitch channel. I’ll post the recordings on my YouTube channel.

Post your questions and upvote the ones you’d like to see here.

Here are calendar reminders for Tuesday and for Wednesday.

I’ve been doing these disconnected – recording them whenever it’s convenient for me from the beach or wherever – but Tuesday & Wednesday, I’ll be in San Diego, so I figured I’d do ’em from my home office there. See you in the stream!


[Video] Office Hours: Ask Me Anything About SQL Server

Videos
0 Comments

Ask me questions at https://pollgab.com/room/brento and upvote the ones you’d like to see covered. I can pretty well guarantee that your favorite question this time is going to be the last one I answered.

Here’s what we discussed today:

  • 00:00 Introductions
  • 00:57 Sean W: New app with data model TBD; what is the thought process one should go thru and questions to be asked to determine if relational or nonrelational DB should be used? Thanks!!
  • 02:42 MyFriedAsk: Hi Brent, My friend’s server has 1 database and several applications connecting to that database. For some reason they get SQL timeouts in the applications on Tuesdays. How can we figure out what’s causing the SQL timeouts?
  • 03:20 Goose: What is the danger zone rate for inserts per second in SQL Server? What’s the highest you have seen?
  • 04:54 i_use_lowercase_for_select: Hi Brent, my friend is looking at your alerting script and wants to implement it. However, they also want their monitoring system (SSMS/PRTG) to get notified using SNMP traps. Do you have any advice on this? PS looking forward to the upcoming live sessions!
  • 05:51 Sohil: Do physical reads vs logical reads ever matter for performance tuning?
  • 07:12 Robert S.: Hi Brent, do you have a Goldilocks methodology for determining the optimal number of inserts to perform in a transaction so that transaction log writes are optimized (i.e. not too few, not too many).
  • 08:23 RenegadeLarsen: What is best Brent…CPUs, Memory or Fast disks.
  • 09:00 Kendra Little: What is your favorite isolation level, and why is it NOLOCK?
  • 09:31 Sqlsucks: Hello Brent, In today’s world where things are changing so fast, what would you recommend to learn new to someone who has 5-6 years of experience with SQL server? I would assume cloud is the future, but please explain where should I focus? Thanks
  • 12:17 Greg Dodd: The leading column of an index is very selective (only around 5 rows for over 95% of the rows) but the stats in the histogram shows 100’s or 1000’s of rows expected. If I rebuild stats, it gets a lot better if I do a fullscan. Thoughts on how to make it better without fullscan?
  • 14:51 Leroy Jenkins: Hi Brent, what’s your opinion on some peeps’ advice of “Do not install SSMS on the SQL Server / Do not remote desktop into the SQL Server”?
  • 16:54 Megurine Luka: Hi Brent, I can run the same query in SSMS 18.x and Sentry One Plan explorer. Sentry one will show residual io bang – warnings that I don’t see in SSMS. Have you experienced this as well?
  • 18:32 ProductionDBA: Hi Brent, I’m very excited for the new SQL 2022. I wanted to test it but I’m still new to the SQL world. I was wondering, have you ever participated in an early adoption program for any of the old versions, and what are you expected to do and test, and what feedback do you need to provide?
  • 20:59 VISHAKHAPATNAM: Hi Brent, what options should be used to create a new index in the fastest time possible? Use SORT_IN_TEMPDB?
  • 21:58 marcus-the-german: Hi Brent, could you tell us anything about the persisted log buffer or Non-Volatile Memory SQL Server Tail Of Log Caching on NVDIMM? Is it worth thinking about?
  • 24:30 LifelessDBA: Hi Brent, what would be the reason(s) behind creation of statistics on a column in read-only secondary database, while primary database already has a statistics on that very same column?
  • 25:50 Sean C: My (dumb) friend inherited an Azure instance with auto-create idx enabled/drop disabled. The sys created idx have non-standard names and the idx keys, etc. are different in each DB. He can’t think of a way to implement DEATH. Any thoughts on how to manage/script this out? Thanks!
  • 28:50 Maurice Moss: Do you see same or different reoccurring issues in Azure SQL that you see in self managed / installed SQL Server?
  • 31:38 del-a-nooch: Hi Brent. Long time listener, first time caller. My backup admins are asking if they can exclude DB data files from the daily server backups. I’m taking regular SQL backups (to two different data centres), can you foresee any problems with this approach? Thanks, del-a-nooch.
  • 33:40 Kendra Little: Which breed of dog is SQL Server?

[Video] Office Hours: Morning in Mexico Edition

Videos
2 Comments

Got SQL Server questions? Post ’em at https://pollgab.com/room/brento and upvote the questions you’d like to see me discuss.

Here’s what we covered this morning:

  • 00:00 Introductions
  • 00:45 SQL Simian: Everybody loves free advice, but we want war stories. I have heard your 174 indexes story twice now during recorded classes. Got any more? What are the weirdest things you have seen implemented? OR: What are the most exotic places you have visited when you ‘parachute in’?
  • 02:49 CaveOfWonders: MS SQL Synonyms are a new concept to my friend and I (insert Brent’s “A Whole New World” line). We were wondering what your thoughts are on synonyms and if you had any pointers to keep in mind when writing or tuning queries that use them.
  • 04:27 Andrew: For some of our actual query plans in SSMS, the sum of the node costs add up to over 100% (116% in one case). Is this a bug in SQL Server or SSMS? Running SSMS 18.10, SQL Server 2014 SP2
  • 05:40 Rick: Brent, what is your experience with letting the SAN compress large (multi TB) SQL backups instead of doing the compression with SQL server?
  • 07:58 SwedeDBA: Hi! Our application has one database per customer. Currently, we run about 20-30 customers per instance, and three instances per physical server. Would running all this on one instance per server be more beneficial instead, to allow the server to balance resources better?
  • 08:58 Mahir: Brent, besides price, what criteria should we use when evaluating SQL monitoring software?
  • 09:39 Gabriele: If my friend had to monitor 500 remote clients postgres and 500 remote sql server what tool would you recommend?
  • 13:17 Calvin: Hi Brent, when should you disable a non-clustered index vs drop it like it’s hot? Does disabling free up any space?
  • 14:55 Neil: My company runs 15 merges every minute to sync an online store with the ERP database. What’s the alternative? Can SSIS be used for this? What is SSIS? What is love?
  • 16:38 Drew: SQL is randomly throwing 824 corruption errors on tempDB on a brand new SAN. Vendor says there is no issue with the SAN, what are our next steps?
  • 18:21 Mihir: Hi Brent, We are running into unusual tempdb contention in multiple environments where, despite having tempdb files equal to the number of logical cores (28 in our case), we still get contention on PFS and GAM pages. What else can we do to resolve this?
  • 20:05 Jim: What should we configure MAXDOP to? Our 2019 physical server has 64 total cores including hyperthreaded cores.
  • 20:50 Count Chocula: Hi Brent, is there an easy way to clear the server’s index stats (used by sp_blitzindex) without rebooting the server? Don’t want to failover after making a bunch of index changes just to get updated index stats.
  • 22:14 Neil: Should we get a bigger, faster drive on an Azure VM, or build a software RAID in Windows with a bunch of smaller, slower ones? Is there a way to trial Azure hardware to test this? We tried the big drive with data, log, and tempdb together and it was fast; haven’t tested RAID.
  • 23:21 Chris Thomas: Are there levels of Enterprise licensing for a VM Guest vs VM Host? We currently only license our VM, but have sufficient licenses to cover the host (we’re the only VM on the host). I’d like to try HT, but our Infra Manager says that we would need a different license to do this
  • 24:01 Margaret: Hi Brent — We have some Always on clusters with readonly secondaries SQL Server 2016. We run an index reorg daily and quite often that causes a lag in the secondary. Would changing the synchronization to async while the reorg runs, then changing it back help with this?
  • 26:00 Recap and plugging

Announcing Quest Database Training Days 2022

Wanna learn for free?

On April 13, I’m doing a session called My Cloud Database Costs are Too High.

You’ve moved to the cloud, and now management is asking questions about your monthly bills. They’ve asked you to cut costs. Now what? In this webcast, Brent Ozar will examine the top expense-causing configurations and workloads for Azure SQL DB and Microsoft SQL Server. You’ll learn how to tell when you’ve got too much hardware, and where you can safely cut costs without sacrificing end user performance experiences.

There are also sessions through April & May from Janis Griffin, Neil Buchwalter, and Pinal Dave, and we finish up with a panel Ask the Experts session. All of the sessions will be recorded, and if you register, you’ll have access to the recordings when they’re available.

Register now for free.


New Lower Prices on My Mastering Class Recordings and End-of-Year Live Class Season Pass

Company News
8 Comments

Brent with two Bloody Marys

If you’ve been watching my Instagram feed or any of my recent Office Hours videos, you’ve noticed that I’ve moved down to Cabo San Lucas, Mexico. I’m absolutely loving it, having a great time watching whales, sitting on the beach, and sipping margaritas.

It turns out that the cost of living in Mexico is cheaper than in San Diego. A lot cheaper. So you know what? I’m gonna cut the costs on my training classes to pass the savings on to you.

My individual Mastering class recordings used to be $995 each. Now they’re just $395 apiece for Mastering Index Tuning, Mastering Query Tuning, Mastering Parameter Sniffing, or Mastering Server Tuning.

The Recorded Class Season Pass: Mastering used to be $2,395 for all of my Mastering classes. Now it’s just $995!

I’ve also heard from a few of you that had Live Class Season Passes that expired this month, and you wanted to be able to watch the rest of my live classes scheduled through October. To make that happen, I’m temporarily selling a 7-month Live Class Season Pass for $895.

You can also pick up the Level 3 Bundle, which includes the 7-month Live Class Season Pass, plus a full year of Recorded Class Season Pass Masters & Fundamentals, SQL ConstantCare, and the Consultant Toolkit for $1,595.

Wanna save even more? Wanna drink the discount tequila? You got it: use coupon code VivaMexico for 25% off the Level 3 Bundle and the Live Class Season Pass, this month only.

I’m not bringing back the full Live Class Season Pass permanently – I just wanted to let folks continue to access my scheduled live classes through October of this year. After that, I’ll be taking a well-deserved break from live teaching – and drinking more of those cheap Mexican margaritas.

Viva!


SQLBits 2022 Day 1 Keynote

SQLBits
9 Comments


Conferences are back, baby!

I’m in London for SQLBits 2022, the biggest Microsoft data conference in Europe. This year’s theme is classic video games, and it’s been really cool to see the 8-bit logos and games all over the ExCeL Centre.

Today is the Microsoft keynote. It’s called Level Up with Azure Data, hosted by Bob Ward, Buck Woody, and a whole crew of Microsoft folks.

Bob Ward started by taking a moment to think about what’s happening to our colleagues in Ukraine.

In an effort to make performance tuning easier, SQL Server 2022 is expanding the Intelligent Query Processing features, and Query Store will be on by default. (These aren’t new changes – they’ve been discussed in the past – but Microsoft’s just reminding folks of what’s coming.)
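For reference, turning Query Store on yourself today is a one-liner – shown here with a hypothetical database name and just the basic read-write mode:

    ALTER DATABASE [YourDatabase]
        SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);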

Pedro Lopes took the stage to talk about parameter-sensitive plan optimization, aka PSP Optimization. He demoed it with SQL Server 2022 CTP 1.3. I’ve written about this feature before, and there wasn’t anything new here in the demos. My opinion on this feature remains the same: I think it sounds like a phenomenal down payment. It won’t fix parameter sniffing, but I don’t think it’s going to backfire.

Bob Ward and Buck Woody then demoed SQL Server Ledger, based on blockchain something something. They demoed it by showing a Python notebook in Azure Data Studio – they didn’t run the queries live interactively, just walked through the existing results in the notebook. They kinda hustled through it really quickly, so I don’t think most attendees caught it, but here’s what happened:

  • Bob Ward ran update statements
  • Ledger caught that and showed what he did
  • Bob Ward changed pages in the database directly without using update statements
  • Ledger caught that indirectly by showing that the hash blocks don’t match
  • Bob Ward tried to change the ledger data itself
  • Ledger caught that too because it’s stored securely

That demo really needs a solid 20-30 minutes to walk through and explain, and it would be really powerful. They just ran through it in a matter of seconds, so the audience didn’t really get what was happening. I actually really love the idea of this – it’s just tricky because it’ll require a hell of a lot of storage if you want to audit a lot of changes on a lot of tables across a long period of time.
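If you want to poke at it yourself, creating an updatable ledger table looked roughly like this in the CTP builds – the table and column names here are made up, and the syntax could still shift before release:

    CREATE TABLE dbo.AccountBalances (
        AccountId INT NOT NULL PRIMARY KEY,
        Balance DECIMAL(18, 2) NOT NULL
    )
    WITH (
        /* Updatable ledger tables require system versioning: */
        SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.AccountBalancesHistory),
        LEDGER = ON
    );

Every change lands in the history table plus a hash chain – which is what let Ledger catch Bob’s direct page edits in the demo when the hash blocks stopped matching.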

Next up, Bob demoed replicating an on-premises SQL Server 2022 instance up to an Azure SQL Managed Instance in the cloud. The wizard built into SSMS automatically created an Availability Group and a Distributed Availability Group up to the cloud – even if the database you’re trying to protect isn’t currently in an Availability Group. I love the idea of this, but I’m terrified of the execution. The diagnostics and troubleshooting of Distributed Availability Groups are a really sore point – they’re practically nonexistent – and it’s only going to get worse when you throw the cloud into the mix.
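For context on why the troubleshooting story matters, here’s roughly the kind of thing the wizard is building under the covers – this is the standard distributed Availability Group syntax, with made-up AG and listener names:

    CREATE AVAILABILITY GROUP [DAG_OnPremToCloud]
       WITH (DISTRIBUTED)
       AVAILABILITY GROUP ON
          'AG_OnPrem' WITH (
             LISTENER_URL = 'tcp://onprem-listener.contoso.com:5022',
             AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
             FAILOVER_MODE = MANUAL,
             SEEDING_MODE = AUTOMATIC
          ),
          'AG_Cloud' WITH (
             LISTENER_URL = 'tcp://mi-endpoint.contoso.com:5022',
             AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
             FAILOVER_MODE = MANUAL,
             SEEDING_MODE = AUTOMATIC
          );

Now imagine troubleshooting that when one end of it lives in Azure.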

Anna Hoffman took the stage and announced that the SQL best practices assessment is now available in the Azure portal. It’s like sp_Blitz, but more powerful because it can be scheduled and centralized.

Managed Instances can now query flat files in formats like Parquet directly from T-SQL with OPENROWSET. It’s like linked servers, but hard-coded to file paths. That’s great because file paths never change, and even if they did, it would be easy for you to go back and refactor your queries to point to the new paths. Okay, no, sarcasm over: I don’t have a huge problem with this, because I can see analysts who only know T-SQL wanting to run queries over files. I know they exist – I’ve seen my customers doing that – but putting it into long-term production gives me the shivers.
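The query pattern looks roughly like this – the storage URL is hypothetical, and I’m leaving out the credential setup you’d need for authentication:

    SELECT TOP 100 *
    FROM OPENROWSET(
        BULK 'https://yourstorageaccount.blob.core.windows.net/data/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales_rows;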

Also new in preview are the Link feature for SQL MI, and Hybrid Service Broker.

Anna demoed Azure SQL Hyperscale’s new named replica feature by adding secondary replicas on the fly, using purely T-SQL. Like the earlier Ledger demo, this just went too quickly, but it looks really compelling for workloads where we want the code itself to crank up our server capacity. That isn’t automatic – automatic scale-up/down is compelling too, sure – but I actually love the ability to manually scale capacity up/down with T-SQL. For example, one of my clients does nightly warehouse data loads, and when the load is verified, they run an insanely huge number of scheduled reports. They can’t predict exactly when they’re going to need the read capacity every time because the load runtimes vary wildly, and sometimes the loads just outright fail. I can totally imagine them modifying their SSIS processes to fire off T-SQL commands to amp up the read-only capacity when they’ve verified that the loads are done.

Bob came back and demoed something with Synapse Analytics, and I missed it. Look, I drank a lot of coffee before the session, and I had to run to the bathroom. Don’t judge me. I don’t do anything with Synapse Analytics, so if there’s a demo I could skip, it’s that one.

Patrick LeBlanc demoed a live-updating dashboard in Power BI. Amusingly, he put in a video of Adam Saxton supposedly playing Pac-Man, but Adam was hitting a lot of buttons for a game where you just move Pac-Man around in different directions. I can see why Adam’s having a tough time catching up on the leaderboard. 😉

Evangeline White came up to demo Azure Purview, a map and catalog of your organization’s data. Purview is something I think every big company needs. I’ve seen so many big companies where people don’t understand where data is stored, or what the data means, or whether it should be secured or visible everywhere. However, I bet most companies won’t see the ROI on a tool like Purview, so I think even if companies try to implement it, few will configure it correctly, and even fewer will keep it up to date. That’s a bummer. I’m not making it my personal mission to evangelize that stuff, though.

Evangeline mentioned that they’d just released the ability to track lineage on the input and output of stored procedures. Oof, that’s gonna be rough. People hardly document anything, let alone document lineage of what a stored procedure modifies. Again – I love the idea of this, but I know how little documentation people write. We’re lucky when we get any documentation about a stored procedure – much less details on the input data, output data, and the changes that were made to it.

Bob Ward came back onstage to announce that the first public preview of SQL Server 2022 will drop by the end of the first half of 2022.


[Video] Office Hours: Cabo Home Studio Edition

Videos
0 Comments

Got questions about SQL Server? Post them at https://pollgab.com/room/brento and upvote the ones you’d like to see me cover. This episode was the first live one I shot in my home studio in Cabo:

Here’s what we covered:

  • 00:00 Introductions
  • 02:19 Nullpointer: What are your thoughts on temporal tables, I don’t hear you talk much about them pros/cons.
  • 03:24 Trushit: If my query returns quickly enough but my estimates vs actuals are more than 10x apart, do you think I should tune my query further even if the speed is acceptable?
  • 04:09 Burhan: Is there a good way to estimate how much a given transaction will impact the transaction log ahead of execution time? Concerned about filling up the transaction log with some DDL commands on a table with millions of rows.
  • 05:03 Roy Hinkley: What are the top issues your clients run into with SQL full text search?
  • 06:14 Mehdi: Hi Brent! What is a reliable way to transfer new records to another destination? It is insert-only and an identity column should be OK. But if multiple sessions are inserting data to the table, we may have race conditions and skipped records depending on how they commit.
  • 06:42 Hazan: Brent, what are the top multi-tenant / SQL related issues you see on a regular basis?
  • 08:23 Carl: Does the advice of always adding at least two data files to every SQL file group (for performance reasons) hold any water when those files are located on a NVME SSD SAN drive?
  • 08:55 Sultan: Hi Brent, how long do you wait after a SQL CU is released before you feel comfortable installing it?
  • 10:00 Tolga: Hi Brent, love your training classes / videos. Do you have any plans to create any AzureSQL training?
  • 11:38 Nadim: Besides compat version, what other settings should we consider updating when migrating from older to newer version of SQL server?
  • 13:03 Tahir: Hi Brent, our DBs currently have a recovery interval of 0. Microsoft docs recommend keeping this value at 0 unless you experience problems. Have you ever experienced a problem with a recovery interval of 0?
  • 13:44 Eran Haddad: We use sql server 2014 compatibility level and soon to upgrade to 2019. How can I convince my managers to upgrade the compatibility level without warning? Beside restoring to another db and make tests (it’s pretty hard to do)
  • 15:45 Leonard: What is your opinion of SQL Buffer pool extension? Has it ever gotten you over the performance finish line?
  • 17:26 Rahmi: Hi Brent, how do you know when it’s time to move a high write table to a new less busy filegroup / SAN volume? Will sp_blitzindex help here?
  • 18:23 Hadir: Hi Brent, Is SELECT TOP(N) without an ORDER BY bad for any particular reason?
  • 19:16 sol: Praised be the acclaimed tortugas savior! What drives you to answer the same questions over and over again during OH without smashing your iPad against a wall? Jokes aside, do you become hyped / addicted to this type of community interactions during now-all-worldly “trends”? Thx!
  • 20:33 Hasan: Hi Brent, is SQL availability groups any faster or more efficient for replicating large amounts of transaction log activity compared to SQL log shipping?
  • 22:12 a friend of mine: is saying that certification is worth nothing. What do you think about paid trainings? Are they always worth the money spent?
  • 23:51 Rico Suave: Hi Brent, love your classes. How often do you update or re-shoot previous recorded class videos?
  • 24:35 Jonas Grumby: Is Query cost or logical reads a better identifier of expensive queries?

Office Hours: 7 Questions for You to Answer Edition

Architecture
42 Comments

Sometimes, I see questions and go, “You know, I’d rather let other people answer this.” And today, I’m gonna let you do it.

My blog post this week about PollGab.com and about its data model brought out all kinds of interesting Office Hours questions. I’ve selected 7 of them that I don’t wanna answer – instead, I wanna hear your answers.

  1. Gary Kendall: Hey Brent, Do you have any recommendations for version control or auditing to help manage code changes in SQL Server (stored procedures, functions, etc.)? Developers say “we didn’t change anything” – we know SOMEONE did, but don’t always know who. Thanks! (p.s. nice beach!)
  2. Azure DBA: Hi Brent, what do you think of the maximum Synapse dedicated SQL pool backup retention of 7 days (42 points)? Could this be considered a risk, and is there any other way you can back up a SQL pool to have long-term backups?
  3. I’m_a_null_set: Hey Brent. Love your content and classes. I bought your fundamentals bundle a while back which is fantastic. Unfortunately, my employers training budget isn’t enough to cover your mastering bundle. Can you recommend any resources which could help up skill me whilst I save up?
  4. Arnold: Hi Brent, what are your favorite options to use with sp_whoisactive?
  5. Roy Hinkley: Hi Brent, is there any corrective action that can be taken when estimated rows vs actual rows is off by more than 10x for SQL full-text search using the “CONTAINSTABLE” function?
  6. DBA Taylor: Hey Brent, big fan of your content! I’m wondering if you know of any “gotchas” with distributed availability groups that are not present with normal availability groups? e.g.The verbiage Microsoft uses in their docs makes it sound more difficult to failover than your typical AG.
  7. Toucan Sam: Hi Brent, what is your opinion of the new query plan viewer in latest Azure Data Studio?

Got an opinion about any of ’em? Post it in the comments.


Building PollGab.com: February’s Hosting Bill: $1.85.

PollGab
10 Comments

Earlier this week, I wrote about how we designed the database for PollGab.com. Because we went with AWS DynamoDB for the database storage and AWS Lambda for the processing work, costs are really, really low:

PollGab hosting bill, page 1

That is not a typo: that’s $1.85 USD.

Richie puts the business logic in functions, and those functions run in AWS Lambda. That’s the compute power, or what used to be known as an application server. It was completely free last month because Amazon gives you the first 1,000,000 requests and up to 3.2M seconds of compute time (889 hours, aka 37 days) free of charge. We’re in the free tier for data transfer and email services as well.

We spent $0.04 on Amazon S3, aka file storage. I honestly don’t remember what we’re putting in file storage, and at four cents, I’m not gonna bother to look.

Scrolling on down to page 2 of the bill:

The free tier of DynamoDB includes up to 25GB data and enough capacity to handle about 200 million well-written queries per month. My kind of database right there.

The most expensive thing in our hosting bill is $1.01 for Route 53, which is DNS. When you type pollgab.com into a web browser, DNS is the magical service that tells your browser where to go.

This incredibly exorbitant pricing is due to the fact that each domain costs $0.50 per month, and we have two domains. ¯\_(ツ)_/¯ We’ll have to learn to live with that expense.

Next up is $0.74 for CloudWatch, which is monitoring. Yes, that’s right: monitoring is 5-6x the cost of the rest of our infrastructure combined, hahaha. Upon reading that, my first reaction was, “I have to stop what I’m doing and check CloudWatch’s settings.” After spending approximately $0.74 worth of my time, I realized that I was already sidetracked because we were getting write alerts in the dev environment. I abandoned that task.

So why does PollGab cost $59 per year?

Well, the hosting part is doggone near free, but the development costs certainly are not.

Building a cloud-native application is about spending more development time up-front in exchange for lower long-term costs of ownership – both lower hosting costs, and lower systems administration costs. By himself, Richie’s been able to crank out PasteThePlan, SQL ConstantCare®, PollGab.com, and more over the last few years, and we haven’t had to hire sysadmins or support teams.

The hosting costs on these do indeed go up as they become more popular – but as they gain more paying customers, it’s much easier to get apps like this to become profitable, and keep them that way. That’s where cloud-native pays off.


Who’s Hiring in the Database Community? March 2022 Edition

Who's Hiring
15 Comments

Is your company hiring for a database position as of March 2022? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here. If your company is hiring, leave a comment.

The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes to, or a link to the application process. If I were you, I’d put an email address: you may want to know which applicants are readers here, because they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.

If you want to be notified as people leave new jobs here, there’s a “subscribe without commenting” option where you leave a comment. I don’t use that for subscribing you to anything else.


Building PollGab.com: Designing the Data Model

PollGab
9 Comments

In the last post, I talked about how Richie and I sketched out the goals of PollGab.com, the site where you can post questions for other folks to answer during their live streams.

I challenged you to decide:

  1. What data needs to be stored to make all this work?
  2. What database or persistence layer should we use?

So let’s talk through each of ’em.

1. What data needs to be stored?

The 3 major components are:

  • Rooms – with attributes like the room name, URL, room owner email
  • Messages – including the message, person’s name, person’s email address. Initially, we didn’t want email address, but we added it and made it optional just so we could show Gravatar icons next to people’s names.
  • Votes – and this is where your opinions will play out in the comments. The data professional in you wants to capture the details of every vote, and you might want to add a separate normalized Votes table for it. However, for the sake of this app, we don’t need granular tracking of votes. We just need to increment the number of votes on a message – and for that, it makes sense to just have a VoteCount column on the Messages object. I know. You’re not happy about that. But remember, we have a goal to track as little data as possible.

Here’s one of the early data model sketches from Richie’s notebook:

Richie’s sketch adds a fourth data object, connection, that’s required for tracking each of the browsers connected at the moment. We don’t need that for evil purposes – we just need to push updates back to them so their screen can refresh in real time.

Just for clarity’s sake, this isn’t the exact structure of PollGab’s current data model – but I like showing this early draft because it shows the thought process. Data models change over time, especially when you’re building a brand new app like this one where we can iterate quickly, and we don’t need to deal with queries coming from other systems, like reporting tools.

There are additional things to think about as well, like banning users and making payments, but I’ll keep it simple for the sake of this discussion. (And really, in a modern app, you don’t want to hold the payments yourself – that’s what services like Stripe are for.)

In terms of the way we load and query this data, here are a few sample use cases:

  • View the list of rooms here
  • View the list of messages for one room (like Office Hours)
  • Insert a new message
  • Upvote a message
  • Flag a message
  • Delete all messages in a room

You’ll notice that the views are fairly tightly scoped: we’re not doing things like analytical reports.
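Just to make the shape concrete: if you sketched this same model in a relational database (which, spoiler for the next section, is not what we did), it might look something like this – the names and sizes are purely illustrative:

    CREATE TABLE dbo.Rooms (
        RoomId      INT IDENTITY PRIMARY KEY,
        RoomName    NVARCHAR(100) NOT NULL,
        RoomUrl     NVARCHAR(200) NOT NULL,
        OwnerEmail  NVARCHAR(320) NOT NULL
    );

    CREATE TABLE dbo.Messages (
        MessageId    INT IDENTITY PRIMARY KEY,
        RoomId       INT NOT NULL REFERENCES dbo.Rooms,
        MessageText  NVARCHAR(1000) NOT NULL,
        PersonName   NVARCHAR(100) NOT NULL,
        PersonEmail  NVARCHAR(320) NULL,     /* optional - only for Gravatars */
        VoteCount    INT NOT NULL DEFAULT 0, /* no Votes table - just a counter */
        IsFlagged    BIT NOT NULL DEFAULT 0
    );

    /* Upvoting is an atomic increment, not an insert: */
    DECLARE @MessageId INT = 1;
    UPDATE dbo.Messages SET VoteCount += 1 WHERE MessageId = @MessageId;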

2. What database or persistence layer should we use?

I know, most of us here use the Microsoft data platform to make a living. However, dating back to when Richie built PasteThePlan.com several years ago, and then SQL ConstantCare®, we’ve long believed that where we can avoid using a relational database, we should.

NoSQL systems are generally cheaper and easier to manage than relational databases. Yeah, I said it. Come at me. Try hosting production-quality, auto-scaling relational databases for $3.43 per month. It’s damn hard.

So like PasteThePlan (but unlike SQL ConstantCare®), we ended up using Amazon DynamoDB. I know it’s marketing, but the main boxes on the DynamoDB product page do a pretty good job of summing up its advantages:

DynamoDB does have drawbacks for most of us in the audience: we can’t just whip open SSMS or Azure Data Studio, write a query with joins and grouping, and get reports out. However, for the sake of this web site, we just don’t need reporting across multiple rooms or questions.

During the design phase, I specifically said that any kind of marketing analytics were completely out of scope. I recognize that ad agencies and podcasting companies might want to do centralized purchases for lots of podcasts or rooms, and then get analytical data to know which users are asking the most questions, or what kinds of questions get the most upvotes. They’re just not our target market for the initial product release.

This is a cloud-native, locked-in app.

We designed this from the ground up to work exclusively with Amazon DynamoDB. This means we’re stuck there, for better or worse. If Amazon DynamoDB goes down, PollGab’s going down with it. As a small business owner, that’s a risk I’m totally willing to take. DynamoDB has much higher uptime than any server I’ve personally been involved with building and maintaining.

I’m sharing this with you because I want you, dear reader, to understand why startups and small businesses might choose to go cloud-native and build their applications without traditional relational database systems. I love databases! I still absolutely love ’em. But I’d rather not be on call for fixing them when (not if) they break.

I’m also fine being locked into AWS because we chose to keep building with AWS Lambda, the same hosting and processing architecture that we chose for PasteThePlan and SQL ConstantCare®. Lambda lets you write short-lived functions, and you don’t have to know anything about the operating system or servers that run your code. When someone hits a URL, the code runs, period. When lots of people come in, AWS will gradually spin up more… somethings… that run your code. That’s just not really our problem.

Could we have used Azure or Google services instead? Absolutely! But at the time we started building this, Richie had already built up 5 solid years of experience with DynamoDB and Lambda, so we were able to start building more quickly without learning new tooling.

Could we have used a conventional server-based application, like a web site in IIS or a .NET app? Absolutely! But we wanted to be able to completely outsource uptime management and troubleshooting to Amazon, and that’s where Lambda makes things easier. The best evidence for that is that as Richie continues to build stuff, he’s able to spend an absolute minimum of time managing our already existing apps like PasteThePlan and SQL ConstantCare.

What questions do you have about PollGab’s data model or persistence layer? Let us know in the comments.


Building PollGab.com: Design a Database for Live Session Questions

PollGab
19 Comments

Years ago when we did online presentations or live Office Hours webcasts, attendees would type their questions into the “Questions” section of whatever webcast platform we were using – WebEx, GoToWebinar, Zoom.

This sucked because:

  • People would put questions all over the place (Questions, Chat, email, etc)
  • It was hard to keep track of which questions were still left to be answered
  • People couldn’t write questions in advance
  • People kept copy/pasting the same questions over and over because they’d think we hadn’t seen them
  • We didn’t have a way to tell which questions other folks wanted us to answer

So when I decided to start taking Office Hours more seriously a couple of years ago, Richie and I sketched out a better way.

That way is www.PollGab.com, and it’s what Richie’s been working on for the past year. You might have seen links to it during my Office Hours:

Viewers can post messages, upvote messages that they’d like to see me discuss, and flag inappropriate messages.

To get an idea of how it works, you can see my room, see the list of active rooms, and create your own room for free. If you run into problems or have a question, email Help@PollGab.com.

When you’re clicking around in there, if we’ve done our job right, things are going to seem pretty intuitive and obvious. However, it takes a lot of planning and work to make an app seem effortless, so let’s talk about some of the design goals we came up with.

The data updates automatically. If you sit there watching a popular room, you can see new questions coming in, and see questions move around on the leaderboard, automatically without touching Refresh on your browser. In a perfect world, you should be able to load a room’s page just once, and watch the action unfold.

Viewers & voters don’t need to log in. In today’s GDPR-sensitive world, we want to store as little data as practical, and delete it as quickly as possible. We do need a way for room owners to be recognized so we can let them see the list of flagged questions, un-flag them, ban problem users, or set a question as “now showing”, plus some other neat tools I’ll mention this week.

The lack of logins does mean people can game the system: if you keep spawning new incognito browser windows, you can repeatedly upvote your own question. For now, we’re not really concerned about that. (Gotta pick your battles when you’re building a new site.)

If anyone flags a message, it disappears from everyone’s screens. Given that we’re talking about the Internet, where people are endlessly creative and foul, we wanted to make it easy for the audience to help us police. On the flip side, if the room owner is looking at the list of flagged questions, and decides to UN-flag a question (because it’s appropriate and safe), then that message reappears automatically. Plus, it can’t be flagged again, because the room owner has already deemed it appropriate.

The room owner has additional controls. They can stop taking messages, or clear the message queue entirely.

Tag, it’s your turn: design the database.

Let’s pretend that someone sketched all this out for you and asked you the following questions:

  1. What data needs to be stored to make all this work?
  2. What database or persistence layer should we use?

I’m curious to hear what you think. Let me know in the comments, and then in the next blog post, we’ll discuss what we chose to do.


Big Data Clusters: Out of Aces

SQL Server
16 Comments

When this feature was announced in 2018, I wrote:

<sarcasm> It’s like linked servers, but since they don’t perform well, we need to scale out across containers. </sarcasm>

I just didn’t get it, and I continued:

I like that Microsoft is making a risky bet, planting a flag where nobody else is, saying, “We’re going to be at the center of the new modern data warehouse.” What they’re proposing is hard work – we all know first-hand the terrible performance and security complexities of running linked server queries, and this is next-level-harder. It’s going to take a lot of development investments to make this work well.

The thing with bets, as the great strategist K. R. Rogers wrote, is you gotta know when to fold ’em, know when to walk away, and know when to run.


Today, Big Data Clusters died in its sleep.

Back then, I said I liked that Microsoft was making risky bets. They’ve proven time and again that they’re willing to continue to do that with SQL Server, throwing all kinds of crazy features against the wall. Big Data Clusters in Kubernetes. Machine Learning Services on Linux. Calling Java code from T-SQL. I look at a lot of these bets and think, “Uh, I don’t think you can win a poker game with two threes, a Subway discount card, and a happy birthday card from Grandma. That’s not how this works.”

The great part about these utterly wacko bets is that they don’t break the database engine. Microsoft SQL Server is still a robust, powerful persistence layer with a lot of good features. Microsoft hasn’t made dumb moves like saying, “We’re going to replace T-SQL with interpretive dance,” or “We’re not going to run queries if they don’t end in a semicolon.” They’ve figured out that they have to keep the core engine working better and better with each release, too.

If you ask me, five nines is a better hand.

There’s a lesson in here when you gamble on your own career. I beat this drum a lot, but I’m gonna say it again: you only have so many hours in the day to learn new skills. Make sure every hour counts.


Meme Week: Setting Fill Factor to Fix Fragmentation

Index Maintenance
27 Comments

I just shake my head when I see people setting fill factor in an attempt to “fix” fragmentation.

The default fill factor is 100%. That means during an index rebuild, SQL Server packs 100% of your 8KB pages with sweet, juicy, golden brown and delicious data.

But somehow, some people have come to believe that’s bad, and to “fix” it, they should set fill factor to a lower number like 80% or 70%. But in doing so, they’re setting fragmentation to 20%-30%. They’re telling SQL Server to leave 20%-30% free space on every single page during an index rebuild.

That’s internal fragmentation – empty space on pages – and it’s bad. It makes your database 20-30% larger, your table scans 20-30% longer, your maintenance jobs 20-30% longer, and your usable memory effectively 20-30% smaller.
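In code, the difference looks like this – the index and table names are hypothetical:

    /* Tells SQL Server to leave 20% of every 8KB page empty
       at every rebuild, forever: */
    ALTER INDEX IX_LastName ON dbo.Users REBUILD WITH (FILLFACTOR = 80);

    /* The default: pack the pages full. */
    ALTER INDEX IX_LastName ON dbo.Users REBUILD WITH (FILLFACTOR = 100);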

And the very people who keep playing around with fill factor are the ones who wonder why their queries aren’t getting faster, why their maintenance is taking longer, and why their fragmentation isn’t getting better. The jobs still keep taking forever every night, and they don’t connect the dots that they’re the problem.


Meme Week: NOLOCK? NOSQL.

When you find yourself using NOLOCK, consider NOSQL instead.

NOLOCK has 4 big problems:

  • Your query can see the same rows twice
  • Your query can skip rows altogether
  • Your query can see data that was never committed
  • Your query can fail altogether

If you’re okay with all 4 of those, and you find yourself REGULARLY applying NOLOCK, you probably don’t need the expense of a relational database at all. You’d probably be better off with an eventual consistency platform that trades accurate results for speed.
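And if what you actually wanted was readers that don’t block writers, there’s a more accurate fix than NOLOCK. Here’s a sketch with a hypothetical table and database name:

    /* These two are the same thing - both can hit all 4 problems above: */
    SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT COUNT(*) FROM dbo.Orders;

    /* Read Committed Snapshot Isolation gives readers a consistent,
       committed view from the version store instead - no dirty reads.
       (Switching it on needs exclusive access to the database.) */
    ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON;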


Meme Week: SET DEADLOCK_PRIORITY HIGH

Hey, this seems harmless.

Seriously, don’t SET DEADLOCK_PRIORITY HIGH.

Even setting it to low can be a bad idea. I’ve heard people say, “Well, I’ve got a big process that runs nightly, but if it gets in anybody’s way, I would rather just kill my process than slow someone else down.” Keep in mind that SQL Server is still doing all that work until it hits the deadlock – and if we’re talking about an hours-long process, there can still be hours of work involved in rolling back your process to let someone else win the deadlock.
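For reference, the option in question is a session-level setting:

    /* Volunteering to lose: your session becomes the preferred deadlock
       victim - but SQL Server still has to roll back everything your
       process did up to that point: */
    SET DEADLOCK_PRIORITY LOW;

    /* The one this post is begging you not to use: */
    SET DEADLOCK_PRIORITY HIGH;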

When you find yourself going near the DEADLOCK_PRIORITY option, take a step back and question every decision that brought you to this point.


Meme Week: Queue in the Database

Architecture
11 Comments

When you have a hammer, everything looks like a nail. Unfortunately, a lot of people don’t even know how to use a hammer correctly.

You really shouldn’t be doing queues in the database because:

  • They’re a blocking nightmare, constantly locking rows in highly concurrent tables
  • They’re a logging nightmare, constantly pouring data into the transaction log
  • They’re an HA/DR nightmare, constantly causing slow writes to your sync replicas
  • They’re a backup nightmare, constantly changing pages that bloat your differential backups
  • They’re a licensing nightmare, causing a lot of work on an expensive relational database for tables that have no relational joins to anything else whatsoever
  • There are simple, cheap queueing alternatives like Amazon SQS and a few in Azure

If you really gotta do queues in SQL Server, read how to use locking hints.
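For completeness, that locking-hints pattern for a table-based queue usually looks something like this – the table and column names are hypothetical. READPAST lets each worker skip over rows that other workers have locked, and UPDLOCK keeps two workers from grabbing the same row:

    WITH NextItem AS (
        SELECT TOP (1) QueueId, Payload
        FROM dbo.WorkQueue WITH (ROWLOCK, READPAST, UPDLOCK)
        ORDER BY QueueId
    )
    DELETE FROM NextItem
    OUTPUT DELETED.QueueId, DELETED.Payload;  /* hand the row to the worker */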