This post covers two scenarios
You either created a database, and the sync failed for some reason, or a database stopped syncing. Our setup focuses on one where sync breaks immediately, because whatever it’s my blog post. In order to do that, I set up a script to create a bunch of databases, hoping that one of them would fail. Lucky me, two did! So let’s fix them.
You have to be especially vigilant during initial seeding
Automatic failover can’t happen while databases sync up. The AG dashboard reports an unhealthy state, so failover is manual. The good news is that in the limited test scenarios I checked out, Direct Seeding to Replicas will pick back up when the Primary is back online, but if anything really bad happens to your Primary, that may not be the warmest or fuzziest news.
Here’s our database stuck in a restoring state.
Now let’s look in the error log. Maybe we’ll have something good there. On the Primary…
Unknown,The mirror database "Crap903" has insufficient transaction log data to preserve the log backup chain of the principal database. This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.
Okie dokie. Good to know. On the Replica, you’ll probably see something like this…
Automatic seeding of availability database 'Crap903' in availability group 'SQLAG01' failed with an unrecoverable error. Correct the problem then issue an ALTER AVAILABILITY GROUP command to set SEEDING_MODE = AUTOMATIC on the replica to restart seeding.
Oh, correct the problem. You hear that, guys? Correct the problem.
IF ONLY I’D THOUGHT OF CORRECTING THE PROBLEM.
So what do we do? We can check out the AG dashboard, see a bunch of errors, and then focus in on them.
Alright, let’s see what we can do! We can run a couple magical DBA commands and see what happens.
ALTER DATABASE [Crap903] SET HADR RESUME
ALTER DATABASE [Crap903] SET HADR AVAILABILITY GROUP = SQLAG01;
The two errors were:
Msg 35242, Level 16, State 16, Line 1
Cannot complete this ALTER DATABASE SET HADR operation on database ‘Crap903’.
The database is not joined to an availability group. After the database has joined the availability group, retry the command.
Msg 1412, Level 16, State 211, Line 1
The remote copy of database “Crap903” has not been rolled forward to a point in time that is encompassed in the local copy of the database log.
Interesting! What the heck does that mean? If Brent would give me his number, I’d call and ask. I don’t understand why he won’t give me his number. Well, let’s just kick this back off. We kind of expected that not to work because of the errors we saw in the log before, but it’s worth a shot to avoid taking additional steps.
ALTER AVAILABILITY GROUP [SQLAG01] REMOVE DATABASE [Crap903]
ALTER AVAILABILITY GROUP [SQLAG01] ADD DATABASE [Crap903]
Right? Wrong. Digging into our DMVs and Extended Events, they’re telling us that a database with that name already exists. What’s really lousy here is that this error doesn’t appear ANYWHERE ELSE. It’s not in the dashboard, it’s not in regular, documented DMVs, nor in the XE health session. It’s only in the undocumented stuff. If you’re going to use this feature, be prepared to do a lot of detective work. Be prepared to cry.
What we have to do is go back, remove the database from the Availability Group again, then drop it from our other Replicas. We can’t just restore over what’s already there. That would break all sorts of laws of physics and whatever else makes front squats harder to do than back squats.
Since our database is in a restoring state, it’s a few steps to recover it, set it to single user so no one does anything dumber than our AG has done, and then drop it.
When we re-add the database to our Availability Group, it should start syncing properly. Lucky for us, it did!
There’s no Tinder for databases.
New features are hard
With direct seeding, you have to be extra careful about named instances and default database creation paths. If you used named instances with default database paths to Program Files, or different drive letters and folder names, this isn’t going to work. You don’t have an option to change those things. SQL expects everything to be there in the same place across all of your Replicas. I learned that the annoying way. Several times. Troubleshooting this was weird because I still can’t track down a root cause as to why anything failed in the first place. For the record, I created 50 databases, and two of them didn’t work for some reason.
Correct the problem. Just correct the problem.
Thanks for reading!