Availability Group Direct Seeding: How to fix a database that won’t sync

By Erik Darling · June 29, 2016 · 25 comments

This post covers two scenarios

You either created a database, and the sync failed for some reason, or a database stopped syncing. Our setup focuses on one where sync breaks immediately, because whatever it’s my blog post. In order to do that, I set up a script to create a bunch of databases, hoping that one of them would fail. Lucky me, two did! So let’s fix them.

You have to be especially vigilant during initial seeding

Automatic failover can’t happen while databases sync up. The AG dashboard reports an unhealthy state, so failover is manual. The good news is that in the limited test scenarios I checked out, Direct Seeding to Replicas will pick back up when the Primary is back online, but if anything really bad happens to your Primary, that may not be the warmest or fuzziest news.

Here’s our database stuck in a restoring state.

Now let’s look in the error log. Maybe we’ll have something good there. On the Primary…

Unknown,The mirror database "Crap903" has insufficient transaction log data to preserve the log backup chain of the principal database.  This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.

1	Unknown,The mirror database "Crap903" has insufficient transaction log data to preserve the log backup chain of the principal database. This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.

Okie dokie. Good to know. On the Replica, you’ll probably see something like this…

Automatic seeding of availability database 'Crap903' in availability group 'SQLAG01' failed with an unrecoverable error. Correct the problem then issue an ALTER AVAILABILITY GROUP command to set SEEDING_MODE = AUTOMATIC on the replica to restart seeding.

1	Automatic seeding of availability database 'Crap903' in availability group 'SQLAG01' failed with an unrecoverable error. Correct the problem then issue an ALTER AVAILABILITY GROUP command to set SEEDING_MODE = AUTOMATIC on the replica to restart seeding.

Oh, correct the problem. You hear that, guys? Correct the problem.

IF ONLY I’D THOUGHT OF CORRECTING THE PROBLEM.

Sheesh

So what do we do? We can check out the AG dashboard, see a bunch of errors, and then focus in on them.

Alright, let’s see what we can do! We can run a couple magical DBA commands and see what happens.

ALTER DATABASE [Crap903] SET HADR RESUME

ALTER DATABASE [Crap903] SET HADR AVAILABILITY GROUP = SQLAG01;

THE SALES GUY SAID THIS WOULD BE SO EASY WTF SALES GUY

The two errors were:
Msg 35242, Level 16, State 16, Line 1
Cannot complete this ALTER DATABASE SET HADR operation on database ‘Crap903’.
The database is not joined to an availability group. After the database has joined the availability group, retry the command.

And then

Msg 1412, Level 16, State 211, Line 1
The remote copy of database “Crap903” has not been rolled forward to a point in time that is encompassed in the local copy of the database log.

Interesting! What the heck does that mean? If Brent would give me his number, I’d call and ask. I don’t understand why he won’t give me his number. Well, let’s just kick this back off. We kind of expected that not to work because of the errors we saw in the log before, but it’s worth a shot to avoid taking additional steps.

ALTER AVAILABILITY GROUP [SQLAG01] REMOVE DATABASE [Crap903]
GO

ALTER AVAILABILITY GROUP [SQLAG01] ADD DATABASE [Crap903]
GO

ALTER AVAILABILITY GROUP [SQLAG01] REMOVE DATABASE [Crap903]

ALTER AVAILABILITY GROUP [SQLAG01] ADD DATABASE [Crap903]

Right? Wrong. Digging into our DMVs and Extended Events, they’re telling us that a database with that name already exists. What’s really lousy here is that this error doesn’t appear ANYWHERE ELSE. It’s not in the dashboard, it’s not in regular, documented DMVs, nor in the XE health session. It’s only in the undocumented stuff. If you’re going to use this feature, be prepared to do a lot of detective work. Be prepared to cry.

Crud

What we have to do is go back, remove the database from the Availability Group again, then drop it from our other Replicas. We can’t just restore over what’s already there. That would break all sorts of laws of physics and whatever else makes front squats harder to do than back squats.

Since our database is in a restoring state, it’s a few steps to recover it, set it to single user so no one does anything dumber than our AG has done, and then drop it.

Drop it like it's crap. — Drop it like it’s crap.

When we re-add the database to our Availability Group, it should start syncing properly. Lucky for us, it did!

I'm not highly available and I'm so scared. — I’m not highly available and I’m so scared.

There’s no Tinder for databases.

I'm highly available. Call me. — I’m highly available. Call me.

New features are hard

With direct seeding, you have to be extra careful about named instances and default database creation paths. If you used named instances with default database paths to Program Files, or different drive letters and folder names, this isn’t going to work. You don’t have an option to change those things. SQL expects everything to be there in the same place across all of your Replicas. I learned that the annoying way. Several times. Troubleshooting this was weird because I still can’t track down a root cause as to why anything failed in the first place. For the record, I created 50 databases, and two of them didn’t work for some reason.

Correct the problem. Just correct the problem.

Thanks for reading!

Free, 3× a week

Get my new posts by email

Three posts a week, plus a Monday roundup of the best database news from around the web.

25 comments

Michel Zehnder

June 29, 2016 at 7:30 am

I guess this is the result of a “Cloud First” approach. It doesn’t have to be user-friendly at all, it just has to work in the Microsoft Azure environment 🙂

Reply
Brandon M.

June 29, 2016 at 11:53 am

How much for a front squat form check?

Reply
1. Erik Darling
  
  June 29, 2016 at 12:04 pm
  
  Four scoops of dat dere CellTech.
  
  Reply
Thomas Pullen

June 30, 2016 at 3:38 am

Reading this article has made me feel like binge drinking absinthe, it’s only 9.37am and I don’t even use AGs. Good work, Eric.

Reply
Mike

January 10, 2017 at 8:50 am

hahaha your sarcasm toward MS error messages had me in stitches!

Reply
Mike

January 10, 2017 at 8:56 am

This is literally the funniest blog I’ve ever seen. BY FAR. Bravo.

Reply
DavidK

April 7, 2017 at 12:44 pm

“You don’t have an option to change those things. SQL expects everything to be there in the same place across all of your Replicas. I learned that the annoying way. Several times.”
This tidbit of critical knowledge is worth all that pain. Thank you so much for finding this out. I was going to start using direct seeding except our server setup uses slightly different naming conventions between servers. It would have been messy without knowing this first.

Reply
1. Erik Darling
  
  April 7, 2017 at 1:38 pm
  
  Thanks! While it may still be an issue, I always encourage people to try things on their own, especially after a few CUs or SPs have rolled out. You never know what may have changed.
  
  I have a feeling this is still a gotcha though.
  
  Reply
Pramod

June 11, 2018 at 8:36 am

Can you please add programmatic step to add database to availability group? Since as far as I know we need to provide credentials and network path for the logs.
If you share steps that will be awesome!

Reply
1. Brent Ozar
  
  June 11, 2018 at 8:40 am
  
  Pramod – no, we don’t really do free programming here, sorry.
  
  Reply
Razvan Zoitanu VIP Student since 2017

August 3, 2018 at 1:21 pm

Struggled with this on the latest version (2016 SP2 with CU2). Not using names instances, and all 3 availability replicas were deployed with the same settings (database files paths and all). Added 2 databases to an AG, hit this issue with the second database (secondary database not joined, files were not created). Cleaned up, ran the same script, all synced ok. Added another 5 databases, the third one failed. Cleaned it up, ran again for the failed one, now synced ok. And these are all small databases, 1GB each. Trace flag 9567 is not enabled.

For me, failure_state was: FAILED 21 Seeding Check Message Timeout

USE [master];
SELECT TOP 100 start_time, completion_time, is_source, current_state, failure_state, failure_state_desc, error_code FROM sys.dm_hadr_automatic_seeding ORDER BY start_time DESC;
SELECT * FROM sys.dm_hadr_physical_seeding_stats;

Reply
Joe Murray VIP Student since 2017

February 18, 2019 at 9:54 am

I also struggled with the same issues until I upgraded (azure VM’s) all of the servers to 8 cores. With 8 cores, all of these issues seem to go away.

Reply
SQLShark

June 17, 2019 at 8:56 am

I found a secret playing around. I experienced the same crap with failures doing 5 or 3 at a time adding to the AG group. So I said lets make tis really simple here for SQL. The below query is the meat & potatoes for a T-SQL script that first finds only 1 database ( even though there are 1200 waiting ) yes big smoke test and ran it out of threads doing so determining the Azure breaking point with thread exhaustion. SO back to what I was saying, snap in 1 database at a time using a SQL job running on 3-4min intervals seeded 1235 databases until SQL well was quite unresponsive. Script fetches the db name, runs a backup and pass in the AG group very successfully. My company hired a MS guy and he liked my code and ideas here since it worked.
Drum roll fore the smallest but most useful query: We have a utility db we do not to bother with in build which is always #5 dbid.
Select TOP 1 Name from master..sysdatabases
where name not in(
Select db_name(database_id) from sys.dm_hadr_database_replica_states)
and dbid > 5
I also added TDE encryption into this code and had to build a encryption watcher since while it is being encrypted you can not run the backup prior to adding to the AG group. Here is that small but very useful script:
–Monitor the encryption process

Declare @dbid int,@encryptionstate int
Select @dbid = dbid from sysdatabases where name = @dbname
looper:
SELECT @encryptionstate = encryption_state FROM sys.dm_database_encryption_keys
WHERE database_id = @dbid;
Print @encryptionstate
If @encryptionstate != 3
Begin
WAITFOR DELAY ’00:00:06′;
Print ‘Checking for Encryption to completion status’
GOTO looper
End
Else
Begin
Print ‘Encryption Process has Completed moving to adding to the AG Group now.’
end

Edward J Pochinski III

Reply
larry West

October 20, 2020 at 7:33 am

I thought I was the only one thinking the Always On error messages were useless….

“…needs to be in a SYNCHRONIZED state…”

No kidding…

Reply
Dragos T

February 2, 2021 at 6:16 am

Hey

Alternative for the automatic seeding error

Msg 1412, Level 16, State 211, Line 1
The remote copy of database “Crap903” has not been rolled forward to a point in time that is encompassed in the local copy of the database log.

You can avoid dropping the database (especially when it’s a big database) and fix this problem by applying the missing transactional backups. On my case, the secondary copy of the Distributed Availability Group failed with this error after the seeding successfully completed and resume operation didn’t work.

The steps that i’ve did was to remove the db from AG ; when the db is in restore state apply the transactional backups with no recovery, backups that were taken during the seeding process.

After this operation you will be able to add the db back to the AG by joining the db to the AG.

Reply
Ed Pochinski

February 2, 2021 at 7:19 am

Make sure when you run the seeding perms each node is primary replica. I had the auto seeding working aross 4 node replica systems flawlessly. I used POSH to auto deploy the perms. Also land the 6 busiest databases first in the AG group. We automated SQL to pickup new databases and slap them into the auto seeding. We also automated if one comes out of sync to remove from all secondary replicas and then re send across. This also worked flawlessly.

Reply
1. FR
  
  June 25, 2021 at 6:13 pm
  
  CAN YOU PLEASE SHARE YOUR STEPS/SCRIPTS
  
  Reply
Edward Pochinski

June 25, 2021 at 6:53 pm

I can get a bunch of the code up here I’m sure, I need to edit out any company information. I had done much testing, we exhausted worker threads exceedding 1000 databases in 1 Ag group( that was fun). Performed failovers with PowerShell with 400-600 databases and also to poll the 3 replicas and “resume data movement where needed” from failinbg over to a asynch replica. Wicked busy but they had me converted this wonderful technology due to a data issue ( Drum Roll )yes to LOg Shipping. So I now have 1 job as the puppet master and move 1,100 databases using some OLa H backup procedures and my custom T-SQL code. I also built in a healing process for the LS, if out of sync dropped and re established. I will checkj with my company to be sure everytrhing is peachy with me publically posting the code. I’m sure some time next week I can have some sort of posting. I wonder if Brent want to create a page or something so it is not posted in a blog. Stay tuned,

Reply
Ronald

July 22, 2021 at 12:35 pm

thank you. however, I have a 2019 DAG configuration and here is the error message
Seeding encountered a transient failure, state ‘108’, maximum retries exceeded’

VM1 and VM2 (global) seed however, VM3 and VM4 (forward) did not

What do you think

Reply
1. Brent Ozar
  
  July 22, 2021 at 3:07 pm
  
  Hi! I don’t do free production support, unfortunately. You can either click Consulting at the top of the site, or contact Microsoft.
  
  Reply
Ed Pochinski

July 22, 2021 at 4:29 pm

Once the Auto seeding perms are configured correctly this script dispersed a database to 3 replicas. 1 in Azure East and 2 in Azure West was were my 3 were. East Replica was synchronous and West’s were Async. In testing we auto seeded over 1000 databases in a single AG group until yes, no more worker threads available. It was able to failover flawlessly with 400-600. We automated that with POSH and circled back to resume data movements on the new secondaries from the new Primary in West.

CLS
#add the databases to the AG Group Via Auto Seeding
$SQL_Replica1=”SQLName,port”

Write-Host “Performing the backup and seeding of new databases” -fore Magenta
Write-Host “Connecting to : ” $SQL_Replica1 -fore Yellow
Invoke-Sqlcmd -ServerInstance $SQL_Replica1 -verbose -Query ”

–# Purpose – This script will deploy DBs with TDE and Auto seed to the replicas

SET NOCOUNT ON
Declare @sqlstring nvarchar (555),@backuppath varchar(555),@sqlstring1 nvarchar (555),@sqlstring2 nvarchar(555)
DECLARE UserDBNotSyncd INSENSITIVE CURSOR

FOR

Select TOP 1 Name from master..sysdatabases
where name not in(
Select db_name(database_id) from sys.dm_hadr_database_replica_states)
and dbid > 5

FOR READ ONLY

OPEN UserDBNotSyncd

DECLARE @DBName varchar(50)

FETCH NEXT FROM UserDBNotSyncd INTO @DBName

WHILE (@@FETCH_STATUS = 0)

BEGIN

Print ‘Backing up database: ‘ + @DBName
Set @backuppath = ‘\\PhyNodeName\AGshare\’ + @DBName + ‘.bak’
Backup Database @DBName TO DISK = @backuppath

Print ‘Completed Database Backup to staging’

Print’*****Adding new database to AG Group for Auto Seeding: ‘ + @DBName + ‘ ******’
set @sqlstring=’ ALTER AVAILABILITY GROUP AG1 ADD DATABASE [‘ + @DBName + ‘];’

Print @sqlstring
EXECUTE sp_executesql @sqlstring

If @@error = 0
Print ‘Database Database Successfully added: ‘ + @DBName

FETCH NEXT FROM UserDBNotSyncd INTO @DBName
END

CLOSE UserDBNotSyncd

DEALLOCATE UserDBNotSyncd

–cleanup backup staged files

Declare @cmdshellvalue sql_variant
select @cmdshellvalue = value_in_use from sys.configurations where name = ‘xp_cmdshell’
If @cmdshellvalue = 0
Begin
Print ‘Turning on xp_cmdshell,it magically was turned off somehow.’
exec sp_configure ‘show advanced options’,’1′
RECONFIGURE WITH OVERRIDE
exec sp_configure ‘xp_cmdshell’,’1′
RECONFIGURE WITH OVERRIDE
END
waitfor Delay ’00:00:02′
Print ‘Cleaned up the full backup file from AG Prep & Auto Seeding’
exec xp_cmdshell ‘ del \\PhyNodeName\AGshare\*.bak’
Print ”
Print ‘The process has completed, refresh the AG Dash Board’
” -querytimeout ([int]::MaxValue)

Write-Host “Completed Seeding Process” -fore Green

–Worried about worker thread issues this worked great

–Author – Ed Pochinski 04232019
–Purpose – AG Worker thread alert
–SQL Server Central posting copy
Declare @availableThreads varchar(25), @workers_waitingForCPU varchar(25), @Request_waiting_ForThread varchar(25),@msgbody varchar(500), @msgsubject varchar(500)
Select @availableThreads = (select max_workers_count from sys.dm_os_sys_info)-sum(active_Workers_count) from sys.dm_os_Schedulers where status=’VISIBLE ONLINE’
Print @availableThreads
Select @workers_waitingForCPU = sum(runnable_tasks_count) from sys.dm_os_Schedulers where status=’VISIBLE ONLINE’
Print @workers_waitingForCPU
Select @Request_waiting_ForThread = sum(work_queue_count) from sys.dm_os_Schedulers where status=’VISIBLE ONLINE’
Print @Request_waiting_ForThread
–edit threshold to accomodate this is a test machine with no load so the humber is high to facilitate the alert
If @availableThreads <=50
Begin
Print ' Send a email alert here !!!!! Worker threads are low: ' + @availableThreads + ', Workers Waiting on CPU: ' + @workers_waitingForCPU + ' \ Requests Waiting for thread: ' + @Request_waiting_ForThread
set @msgbody ='Worker threads are low on Server: ' + @@servername + ' – Available Thread Count: ' + @availableThreads
set @msgbody = @msgbody + char(13);
set @msgbody = @msgbody + ' – Worker Threads Waiting on CPU count: ' + @workers_waitingForCPU + char(10);
set @msgbody = @msgbody + ' – Requests Waiting for worker thread count: ' + @Request_waiting_ForThread
set @msgsubject = '!!! AG Worker Thread Alert on SQL Server: ' + @@servername + ' !!!'
EXEC msdb.dbo.sp_send_dbmail
@recipients =N'email@somewhere.com',
@body = @msgbody,
@subject =@msgsubject ,
@profile_name ='SQLMail_Public_Profile'
End

Reply
Ronald

July 23, 2021 at 11:53 am

Thank you, for your reply. I found the solution. I did not failover the DAG correctly.
Once I executed the following command, the databases were created and sync on the other 2 replicas

alter availability group dag_samdc4892prod3 set (role = secondary)

the error message within the SQL log did not suggest a review of the DAG. – thanks again

Reply
Web William

February 23, 2022 at 3:55 pm

Had this issue last night at 1am! The AG successfully failed over, but none of the databases were syncing. The SQL logs were vague, the dashboard showed when the last logs were hardened. After digging through the cluster log (Powershell: get-clusterlog -uselocaltime) I saw that the internal cluster database was out of sync with 2 of the 3 nodes.

To fix this, within the Failover Cluster Manager on the current primary server, I stopped and restarted the SQL AG role. After the restart, all databases got re-synced and the dashboard showed healthy.

I hope this helps somebody else.

Reply
Geoff Langdon

November 10, 2022 at 10:51 am

With Server 2019 and automatic seeding, the sync will fail if there is a transaction log backup taken on the primary whilst it is seeding. Disabling transaction log backups whilst synchronising seems to resolve the issue.

Reply
John B

June 30, 2023 at 1:57 pm

I had the same issue with the synchronization of the Secondary and Primary not syncing for 2 days. This was a 7TB database so the Backup and restore solution was looking like the only method. Also the Data movement would not resume. So I took a backup of primary database, which took 10 hours, and tried restarting the SQL Server service, anything to avoid this restore… No joy. So, because I could without impacting the use of the primary, Restarted the Secondary server. This resolved my issue and the database was in sync in 3 hours.
Hope this helps others in similar situation.

Reply