This post covers two scenarios
You either created a database and the sync failed for some reason, or a database stopped syncing. Our setup focuses on the case where sync breaks immediately, because, whatever, it's my blog post. To get there, I set up a script to create a bunch of databases, hoping that one of them would fail. Lucky me, two did! So let's fix them.

You have to be especially vigilant during initial seeding
Automatic failover can’t happen while databases sync up. The AG dashboard reports an unhealthy state, so failover is manual. The good news is that in the limited test scenarios I checked out, Direct Seeding to Replicas will pick back up when the Primary is back online, but if anything really bad happens to your Primary, that may not be the warmest or fuzziest news.
Here’s our database stuck in a restoring state.
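If you'd rather not trust your eyes in Object Explorer, a quick sanity check against sys.databases (using this demo's database name) will confirm it:

```sql
-- Confirm the stuck database really is in a RESTORING state
SELECT name, state_desc
FROM sys.databases
WHERE name = N'Crap903';
```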

Now let’s look in the error log. Maybe we’ll have something good there. On the Primary…
The mirror database "Crap903" has insufficient transaction log data to preserve the log backup chain of the principal database. This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.
Okie dokie. Good to know. On the Replica, you’ll probably see something like this…
Automatic seeding of availability database 'Crap903' in availability group 'SQLAG01' failed with an unrecoverable error. Correct the problem then issue an ALTER AVAILABILITY GROUP command to set SEEDING_MODE = AUTOMATIC on the replica to restart seeding.
Oh, correct the problem. You hear that, guys? Correct the problem.
IF ONLY I’D THOUGHT OF CORRECTING THE PROBLEM.
Sheesh
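For reference, the command that error message keeps nagging about looks roughly like this; the replica name below is a placeholder (it's not from this demo), and you'd run it once you've actually, you know, corrected the problem:

```sql
-- Restart automatic seeding by flipping the replica's seeding mode
-- ('SQLVM02' is a placeholder replica name, not from the post)
ALTER AVAILABILITY GROUP [SQLAG01]
MODIFY REPLICA ON N'SQLVM02'
WITH (SEEDING_MODE = AUTOMATIC);
```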
So what do we do? We can check out the AG dashboard, see a bunch of errors, and then focus in on them.
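If you prefer queries to dashboards, here's a sketch of the same check using the documented replica-state DMVs (no promises it surfaces everything the dashboard does):

```sql
-- Per-database sync state and health on each replica
SELECT ag.name AS ag_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups AS ag
    ON ag.group_id = drs.group_id
ORDER BY ag.name, database_name;
```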

Alright, let’s see what we can do! We can run a couple magical DBA commands and see what happens.
ALTER DATABASE [Crap903] SET HADR RESUME
ALTER DATABASE [Crap903] SET HADR AVAILABILITY GROUP = SQLAG01;


The two errors were:
Msg 35242, Level 16, State 16, Line 1
Cannot complete this ALTER DATABASE SET HADR operation on database 'Crap903'.
The database is not joined to an availability group. After the database has joined the availability group, retry the command.
And then
Msg 1412, Level 16, State 211, Line 1
The remote copy of database "Crap903" has not been rolled forward to a point in time that is encompassed in the local copy of the database log.
Interesting! What the heck does that mean? If Brent would give me his number, I’d call and ask. I don’t understand why he won’t give me his number. Well, let’s just kick this back off. We kind of expected that not to work because of the errors we saw in the log before, but it’s worth a shot to avoid taking additional steps.
ALTER AVAILABILITY GROUP [SQLAG01] REMOVE DATABASE [Crap903];
GO
ALTER AVAILABILITY GROUP [SQLAG01] ADD DATABASE [Crap903];
GO
Right? Wrong. Digging into our DMVs and Extended Events, they’re telling us that a database with that name already exists. What’s really lousy here is that this error doesn’t appear ANYWHERE ELSE. It’s not in the dashboard, it’s not in regular, documented DMVs, nor in the XE health session. It’s only in the undocumented stuff. If you’re going to use this feature, be prepared to do a lot of detective work. Be prepared to cry.
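For the curious, the detective work mostly happens in the seeding DMVs. A sketch of the queries I mean; the first is at least documented, the second is the undocumented stuff:

```sql
-- Seeding attempt history, including why an attempt failed
SELECT start_time, completion_time, current_state,
       failure_state_desc, error_code
FROM sys.dm_hadr_automatic_seeding
ORDER BY start_time DESC;

-- The undocumented one: low-level transfer stats for each seed
SELECT * FROM sys.dm_hadr_physical_seeding_stats;
```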


What we have to do is go back, remove the database from the Availability Group again, then drop it from our other Replicas. We can’t just restore over what’s already there. That would break all sorts of laws of physics and whatever else makes front squats harder to do than back squats.
Since our database is in a restoring state, it’s a few steps to recover it, set it to single user so no one does anything dumber than our AG has done, and then drop it.
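A minimal sketch of those steps, run on the Replica holding the stuck copy (database name from this demo; adjust to taste):

```sql
-- Bring the restoring copy online so it can be dropped
RESTORE DATABASE [Crap903] WITH RECOVERY;

-- Kick everyone out, then drop it
ALTER DATABASE [Crap903] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE [Crap903];
```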

When we re-add the database to our Availability Group, it should start syncing properly. Lucky for us, it did!

There’s no Tinder for databases.

New features are hard
With direct seeding, you have to be extra careful about named instances and default database creation paths. If you used named instances with default database paths to Program Files, or different drive letters and folder names, this isn’t going to work. You don’t have an option to change those things. SQL expects everything to be there in the same place across all of your Replicas. I learned that the annoying way. Several times. Troubleshooting this was weird because I still can’t track down a root cause as to why anything failed in the first place. For the record, I created 50 databases, and two of them didn’t work for some reason.
Correct the problem. Just correct the problem.
Thanks for reading!
24 Comments
I guess this is the result of a “Cloud First” approach. It doesn’t have to be user-friendly at all, it just has to work in the Microsoft Azure environment 🙂
How much for a front squat form check?
Four scoops of dat dere CellTech.
Reading this article has made me feel like binge drinking absinthe, it’s only 9.37am and I don’t even use AGs. Good work, Eric.
hahaha your sarcasm toward MS error messages had me in stitches!
This is literally the funniest blog I’ve ever seen. BY FAR. Bravo.
“You don’t have an option to change those things. SQL expects everything to be there in the same place across all of your Replicas. I learned that the annoying way. Several times.”
This tidbit of critical knowledge is worth all that pain. Thank you so much for finding this out. I was going to start using direct seeding except our server setup uses slightly different naming conventions between servers. It would have been messy without knowing this first.
Thanks! While it may still be an issue, I always encourage people to try things on their own, especially after a few CUs or SPs have rolled out. You never know what may have changed.
I have a feeling this is still a gotcha though.
Can you please add the programmatic steps to add a database to an availability group? As far as I know, we need to provide credentials and a network path for the logs.
If you share steps that will be awesome!
Pramod – no, we don’t really do free programming here, sorry.
Struggled with this on the latest version (2016 SP2 with CU2). Not using named instances, and all 3 availability replicas were deployed with the same settings (database file paths and all). Added 2 databases to an AG, hit this issue with the second database (secondary database not joined, files were not created). Cleaned up, ran the same script, all synced ok. Added another 5 databases, the third one failed. Cleaned it up, ran again for the failed one, now synced ok. And these are all small databases, 1GB each. Trace flag 9567 is not enabled.
For me, failure_state was: FAILED 21 Seeding Check Message Timeout
USE [master];
SELECT TOP 100 start_time, completion_time, is_source, current_state, failure_state, failure_state_desc, error_code FROM sys.dm_hadr_automatic_seeding ORDER BY start_time DESC;
SELECT * FROM sys.dm_hadr_physical_seeding_stats;
I also struggled with the same issues until I upgraded (azure VM’s) all of the servers to 8 cores. With 8 cores, all of these issues seem to go away.
I found a secret playing around. I experienced the same crap with failures adding 3 or 5 databases at a time to the AG. So I said, let's make this really simple for SQL. The query below is the meat & potatoes of a T-SQL script that finds just 1 database (even though there are 1200 waiting). Yes, big smoke test: I ran it out of worker threads, determining the Azure breaking point with thread exhaustion. So, back to what I was saying: snapping in 1 database at a time with a SQL job running on 3-4 minute intervals seeded 1235 databases, until SQL was, well, quite unresponsive. The script fetches the db name, runs a backup, and passes it into the AG, very successfully. My company hired an MS guy and he liked my code and ideas here, since it worked.
Drum roll for the smallest but most useful query. We have a utility db we don't bother with in builds, which is always dbid #5:
Select TOP 1 Name from master..sysdatabases
where name not in(
Select db_name(database_id) from sys.dm_hadr_database_replica_states)
and dbid > 5
I also added TDE encryption into this code and had to build an encryption watcher, since you cannot run the backup while the database is being encrypted, prior to adding it to the AG. Here is that small but very useful script:
--Monitor the encryption process
Declare @dbid int, @encryptionstate int
Select @dbid = dbid from sysdatabases where name = @dbname
looper:
SELECT @encryptionstate = encryption_state FROM sys.dm_database_encryption_keys
WHERE database_id = @dbid;
Print @encryptionstate
If @encryptionstate != 3
Begin
WAITFOR DELAY '00:00:06';
Print 'Checking for Encryption to completion status'
GOTO looper
End
Else
Begin
Print 'Encryption Process has Completed moving to adding to the AG Group now.'
End
Edward J Pochinski III
I thought I was the only one thinking the Always On error messages were useless….
“…needs to be in a SYNCHRONIZED state…”
No kidding…
Hey
Alternative for the automatic seeding error
Msg 1412, Level 16, State 211, Line 1
The remote copy of database “Crap903” has not been rolled forward to a point in time that is encompassed in the local copy of the database log.
You can avoid dropping the database (especially when it's a big database) and fix this problem by applying the missing transaction log backups. In my case, the secondary copy of the Distributed Availability Group failed with this error after seeding successfully completed, and the resume operation didn't work.
The steps I took were to remove the db from the AG, then, while the db is in a restoring state, apply the transaction log backups with NORECOVERY, i.e. the backups that were taken during the seeding process.
After this operation you will be able to add the db back to the AG by joining the db to the AG.
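Sketching that approach in T-SQL, using this post's database and AG names; the backup path is a placeholder, and you'd repeat the RESTORE LOG for every log backup taken during seeding:

```sql
-- On the secondary: roll the restoring copy forward
-- (the backup path here is a placeholder)
RESTORE LOG [Crap903]
FROM DISK = N'\\BackupShare\Crap903_log.trn'
WITH NORECOVERY;

-- Then join it to the AG instead of reseeding from scratch
ALTER DATABASE [Crap903] SET HADR AVAILABILITY GROUP = SQLAG01;
```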
Make sure when you run the seeding perms that each node is the primary replica. I had auto seeding working across 4-node replica systems flawlessly. I used POSH to auto deploy the perms. Also, land the 6 busiest databases in the AG first. We automated SQL to pick up new databases and slap them into auto seeding. We also automated removing a database from all secondary replicas and re-sending it across if it came out of sync. This also worked flawlessly.
CAN YOU PLEASE SHARE YOUR STEPS/SCRIPTS
I can get a bunch of the code up here, I'm sure; I need to edit out any company information. I had done much testing: we exhausted worker threads exceeding 1,000 databases in 1 AG (that was fun). Performed failovers with PowerShell with 400-600 databases, and also polled the 3 replicas to "resume data movement" where needed after failing over to an async replica. Wicked busy, but they had me convert away from this wonderful technology, due to a data issue (drum roll), yes, to Log Shipping. So I now have 1 job as the puppet master moving 1,100 databases, using some Ola H. backup procedures and my custom T-SQL code. I also built a healing process into the LS: if out of sync, the database is dropped and re-established. I will check with my company to be sure everything is peachy with me publicly posting the code. I'm sure some time next week I can have some sort of posting. I wonder if Brent wants to create a page or something so it is not posted in a blog comment. Stay tuned.
Thank you. However, I have a 2019 DAG configuration, and here is the error message:
Seeding encountered a transient failure, state '108', maximum retries exceeded
VM1 and VM2 (global) seed; however, VM3 and VM4 (forward) did not.
What do you think?
Hi! I don’t do free production support, unfortunately. You can either click Consulting at the top of the site, or contact Microsoft.
Once the auto seeding perms are configured correctly, this script disperses a database to 3 replicas. Mine were 1 in Azure East and 2 in Azure West; the East replica was synchronous and the West ones were async. In testing, we auto seeded over 1000 databases into a single AG until, yes, no more worker threads were available. It was able to fail over flawlessly with 400-600. We automated that with POSH and circled back to resume data movement on the new secondaries from the new primary in West.
CLS
#add the databases to the AG Group via Auto Seeding
$SQL_Replica1 = "SQLName,port"
Write-Host "Performing the backup and seeding of new databases" -fore Magenta
Write-Host "Connecting to : " $SQL_Replica1 -fore Yellow
Invoke-Sqlcmd -ServerInstance $SQL_Replica1 -verbose -Query "
--# Purpose - This script will deploy DBs with TDE and Auto seed to the replicas
SET NOCOUNT ON
Declare @sqlstring nvarchar(555), @backuppath varchar(555), @sqlstring1 nvarchar(555), @sqlstring2 nvarchar(555)
DECLARE UserDBNotSyncd INSENSITIVE CURSOR
FOR
Select TOP 1 Name from master..sysdatabases
where name not in(
Select db_name(database_id) from sys.dm_hadr_database_replica_states)
and dbid > 5
FOR READ ONLY
OPEN UserDBNotSyncd
DECLARE @DBName varchar(50)
FETCH NEXT FROM UserDBNotSyncd INTO @DBName
WHILE (@@FETCH_STATUS = 0)
BEGIN
Print 'Backing up database: ' + @DBName
Set @backuppath = '\\PhyNodeName\AGshare\' + @DBName + '.bak'
Backup Database @DBName TO DISK = @backuppath
Print 'Completed Database Backup to staging'
Print '*****Adding new database to AG Group for Auto Seeding: ' + @DBName + ' ******'
set @sqlstring = 'ALTER AVAILABILITY GROUP AG1 ADD DATABASE [' + @DBName + '];'
Print @sqlstring
EXECUTE sp_executesql @sqlstring
If @@error = 0
Print 'Database Successfully added: ' + @DBName
FETCH NEXT FROM UserDBNotSyncd INTO @DBName
END
CLOSE UserDBNotSyncd
DEALLOCATE UserDBNotSyncd
--cleanup backup staged files
Declare @cmdshellvalue sql_variant
select @cmdshellvalue = value_in_use from sys.configurations where name = 'xp_cmdshell'
If @cmdshellvalue = 0
Begin
Print 'Turning on xp_cmdshell, it magically was turned off somehow.'
exec sp_configure 'show advanced options', '1'
RECONFIGURE WITH OVERRIDE
exec sp_configure 'xp_cmdshell', '1'
RECONFIGURE WITH OVERRIDE
END
waitfor Delay '00:00:02'
Print 'Cleaned up the full backup file from AG Prep & Auto Seeding'
exec xp_cmdshell 'del \\PhyNodeName\AGshare\*.bak'
Print ''
Print 'The process has completed, refresh the AG Dash Board'
" -querytimeout ([int]::MaxValue)
Write-Host "Completed Seeding Process" -fore Green
--Worried about worker thread issues? This worked great
--Author - Ed Pochinski 04232019
--Purpose - AG Worker thread alert
--SQL Server Central posting copy
Declare @availableThreads varchar(25), @workers_waitingForCPU varchar(25), @Request_waiting_ForThread varchar(25), @msgbody varchar(500), @msgsubject varchar(500)
Select @availableThreads = (select max_workers_count from sys.dm_os_sys_info) - sum(active_Workers_count) from sys.dm_os_Schedulers where status = 'VISIBLE ONLINE'
Print @availableThreads
Select @workers_waitingForCPU = sum(runnable_tasks_count) from sys.dm_os_Schedulers where status = 'VISIBLE ONLINE'
Print @workers_waitingForCPU
Select @Request_waiting_ForThread = sum(work_queue_count) from sys.dm_os_Schedulers where status = 'VISIBLE ONLINE'
Print @Request_waiting_ForThread
--edit threshold to accommodate; this is a test machine with no load, so the number is high to facilitate the alert
If @availableThreads <= 50
Begin
Print 'Send an email alert here !!!!! Worker threads are low: ' + @availableThreads + ', Workers Waiting on CPU: ' + @workers_waitingForCPU + ' \ Requests Waiting for thread: ' + @Request_waiting_ForThread
set @msgbody = 'Worker threads are low on Server: ' + @@servername + ' - Available Thread Count: ' + @availableThreads
set @msgbody = @msgbody + char(13);
set @msgbody = @msgbody + ' - Worker Threads Waiting on CPU count: ' + @workers_waitingForCPU + char(10);
set @msgbody = @msgbody + ' - Requests Waiting for worker thread count: ' + @Request_waiting_ForThread
set @msgsubject = '!!! AG Worker Thread Alert on SQL Server: ' + @@servername + ' !!!'
EXEC msdb.dbo.sp_send_dbmail
@recipients = N'email@somewhere.com',
@body = @msgbody,
@subject = @msgsubject,
@profile_name = 'SQLMail_Public_Profile'
End
Thank you for your reply. I found the solution: I did not fail over the DAG correctly.
Once I executed the following command, the databases were created and synced on the other 2 replicas:
alter availability group dag_samdc4892prod3 set (role = secondary)
The error message within the SQL log did not suggest a review of the DAG. Thanks again!
Had this issue last night at 1am! The AG successfully failed over, but none of the databases were syncing. The SQL logs were vague; the dashboard only showed when the last logs were hardened. After digging through the cluster log (PowerShell: Get-ClusterLog -UseLocalTime) I saw that the internal cluster database was out of sync on 2 of the 3 nodes.
To fix this, within the Failover Cluster Manager on the current primary server, I stopped and restarted the SQL AG role. After the restart, all databases got re-synced and the dashboard showed healthy.
I hope this helps somebody else.
With Server 2019 and automatic seeding, the sync will fail if there is a transaction log backup taken on the primary whilst it is seeding. Disabling transaction log backups whilst synchronising seems to resolve the issue.