Wednesday, September 23, 2015

Interesting issue with log file shrinks...

So, I was looking at this fine blog entry from Lisa Gardner, one of Microsoft's PFEs. And I decided to do a bit of futzing around with it.

If you're too lazy to follow the link (or if you can't see where the link goes and won't follow a link blindly - GOOD FOR YOU!), there's an undocumented command called DBCC LOGINFO in SQL Server. It will show you how many VLFs you have, and show you a nominal status for each of them - are they free, or in use?

I hope you all know that you don't want too many VLFs - less than a hundred is good, and more than a hundred only if you absolutely, positively have no other reasonable choice. Too many VLFs can slow down recovery(which includes database startup/availability when SQL starts!), transactional replication, mirroring/Availability Groups, and probably more that I can't think of. For all that it's undocumented, it's a good command.

Well, as Lisa points out, when you're using an availability group, DBCC LOGINFO has an interesting quirk.

Normally, when you run a log backup, DBCC LOGINFO will show all of the truncated VLFs as having a status of 0 - not in use. Lisa pointed out that if the server is in an availability group, while the log might be truncated (i.e., the VLFs are available for re-use) DBCC LOGINFO might still show them as having a status of 2 (in use).

Let's step back a bit to make sure we're clear on this. The Transaction Log File of a database is like a whiteboard. Each VLF is its own discrete section of whiteboard. When you "truncate" the log, all VLFs that are freed up by the process are now erased - clean, and clear, and ready to accept new data. These VLFs are still there; the log file is still the same size. But now the log file has more empty space (in the form of VLFs) to use. (This is why I use the whiteboard metaphor - erasing a section of whiteboard doesn't make the whiteboard smaller, it makes the blank space bigger.)

DBCC LOGINFO normally tells you which VLFs are at least partially in use, and which ones are blank.

But when the database is in an availability group, it may not be honest. You may have space (re-usable VLFs) but DBCC LOGINFO might show them as in use, status of 2.

Well, I ran into an oddball problem. On an availability group, I tried to shrink a log file - and it wouldn't. (Yes, I know. It had been bloated by one-time operations.) It had been backed up - I even triggered the backup 2-3 times. And DBCC LOGINFO showed all VLFs in use. DBCC SQLPERF(LOGSPACE) (which shows all databases, log file sizes, and percentage of log files in use) showed nearly nothing, less than 1%, of the log file was in use. So, I'm comfortable that I ran into this DBCC LOGINFO display strangeness.

When this happens, the usual trick is to run the log backup once or twice to truncate just a bit more data. I did that - and I still couldn't shrink the log, and DBCC LOGINFO still showed all VLFs in use.

In this case, I figured out what was happening. My log backups were all occurring on the secondary replica of the database. As soon as I took a log backup on the primary replica, DBCC LOGINFO showed almost all VLFs clear, and I could shrink the log file back to its normal size.

What's going on here? I'm not sure, but it would make sense if DBCC SHRINKFILE uses DBCC LOGINFO to find out which VLFs are not in use. When you shrink a log file, it must shrink on VLF boundaries. If it's using DBCC LOGINFO to find inactive VLFs, it might not be finding any. And that might be causing the problem.

It looks like a backup on the primary replica might fix this. I also saw someone fix it with several log backups, and several checkpoints - the log backups were on the secondary replica, and, of course, the checkpoints were issued on the primary. Some combination of Secondary Replica log backups, and Primary replica checkpoints cleared the DBCC LOGINFO VLF status values - and allowed the file to shrink.

One warning: if you do choose to take a log backup on your primary replica, when you normally run log backups on your secondary, I very strongly recommend that you move or copy the log backup to the same location as your normal log backups. When you have to restore log file backups, you don't want to have to dig around in multiple locations - you want them all in the same place!

No comments:

Post a Comment