Sunday, February 22, 2015

Meditations on backup and restore...

Ohhhhmmmmmmm... Ohhhhmmmmmmm...

Okay, bad joke.

Backups are a wonderful thing. If only life let you restore to a point in time backup the way a Full Recovery Model database might allow!

Backups and restores are generally easy to deal with and think about, but there are some tricky bits, so let's think about each of them.

The Too Long, Didn't Read summary:

Simple recovery mode: Nice and easy. You can do full backups, or differential. You can restore full backups or differentials. It can get complicated when you learn about "partial" backups - that will be another blog post, I hope! You can also do COPY_ONLY backups, and these maintain the differential base - vital for making sure no one is confused about which differential goes with which full, when making regularly scheduled backups.

Full: You must take log backups, or your log file will not truncate. If you every need to truncate your log file without taking a backup, you really need to consider using Simple recovery model instead! You can take full (including COPY_ONLY), differential, or transaction log (including COPY_ONLY) backups. You must pay attention to the log chain - mostly, a broken, or altered, log chain. You can restore a full, differential, or COPY_ONLY backup - and COPY_ONLY maintains its location in the log chain, and the differential base. You can restore from any one point in time to any other, so long as you have a full, or full+differential (or, technically, full+differential+log chain) to the first time, and a continuous (i.e., unbroken/unaltered) log chain to the second.

With Full, and an unbroken chain, you can perform miracles - you can restore the whole database to any point in time covered by the full and the log chain; you can restore filegroups, individual files, or even corrupted pages, and sometimes, do these things while the database remains online (though obviously, for file/filegroup restores, the affected filegroup will be offline) Note that this is separate from piecemeal backups and restores - you can restore a single damaged file from a full backup (optionally: plus differential), plus the correct log chain. You can also do this from a piecemeal backup, of course.

Bulk-Logged: Saves you space in the transaction log, but not in the backups, if you use "bulk" operations. Is precisely the same as Full if you perform no bulk operations; if you perform a bulk operation, any log backups taken from start to finish of the bulk operation may be all-or-nothing when restoring. As a best practice, consider staying in Bulk-Logged only when you know you should be, and immediately switching back, and closing the log chain, with a fresh full, differential, or transaction log backup.

And now: on to the post!

First, remember, SQL Server has 3 recovery models: Simple, Bulk-Logged, and Full.

In Simple recovery model, the transaction log truncates each time a checkpoint is run (either manually - you can issue a CHECKPOINT as a T-SQL command - or automatically). And you can take Full backups, and Differential backups - you can't take a log backup, and SQL would have no way to use it if you could. You can also take a COPY_ONLY backup to maintain your differential base.

"Differential base?" you ask. Okay: When you take a full backup, SQL Server makes changes to the database - it clears the differential bits that mark which extents have been modified. ("Extent"? SQL uses files in 8kb pages, and groups 8 8kb pages into one 64kb extent.) At this point, you have a new differential base. If you take a new differential the instant this process finishes, your differential backup should be essentially empty - header information only, with a record that no changes had been made.

Let's pretend you take your full backups each Sunday. This sets your differential base. You might think this is silly, but - you want to maintain that differential base until next Sunday, and yes, it really is important. Why? Because in emergencies, people are pumped up on adrenaline. And this is well established: when you're hyped on adrenaline, you are better at doing things that you've drilled in and are good at doing, and worse at things where you need to improvise. So you don't want anyone to have to remember that there's a new full backup, nor guess where to find it, nor have to compare file date/timestamps, and get them right. You want them to be able to do everything one particular way, and it should be a way you've practiced. (Yes, Mr. or Ms. Manager, on company time; nothing encourages good practice like being paid for something other than the daily grind. Do this, and you'll all be heroes when disaster strikes, and believe me, it will.)

So: you can restore a full backup, and that puts your database in a consistent state as of the time the backup completed. If you restore WITH NORECOVERY, you can also restore a differential backup on top of that full backup, which will be consistent as of the point in time that backup completed. That's pretty much all you can do with Simple recovery model, if you're not using partial backups and restores.

If, for some reason, you needed a copy of your database, you have some choices. First, you could take a new full backup. If you're doing fulls, plus differentials, this is a bad idea, unless it can replace your next scheduled full backup. Again: remember, if you need to restore in an emergency, people will be hyped - don't make them freak out when they find tomorrow's differential doesn't restore to the full backup taken at it's normal schedule, because someone took another full backup.

A better option is to take a fresh differential, and use your previously scheduled full, plus the new differential, to do whatever you need to do. But SQL Server has a nifty option to create a COPY_ONLY backup - this is the equivalent of a full + differential in one single file. And, as I said, this doesn't reset the differential base. I have tried to restore a differential taken after a COPY_ONLY backup, on top of the COPY_ONLY - it worked. It should work - it's the same differential base, and a later differential backup can be added to the restore chain if needed. So you can also restore a full backup, a differential backup, and then (assuming the first differential was restored WITH NORECOVERY) you can restore another, later, differential backup. But you shouldn't - the second differential backup contained all the same information as the first, so it was a waste of time to restore both.

(If you're a bit like me, once you heard that you 'shouldn't' restore a second differential, you started to imagine circumstances in which it might be necessary. That's fine - one example is if you're 90% sure you need Monday's backup, but you're checking records to see if maybe you need Tuesday's; and checking records takes longer than the restore. If that's the case, go ahead! This is good to think about; puzzles are great! But always remember that puzzles are puzzles and in the real world you'll almost always want one full, one differential, and that's it.)

(What do you do if you restore a full backup, restore Monday's backup WITH NORECOVERY, and find you don't need Tuesday's - how do you get the database online? You run RESTORE DATABASE My_Database_Name WITH RECOVERY -- except, of course, you replace My_Database_Name with the name of the database you've restored.)

One final bit of interesting knowledge: If you switch from Simple recovery model to Full recovery model, SQL Server still doesn't have any basis upon which to restore a log backup. None of your previous backups include enough log file information for a newly taken log backup to restore. So SQL Server will continue to truncate the log file at every checkpoint, until you establish a backup that will allow restore of a transaction log file. If you've ever taken a full backup of the database, you can just take a differential backup (though if your full backup was in the far-back past, you might as well take a full backup instead). If you've never taken a full backup, you'll need to take a full backup - obviously! - to allow log backups.

I saw at least one online resource state this incorrectly - they claimed that a Full recovery database acts like Simple until the first full database backup. This is incorrect! What probably happened was that person had their Model database set to Simple, created a new database without explicitly setting it to Full recovery model, and then switched it from Simple to Full - which triggered the behavior. If you create a new database and it starts existence in Full recovery model, it will not truncate its log until you take a full backup, and then the first transaction log backup. But if you create a database in Simple, and switch to Full, the log will truncate on checkpoint until the first full backup is taken.

Now, let's talk about Full recovery. In Full recovery model (and in Bulk-Logged), you can take full backups, and differential backups, just like in Simple recovery model, and you can take COPY_ONLY backups which don't change the differential base, and don't affect log backups. You can take a COPY_ONLY full backup, and a COPY_ONLY transaction backup, but not a COPY_ONLY differential - there would be no point, a differential doesn't reset anything! You also can, and "must", take log file backups to allow your log file to truncate - that is, to re-use space inside the LDF file for new transactions. Now, I said you "must" but there are other ways to truncate a log file - and you should never use them. "Never"? Well, not "never", quite. It's just - it's one of the most ridiculous things I've seen in my time as a DBA, and as a troubleshooter. People hear wonderful things about how you have all these great recovery options with full recovery model - but they don't plan for running the log backups, or having enough space to maintain a reasonable log chain. Sometimes it's not their fault - they need to use log shipping, mirroring, Availability Groups, or some other technology that mandates Full recovery model. But in other cases, there's no excuse. If you are in Full recovery model, it's a statement that what's happening in your database is important, over time, and you need to save it over time, so you save that information in log backups. Or you use Simple recovery model, which is far easier to use, and in many cases, more efficient, than Full recovery model.

I suppose I might sound a bit over-excited on this issue, and that's not entirely unfair. It's just - the longer I've worked in IT, the more appreciation I've gained for good design; good design means making the right choices, and if you're going to break that precious log backup chain other than in the most unexpected of emergencies, you should probably be using Simple recovery model as the correct tool for your task. And I've seen far too many people use Full when Simple would have been a better choice for them.

If you stop and think about it, SQL Server database's transaction log is an amazing thing. A huge number of things could be happening to a database - one or more files could be growing, or shrinking, or new files being added; it could be completely quiesced, or running wild with activity, including massive changes to both the contents and structure of the database, and all of the data in the buffer pool is completely volatile, and may have been changed, or not changed, or in the process of being changed and the transaction log tracks everything you need so that if something were to go horribly wrong at any moment, the log file has the information needed to put things right.

When you're in full recovery model, you save that recording of what has happened in the transaction log, until you back it up. (You may save the data until other things happen too - but for now, let's just think about backups.) Unless something breaks the log chain, you can restore from a good full backup, and a continuous string of transaction log backups, to any instant of time. Someone deleted a table at 5pm Friday? You could restore the database to 4:59:59, or (if you only realized this at 6:30!) restore a copy of the database to the same time, to grab just the data in the table.

I mentioned "restoring a good backup" - let me pause for a moment. When is a backup good? Well, when you've restored it, and run a DBCC CHECKDB on the database, you can now say "that backup was good when we restored it." I'm not just playing word games here - always remember that any magnetic media can fail, so you should test your backups once in a while, and the only test that's truly meaningful is a restore that gives you a healthy database. Don't be too paranoid - magnetic media is pretty darn reliable these days - but never forget that you never have complete certainty. If your data is valuable enough, you might want alternate recovery methods - log shipping, replication, mirroring, or availability groups can allow you to have (potentially) offsite, live copies of your data, because backups can fail to restore!

A more common issue than "a good backup" is a break of the log chain. What breaks the transaction log chain? Obviously, a restore from backup, that restores to a point in time prior to the last log backup, breaks the log chain as of that point in time. Switching to SIMPLE recovery model breaks the transaction log chain. God-help-me, backing up the transaction log with TRUNCATE_ONLY will break the log chain. Never backup with TRUNCATE_ONLY - it was removed after SQL Serer 2008R2, thank heaven! Changing to Simple is a better method for truncating a log file because it introduces a clear, obvious break in the chain. (Remember, as mentioned earlier, once you switch back to Full, you need a backup to re-seed your log backup chain - until you take that backup, you'll continue to truncate your log on checkpoint!)

Restoring the database to a snapshot breaks the chain - and it's also the only way I know of to stop a a massive recovery/rollback operation with a guarantee of a consistent state for the database. (Of course, it's rare that anyone is fortunate enough to have taken a snapshot just prior to discovering a need to stop a massive recovery operation.)

Two things that merit special mention: a damaged transaction log backup breaks the log chain, obviously. And an out-of-band transaction log backup breaks the chain, unless it is present and accounted for when someone is trying to restore. So if someone is setting up mirroring, say, and takes a log backup to prime the mirror, and that log backup isn't put in the normal location (hopefully with the same naming schema), then you can run into a broken log chain. This is needless - SQL Server allows you to take a COPY_ONLY log backup that can be used to prime a mirror or for any other use for a transaction log backup (besides the obvious - truncating the transaction log!).

Even without a convenience like a COPY_ONLY backup, it's always best to use your standard log backup routine, or to use a method that duplicates the results of those methods - same log backup location, same naming schema - where possible. In an emergency, you don't want people to find a gap in the log backup chain - and you don't want them to have difficulties scripting out the restore, if one (or worse, a few!) files have different naming schema. Availability Groups add a new twist - you can take a log backup from any availability group replica (yes, even one that's in asynchronous commit mode, so long as it's in a 'synchronizing' state). This is one of the primary reasons I call out "same location". You wouldn't want to have 3 availability group replicas, and your log files scattered to three different locations!

For a good riddle answer, a database that's gone into suspect mode, and has had its log rebuilt, has a broken log chain (obviously), but DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS does not break the log chain... it's fully logged! Of course, both of these situations quite naturally lead to the question "what ridiculous circumstances must we envision where it's sensible to run REPAIR_ALLOW_DATA_LOSS when we have a complete log chain we can worry about breaking? RESTORE FROM BACKUP!" If your database is corrupt, restore from backup. REPAIR_ALLOW_DATA_LOSS is always your last-gasp hope and prayer.

Okay: Good backups, and log chains. What's next? Restores, I suppose.

You can restore a full backup - you can also restore a differential backup. You can restore a COPY_ONLY backup, and it retains a differential base, so you can restore a later differential on top of a COPY_ONLY. You can restore a string of log backups that are part of a log chain, so long as your earliest log backup links to the full (or COPY_ONLY), or differential, that you restored, and you don't have a broken chain. The differential base's changing is logged, so if you lost this week's full backup, but had a log chain stretching to last week's, you could restore past the time the full backup completed, and then restore a differential on top of it. You can restore to a point in time . (This requires a STOPAT, STOPATMARK, or STOPBEFOREMARK - you can, and probably should, use these options on the full and differential backup. You can't restore a full or differential to a point in time, but the STOPAT options will prevent a restore that takes you past your point in time. You can also put these commands on all of your log backups, no matter how far back they go - this can be a good practice for the exact same reason.)

You can restore individual files, or entire filegroups, after taking a tail of the log backup, if you have the required log chain: You can even restore just damaged pages in a corrupted database with a good log chain and a tail of the log backup. Note: when I first heard of this option, I had assumed - incorrectly! - that you needed a file-level or filegroup level backup. You don't! You can restore a file, or filegroup, from a Full backup (or full+differential), if you have the correct log chain to make it consistent.

The key to all of this magic is the transaction log chain - it tracks all of the changes to SQL Server in such a way that it can make all the right changes, and ensure that the database is in a consistent state. So you see, it can be extremely useful to make sure you have a good, unbroken log chain. The transaction log has some other bits of magic to it - I think I'll save those for another post.

Finally: Bulk-Logged recovery model. Bulk-logged is almost like Full recovery model, except if and when you have a "bulk" - technically, a minimally logged - operation. During these bulk operations, if you're in Bulk-Logged recovery, SQL will not log all of the data changes to the transaction log - it will use a special form of minimal logging that can greatly increase your throughput in the minimally logged transaction, and prevent your log file from growing (or growing as much as Full recovery would require). While you save space in your log file, you will not save space in your log file backups! What the log file tracks are the extents changed by bulk operations, and it dumps all of those into the log file backup when it runs. Not only does this make your log file backup unexpectedly large compared to the size of the LDF file, it also explains why you can't restore to a point-in-time from a log file backup of this nature. The log backup can't make the database consistent at any point in time except the end of the log backup.

This leads us to a tricky bit regarding Bulk-Logged recovery model. Let's say you take a log backup at 8pm, and another at 8:15pm. If all of your minimally logged operations ended by 8pm, and none started until after 8:15pm (technically: until after the 8:15 log backup completed...), your log backup should be precisely the same as a Full recovery model log backup. You should be able to restore to a point in time. I have tested this in SQL Server 2008R2. It worked. And let me tell you something: if you count on this behavior, someone will have run a short SELECT INTO or an INSERT...SELECT, or an index rebuild, that was minimally logged. You can count on that - if you don't abide by Murphy's Law, you really need to work in something other than IT! So, as a best practice: set Bulk-Logged when you intend to use it, and then switch back to Full. If you choose to keep it in Bulk-Logged, always assume you can only backup to the end of any transaction log backup - but in an emergency, you can still try a STOPAT restore.

Obviously, if you get to know me, you'll know I stress certainty and clarity a lot of the time - so, while you can try a STOPAT restore on a Bulk-Logged database, you don't know what will happen. This is why I strongly agree with the best practice of using Bulk-Logged only when you expect to need it. And then, once you're done, switch back to Full, and run a backup - usually a differential or log backup, but if you've just loaded a huge dataset, maybe you want a full backup instead! - so that you're able to restore to this point in time - the point in time at which you once again have confidence that you can use any and all magic of your log file backups again.

No comments:

Post a Comment