Ohhhhmmmmmmm... Ohhhhmmmmmmm...
Okay, bad joke.
Backups are a wonderful thing. If only life let you restore to a point in time backup the way a Full Recovery Model database might allow!
Backups and restores are generally easy to deal with and think about, but there are some tricky bits, so let's think about each of them.
The Too Long, Didn't Read summary:
Simple recovery model: Nice and easy. You can take full backups, or differentials. You can restore full backups or differentials. It can get complicated when you learn about "partial" backups - that will be another blog post, I hope! You can also do COPY_ONLY backups, and these maintain the differential base - vital for making sure no one is confused about which differential goes with which full, when making regularly scheduled backups.
Full: You must take log backups, or your log file will not truncate. If you ever need to truncate your log file without taking a backup, you really need to consider using Simple recovery model instead! You can take full (including COPY_ONLY), differential, or transaction log (including COPY_ONLY) backups. You must pay attention to the log chain - mostly, watching for a broken or altered log chain. You can restore a full, differential, or COPY_ONLY backup - and COPY_ONLY maintains its location in the log chain, and the differential base. You can restore to any point in time, so long as you have a full, or a full+differential (or, technically, a full+differential+log chain) that gets you to some earlier time, and a continuous (i.e., unbroken/unaltered) log chain from that time to your target.
With Full, and an unbroken chain, you can perform miracles - you can restore the whole database to any point in time covered by the full and the log chain; you can restore filegroups, individual files, or even corrupted pages, and sometimes do these things while the database remains online (though obviously, for file/filegroup restores, the affected filegroup will be offline). Note that this is separate from piecemeal backups and restores - you can restore a single damaged file from a full backup (optionally: plus differential), plus the correct log chain. You can also do this from a piecemeal backup, of course.
Bulk-Logged: Saves you space in the transaction log, but not in the backups, if you use "bulk" operations. It is precisely the same as Full if you perform no bulk operations; if you do perform a bulk operation, any log backup taken while the bulk operation was running may be all-or-nothing when restoring. As a best practice, stay in Bulk-Logged only while you know you need it, then switch back to Full immediately and re-anchor your restore point with a fresh full, differential, or transaction log backup.
And now: on to the post!
First, remember, SQL Server has 3 recovery models: Simple, Bulk-Logged, and Full.
In Simple recovery model, the transaction log truncates each time a checkpoint is run (either manually - you can issue a CHECKPOINT as a T-SQL command - or automatically). And you can take Full backups, and Differential backups - you can't take a log backup, and SQL would have no way to use it if you could. You can also take a COPY_ONLY backup to maintain your differential base.
"Differential base?" you ask. Okay: When you take a full backup, SQL Server makes changes to the database - it clears the differential bits that mark which extents have been modified. ("Extent"? SQL uses files in 8kb pages, and groups 8 8kb pages into one 64kb extent.) At this point, you have a new differential base. If you take a new differential the instant this process finishes, your differential backup should be essentially empty - header information only, with a record that no changes had been made.
Let's pretend you take your full backups each Sunday. This sets your differential base. You might think this is silly, but - you want to maintain that differential base until next Sunday, and yes, it really is important. Why? Because in emergencies, people are pumped up on adrenaline. And this is well established: when you're hyped on adrenaline, you are better at doing things that you've drilled in and are good at doing, and worse at things where you need to improvise. So you don't want anyone to have to remember that there's a new full backup, nor guess where to find it, nor have to compare file date/timestamps, and get them right. You want them to be able to do everything one particular way, and it should be a way you've practiced. (Yes, Mr. or Ms. Manager, on company time; nothing encourages good practice like being paid for something other than the daily grind. Do this, and you'll all be heroes when disaster strikes, and believe me, it will.)
So: you can restore a full backup, and that puts your database in a consistent state as of the time the backup completed. If you restore WITH NORECOVERY, you can also restore a differential backup on top of that full backup, which will be consistent as of the point in time that backup completed. That's pretty much all you can do with Simple recovery model, if you're not using partial backups and restores.
If, for some reason, you needed a copy of your database, you have some choices. First, you could take a new full backup. If you're doing fulls, plus differentials, this is a bad idea, unless it can replace your next scheduled full backup. Again: remember, if you need to restore in an emergency, people will be hyped - don't make them freak out when they find tomorrow's differential doesn't restore onto the full backup taken on its normal schedule, because someone took another full backup.
A better option is to take a fresh differential, and use your previously scheduled full, plus the new differential, to do whatever you need to do. But SQL Server has a nifty option to create a COPY_ONLY backup - a complete backup of the database as it stands right now, functionally what you'd get from your scheduled full plus a fresh differential, in one file. And, as I said, this doesn't reset the differential base. I have tried restoring a differential taken after a COPY_ONLY backup on top of the COPY_ONLY - it worked. It should work - it's the same differential base, and a later differential backup can be added to the restore chain if needed. So you can also restore a full backup, a differential backup, and then (assuming the first differential was restored WITH NORECOVERY) restore another, later, differential backup. But you shouldn't - the second differential contains everything the first one did, so restoring both was a waste of time.
(If you're a bit like me, once you heard that you 'shouldn't' restore a second differential, you started to imagine circumstances in which it might be necessary. That's fine - one example is if you're 90% sure you need Monday's backup, but you're checking records to see if maybe you need Tuesday's; and checking records takes longer than the restore. If that's the case, go ahead! This is good to think about; puzzles are great! But always remember that puzzles are puzzles and in the real world you'll almost always want one full, one differential, and that's it.)
(What do you do if you restore a full backup, restore Monday's backup WITH NORECOVERY, and find you don't need Tuesday's - how do you get the database online? You run RESTORE DATABASE My_Database_Name WITH RECOVERY -- except, of course, you replace My_Database_Name with the name of the database you've restored.)
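To make that concrete, here's a minimal sketch of the whole dance. The database name (MyDB), the X:\Backups paths, and the file names are all placeholders I've made up - adjust to your own layout:

-- Need a copy without disturbing the schedule? COPY_ONLY doesn't reset the differential base.
BACKUP DATABASE MyDB
    TO DISK = N'X:\Backups\MyDB_CopyOnly.bak'
    WITH COPY_ONLY, INIT;

-- The practiced emergency routine: Sunday's full, then a differential, then recover.
RESTORE DATABASE MyDB
    FROM DISK = N'X:\Backups\MyDB_Full_Sunday.bak'
    WITH NORECOVERY, REPLACE;

RESTORE DATABASE MyDB
    FROM DISK = N'X:\Backups\MyDB_Diff_Monday.bak'
    WITH NORECOVERY;

-- Decided you don't need Tuesday's differential after all? Bring the database online.
RESTORE DATABASE MyDB WITH RECOVERY;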
One final bit of interesting knowledge: If you switch from Simple recovery model to Full recovery model, SQL Server still doesn't have any basis upon which to restore a log backup. None of your previous backups include enough log file information for a newly taken log backup to restore. So SQL Server will continue to truncate the log file at every checkpoint, until you establish a backup that will allow restore of a transaction log file. If you've ever taken a full backup of the database, you can just take a differential backup (though if your full backup was in the far-back past, you might as well take a full backup instead). If you've never taken a full backup, you'll need to take a full backup - obviously! - to allow log backups.
I saw at least one online resource state this incorrectly - they claimed that a Full recovery database acts like Simple until the first full database backup. This is incorrect! What probably happened was that person had their Model database set to Simple, created a new database without explicitly setting it to Full recovery model, and then switched it from Simple to Full - which triggered the behavior. If you create a new database and it starts existence in Full recovery model, it will not truncate its log until you take a full backup, and then the first transaction log backup. But if you create a database in Simple, and switch to Full, the log will truncate on checkpoint until the first full backup is taken.
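Here's a quick sketch of that switch and the re-seeding, again with made-up names (MyDB and the X:\Backups paths are just placeholders):

-- Switch from Simple to Full...
ALTER DATABASE MyDB SET RECOVERY FULL;

-- ...but the log keeps truncating on checkpoint until a data backup re-seeds the chain.
-- If a recent full already exists, a differential will do; otherwise, take a full:
BACKUP DATABASE MyDB
    TO DISK = N'X:\Backups\MyDB_Full.bak'
    WITH INIT;

-- From this point on, log backups are restorable - and the log stops truncating on checkpoint.
BACKUP LOG MyDB
    TO DISK = N'X:\Backups\MyDB_Log_001.trn'
    WITH INIT;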
Now, let's talk about Full recovery. In Full recovery model (and in Bulk-Logged), you can take full backups, and differential backups, just like in Simple recovery model, and you can take COPY_ONLY backups which don't change the differential base, and don't affect log backups. You can take a COPY_ONLY full backup, and a COPY_ONLY transaction log backup, but not a COPY_ONLY differential - there would be no point; a differential doesn't reset anything! You also can, and "must", take log backups to allow your log file to truncate - that is, to re-use space inside the LDF file for new transactions. Now, I said you "must", but there are other ways to truncate a log file - and you should never use them. "Never"? Well, not quite "never". It's just - it's one of the most ridiculous things I've seen in my time as a DBA, and as a troubleshooter. People hear wonderful things about all the great recovery options you get with Full recovery model - but they don't plan for running the log backups, or for having enough space to maintain a reasonable log chain. Sometimes it's not their fault - they need to use log shipping, mirroring, Availability Groups, or some other technology that mandates Full recovery model. But in other cases, there's no excuse. If you are in Full recovery model, it's a statement that what's happening in your database is important over time, and you need to save it over time, so you save that information in log backups. Or you use Simple recovery model, which is far easier to use, and in many cases more efficient, than Full recovery model.
I suppose I might sound a bit over-excited on this issue, and that's not entirely unfair. It's just - the longer I've worked in IT, the more appreciation I've gained for good design; good design means making the right choices, and if you're going to break that precious log backup chain other than in the most unexpected of emergencies, you should probably be using Simple recovery model as the correct tool for your task. And I've seen far too many people use Full when Simple would have been a better choice for them.
If you stop and think about it, a SQL Server database's transaction log is an amazing thing. A huge number of things could be happening to a database - one or more files could be growing, or shrinking, or new files could be being added; it could be completely quiesced, or running wild with activity, including massive changes to both the contents and structure of the database; and all of the data in the buffer pool is completely volatile - it may have been changed, or not changed, or be in the process of being changed. And the transaction log tracks everything you need, so that if something were to go horribly wrong at any moment, the log file has the information needed to put things right.
When you're in full recovery model, you save that recording of what has happened in the transaction log, until you back it up. (You may save the data until other things happen too - but for now, let's just think about backups.) Unless something breaks the log chain, you can restore from a good full backup, and a continuous string of transaction log backups, to any instant of time. Someone deleted a table at 5pm Friday? You could restore the database to 4:59:59, or (if you only realized this at 6:30!) restore a copy of the database to the same time, to grab just the data in the table.
I mentioned "restoring a good backup" - let me pause for a moment. When is a backup good? Well, when you've restored it, and run a DBCC CHECKDB on the database, you can now say "that backup was good when we restored it." I'm not just playing word games here - always remember that any magnetic media can fail, so you should test your backups once in a while, and the only test that's truly meaningful is a restore that gives you a healthy database. Don't be too paranoid - magnetic media is pretty darn reliable these days - but never forget that you never have complete certainty. If your data is valuable enough, you might want alternate recovery methods - log shipping, replication, mirroring, or availability groups can allow you to have (potentially) offsite, live copies of your data, because backups can fail to restore!
A more common issue than "a good backup" is a break of the log chain. What breaks the transaction log chain? Obviously, a restore from backup that restores to a point in time prior to the last log backup breaks the log chain as of that point in time. Switching to Simple recovery model breaks the transaction log chain. And - God help me - backing up the transaction log with TRUNCATE_ONLY will break the log chain. Never back up with TRUNCATE_ONLY - it was removed in SQL Server 2008, thank heaven! Changing to Simple is a better method for truncating a log file because it introduces a clear, obvious break in the chain. (Remember, as mentioned earlier, once you switch back to Full, you need a backup to re-seed your log backup chain - until you take that backup, you'll continue to truncate your log on checkpoint!)
Restoring the database to a snapshot breaks the chain - and it's also the only way I know of to stop a massive recovery/rollback operation with a guarantee of a consistent state for the database. (Of course, it's rare that anyone is fortunate enough to have taken a snapshot just prior to discovering a need to stop a massive recovery operation.)
Two things that merit special mention: a damaged transaction log backup breaks the log chain, obviously. And an out-of-band transaction log backup breaks the chain, unless it is present and accounted for when someone is trying to restore. So if someone is setting up mirroring, say, and takes a log backup to prime the mirror, and that log backup isn't put in the normal location (hopefully with the same naming schema), then you can run into a broken log chain. This is needless - SQL Server allows you to take a COPY_ONLY log backup that can be used to prime a mirror or for any other use for a transaction log backup (besides the obvious - truncating the transaction log!).
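Something like this, with placeholder names - one out-of-band log backup that stays out of the regular chain's way:

-- COPY_ONLY log backup: restorable wherever you need it, but it doesn't truncate
-- the log and it doesn't take a slot in the regular log chain.
BACKUP LOG MyDB
    TO DISK = N'X:\Backups\MyDB_Log_CopyOnly.trn'
    WITH COPY_ONLY, INIT;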
Even without a convenience like a COPY_ONLY backup, it's always best to use your standard log backup routine, or to use a method that duplicates the results of those methods - same log backup location, same naming schema - where possible. In an emergency, you don't want people to find a gap in the log backup chain - and you don't want them to have difficulties scripting out the restore, if one (or worse, a few!) files have different naming schema. Availability Groups add a new twist - you can take a log backup from any availability group replica (yes, even one that's in asynchronous commit mode, so long as it's in a 'synchronizing' state). This is one of the primary reasons I call out "same location". You wouldn't want to have 3 availability group replicas, and your log files scattered to three different locations!
For a good riddle answer, a database that's gone into suspect mode and has had its log rebuilt has a broken log chain (obviously), but DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS does not break the log chain... it's fully logged! Of course, both of these situations quite naturally lead to the question: what ridiculous circumstances must we envision where it's sensible to run REPAIR_ALLOW_DATA_LOSS while we still have a complete log chain to worry about breaking? The answer is: RESTORE FROM BACKUP! If your database is corrupt, restore from backup. REPAIR_ALLOW_DATA_LOSS is always your last-gasp hope and prayer.
Okay: Good backups, and log chains. What's next? Restores, I suppose.
You can restore a full backup - you can also restore a differential backup. You can restore a COPY_ONLY backup, and it retains a differential base, so you can restore a later differential on top of a COPY_ONLY. You can restore a string of log backups that are part of a log chain, so long as your earliest log backup links to the full (or COPY_ONLY), or differential, that you restored, and you don't have a broken chain. The change to the differential base is logged, so if you lost this week's full backup, but had a log chain stretching back to last week's, you could restore past the time this week's full backup completed, and then restore a differential on top of it. You can restore to a point in time. (This requires a STOPAT, STOPATMARK, or STOPBEFOREMARK - and you can, and probably should, use these options on the full and differential restores as well. You can't restore a full or differential to a point in time, but the STOPAT options will prevent a restore that takes you past your point in time. You can also put these options on all of your log backup restores, no matter how far back they go - this can be a good practice for the exact same reason.)
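Here's a sketch of the 4:59:59 rescue from a couple of paragraphs back - the database name, file names, and the exact timestamp are all placeholders:

-- Stop just before the 5:00pm Friday "oops".
RESTORE DATABASE MyDB
    FROM DISK = N'X:\Backups\MyDB_Full_Sunday.bak'
    WITH NORECOVERY, REPLACE,
         STOPAT = '2015-02-20T16:59:59';  -- guards against a full that already overshoots the target

RESTORE LOG MyDB
    FROM DISK = N'X:\Backups\MyDB_Log_001.trn'
    WITH NORECOVERY, STOPAT = '2015-02-20T16:59:59';

RESTORE LOG MyDB
    FROM DISK = N'X:\Backups\MyDB_Log_002.trn'
    WITH NORECOVERY, STOPAT = '2015-02-20T16:59:59';

-- Once the log backup containing 16:59:59 has been restored, recover.
RESTORE DATABASE MyDB WITH RECOVERY;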
You can restore individual files, or entire filegroups, after taking a tail-of-the-log backup, if you have the required log chain: https://msdn.microsoft.com/en-us/library/aa337540.aspx. You can even restore just the damaged pages in a corrupted database with a good log chain and a tail-of-the-log backup. Note: when I first heard of this option, I had assumed - incorrectly! - that you needed a file-level or filegroup-level backup. You don't! You can restore a file, or filegroup, from a full backup (or full + differential), if you have the correct log chain to make it consistent.
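Here's roughly what a page restore looks like - the page ID, names, and paths are made up, and this is the online-restore flavor (Enterprise Edition); the offline flavor takes the tail-of-the-log backup WITH NORECOVERY instead:

-- Suppose CHECKDB (or msdb.dbo.suspect_pages) pointed at page 1:4567.
RESTORE DATABASE MyDB
    PAGE = '1:4567'
    FROM DISK = N'X:\Backups\MyDB_Full_Sunday.bak'
    WITH NORECOVERY;

-- Roll the page forward through the existing log chain...
RESTORE LOG MyDB
    FROM DISK = N'X:\Backups\MyDB_Log_001.trn'
    WITH NORECOVERY;

-- ...then take a fresh log backup and restore it to bring the page current.
BACKUP LOG MyDB
    TO DISK = N'X:\Backups\MyDB_Log_Tail.trn'
    WITH INIT;

RESTORE LOG MyDB
    FROM DISK = N'X:\Backups\MyDB_Log_Tail.trn'
    WITH RECOVERY;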
The key to all of this magic is the transaction log chain - it tracks all of the changes to SQL Server in such a way that it can make all the right changes, and ensure that the database is in a consistent state. So you see, it can be extremely useful to make sure you have a good, unbroken log chain. The transaction log has some other bits of magic to it - I think I'll save those for another post.
Finally: Bulk-Logged recovery model. Bulk-Logged is almost like Full recovery model, except if and when you have a "bulk" - technically, a minimally logged - operation. During these bulk operations, if you're in Bulk-Logged recovery, SQL will not log all of the data changes to the transaction log - it will use a special form of minimal logging that can greatly increase your throughput in the minimally logged transaction, and prevent your log file from growing (or growing as much as Full recovery would require). But while you save space in your log file, you will not save space in your log backups! The log tracks which extents were changed by the bulk operations, and the log backup copies all of those extents into the backup when it runs. Not only does this make your log backup unexpectedly large compared to the size of the LDF file, it also explains why you can't restore to a point in time from a log backup of this nature. The log backup can't make the database consistent at any point in time except the end of the log backup.
This leads us to a tricky bit regarding Bulk-Logged recovery model. Let's say you take a log backup at 8pm, and another at 8:15pm. If all of your minimally logged operations ended by 8pm, and none started until after 8:15pm (technically: until after the 8:15 log backup completed...), your 8:15 log backup should be precisely the same as a Full recovery model log backup. You should be able to restore to a point in time. I have tested this in SQL Server 2008R2. It worked. And let me tell you something: if you count on this behavior, someone will have run a short SELECT INTO, or an INSERT...SELECT, or an index rebuild, that was minimally logged. You can count on that - if you don't abide by Murphy's Law, you really need to work in something other than IT! So, as a best practice: set Bulk-Logged when you intend to use it, and then switch back to Full. If you choose to stay in Bulk-Logged, always assume you can only restore to the end of any transaction log backup - but in an emergency, you can still try a STOPAT restore.
Obviously, if you get to know me, you'll know I stress certainty and clarity a lot of the time - so, while you can try a STOPAT restore on a Bulk-Logged database, you don't know what will happen. This is why I strongly agree with the best practice of using Bulk-Logged only when you expect to need it. And then, once you're done, switch back to Full, and run a backup - usually a differential or log backup, but if you've just loaded a huge dataset, maybe you want a full backup instead! - so that you're able to restore to this point in time - the point in time at which you once again have confidence that you can use any and all magic of your log file backups again.
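In other words, something like this - the load itself is just a stand-in (any minimally logged operation will do), and the names are placeholders:

-- Flip to Bulk-Logged just for the big operation...
ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED;

-- ...do the minimally logged work (SELECT INTO, BULK INSERT, index rebuild, etc.)...
SELECT *
INTO dbo.StagingTable
FROM dbo.BigSourceTable;

-- ...then switch straight back to Full and take a backup, so point-in-time
-- restores are trustworthy again from here forward.
ALTER DATABASE MyDB SET RECOVERY FULL;

BACKUP LOG MyDB
    TO DISK = N'X:\Backups\MyDB_Log_AfterBulk.trn'
    WITH INIT;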
Who am I? And why would I want to be mentored by Paul Randal?
Paul Randal is one of the gods of the SQL Server business. I'm not saying that to butter him up - I'm saying it to acknowledge a fact. If the word "guru" is better than "god" - hey, whatever word you prefer :-). He posted an offer: http://www.sqlskills.com/blogs/paul/want-mentored/. Explain, in a blog post, why he should mentor you.
I didn't have a blog. What to do?
Well - I'd always wanted to be one of the well-known experts in SQL Server - one of those "someday... maybe... if all goes well!" dreams. And I'd kept putting off starting the blog because I have my chronic fatigue issues - I'm always too-damn-tired!
So, this kicked me into starting my SQL blog. So if you think my blog is horrible, you get to blame... me. No one else. But you can still glare at Mr. Randal for inspiring my appearance!
What reasons would I give to be mentored?
I've learned the basics of the way SQL works, from the inside. And I want to share them. And I want to talk about them, with others, in a way that I hope will help people draw connections and see patterns and understand the tools and workings that much better. If you think my other posts are interesting and amusing, great - if you think they're awful, oh well.
Okay, but there's a lot a person can't really learn just by puttering around, and I passed that point a long time ago; I need someone who can help me formulate and ask, then answer, the right questions.
Such as? Well - query plans. "What? There's tons of information about query plans out there!" Sure - but there's not very much that I've seen that explains how to parse them, if you need to dig in to just that one particular detail about what's being found, and how it's being pulled, or how it's being manipulated afterward.
Or XEvent captures: Profiler is going away - XEvents are taking its place. How do they work? Well, events fire, and you can gather information when they do - but how do you know what information you can gather for which event firing? And what are some neat tricks you can use, so you soon find yourself glad, yes, glad that clunky old Profiler's going bye-bye? There's a lot of different bits about XEvents out there, but if there's a good intro for newbies I haven't found it - and I wouldn't mind changing that, and writing the durn thing if I have to!
I've seen a deadlocked scheduler warning on what seemed to be parallelism - but I only counted 300 parallel threads out of 545 threads in the dump - did I miss something else making SQL feel thread-starved? Or was that enough? If 300 was enough, why? 450 threads out of 545 would probably be enough to cause a thread shortage, but 300 seems not-quite-enough (no jokes about Sparta, please!). And again: help me understand, and it won't just be me learning - I'm going to try to share it with the world.
Saw a fascinating case where read latch waits were causing massive contention - not locking, shared latches! We puzzled it out and tried Read Committed Snapshot - I believe it worked, and I believe it should have worked: the only latches that might occur should be when building the initial version store, and after that, updates (new values in the version store) should occur as a transaction commits - as the lock gets released - without any risk of latch contention, since no other query "sees" the new value existing yet. But I couldn't be sure, because I didn't know the mechanism. Do you? I'd love to learn these kinds of things!
Hekaton manages to avoid all locking and latching - I see this, I know this, but I haven't the foggiest how they pull that beauty off. I'd love to learn how that happens, if it's public facing (just awfully crunchy) knowledge.
And I'd like to try to link these things to other things, and try to build up a flow, a sense, a song, a spirit... of SQL. And share that with the world.
If you were bored, and checked out my other two, yes, *two!* substantive blog posts, you might decide you'd like to help me share what I learn with the world. Or you might decide to mentor me *in spite* of my crimes against the SQL blogging community :-). Regardless - that's why I'd like to learn more, from one of the people who can definitely point me toward learning more. So I can expand my own knowledge, and share it, in my own, uh, interesting, way.
Spinlocks and lock hash...
Spinlock - just the name sounds like something meaningful and powerful in SQL Server. "Spinlock contention" - if you could dig into issues like that, surely you'd be a god among geeks in the SQL Server world, right?
What is a spinlock? And how does it apply to you? Further - when should you care?
The Too Long/Didn't Read Synopsis: Spinlocks can be a cause of contention, but it's actually relatively rare - before spinlock contention is clearly an issue, you have a lot of other possibilities to eliminate.
But I have observed some worrisome LOCK_HASH spinlock cases that suggested that it might be wise to dig into locking behavior if you see a lot of heavy activity on the Lock_Hash spinlock, either through KB 2926217 if you have lots of database locks, or by reviewing overall locking if you have large numbers (thousands and thousands) of individual locks of any type. Caveat emptor, test prior to use in production, package sold by weight, not volume, and contents may have settled....
And on to the body of the post:
I'm going to be forthright with you. I know about programming. For many years, I knew that the big bucks went to developers, and I wanted to be one of them. Alas, I majored in mathematics, not Computer Science. And where I went to school, Pascal was viewed as an up-and-coming language, and sure, while K&R floated around, that wasn't going to get me anywhere (it got some people places - but not me. Wrong kind of learning for me). I later studied Java and C++, but I kept missing some key that would let me go from being able to read a lot of source code to being able to write good applications, even simple ones.
I know about programming - but I'm not a programmer. I can't actually tell you, with certainty, what a spinlock "is". But, as we all know, the best way to gather information on the internet is not to ask a question... it's to post something wrong. And I know what I have to post is probably at least 80% right, so let's take a whack at it, and see who helps me update this blog post!
Key thing about a spinlock is this: it's pure-D user mode. Moving between user mode and kernel mode is expensive - it's not a context switch, by the way; that's something else - so if you can use a spinlock to check for contention on a resource, you can avoid dropping into kernel mode.
"Check for contention?" Okay, let's back up. You know in a multiprocessor system, you have to have isolation. Only one thing can change something at the same time. And you generally don't want anyone even reading something while it's being changed, or you have problems - e.g., a race condition (depending on who wins the race, you get a different value, right? but we generally want consistency).
A spinlock is a way of isolating a resource. And as near as I can tell, it goes like this: suppose you check something that's commonly used, but used for a very short time. So: if you can't have it this very instant, the odds are you can have it if you wait just a tiny bit longer. You'll probably get it faster than the OS can tell you to "sleep" and wake you up when it's available. So, a spinlock plays kid-in-the-back-seat, only instead of "are we there yet?... are we there yet?" it asks "can I have it now? Can I have it now?" over and over.
That's a greatly simplified version, of course. The spinlock request starts with a request for access; if it gets it, great! If not, it's a "collision". The spinlock request then runs a looping procedure (thus, the "spin") and then requests access. If it gets it, great. If it doesn't, it keeps spinning and requesting for a bit, and then it "backs off" - and here's where it goes to sleep.
One of the ways SQL Server maintains great scalability is by staying in User Mode almost exclusively, and using cooperative multitasking - tasks voluntarily yield on the CPU, to let other tasks have a turn, knowing full well that the other tasks will (probably - almost certainly) yield in turn.
(What is scalability? Well - let's say you have 4 CPUs running at 20% and processing a thousand tasks a second. If you throw two thousand tasks a second, and CPU goes up to 45-50%, you have a scalable system. If you jump to 80% - you probably don't. This is super, super simplified, of course, and it assumes that only CPU matters. But the idea of scalability is, your system grows in a relatively stable way - double capacity means a bit more than double resources. (Obviously, it might mean exactly double the resources, but that's far too clean for most real-world systems, which is why I suggest 2x transactions = a bit more than 2x CPU.))
So: where appropriate, SQL Server uses spinlocks. It uses a lot of different spinlocks. You can check spinlock counters using DBCC SQLPERF(SPINLOCKSTATS) and sys.dm_os_spinlock_stats. ("Hey, 'Weirdo - why don't you have links to the Microsoft documentation about these valuable commands?" Because I can't find a link to them! Neither Bing nor Google brings up an MSDN page from Books Online. But a useful resource is here: Diagnosing and Resolving Spinlock Contention on SQL Server. Note: instantaneous values are useless - at the very least, you need to run these twice and compare changes over time.)
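A rough sketch of the "run it twice" idea (the one-minute delay is arbitrary - pick an interval that covers the busy period you care about):

-- Snapshot the cumulative counters...
SELECT name, collisions, spins, spins_per_collision, backoffs
INTO #spins_before
FROM sys.dm_os_spinlock_stats;

WAITFOR DELAY '00:01:00';

-- ...then compare; only the deltas mean anything, since the DMV counts since startup.
SELECT  s2.name,
        s2.collisions - s1.collisions AS collisions_delta,
        s2.spins      - s1.spins      AS spins_delta,
        s2.backoffs   - s1.backoffs   AS backoffs_delta
FROM sys.dm_os_spinlock_stats AS s2
JOIN #spins_before AS s1 ON s1.name = s2.name
ORDER BY spins_delta DESC;

DROP TABLE #spins_before;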
When is spinlock contention an issue? Wow.
With certainty, spinlock contention is an issue when you see CPU shoot unexpectedly higher, as transactions climb, without the expected increase in throughput, and you see huge, huge increases in spins (and maybe back-off events) in one (most likely) or more (less likely, but possible) spinlock counters.
So: if our hypothetical server shot up to 80% when we tried to move from 1000 transactions per second, to 2000 transactions per second (instead of the 45-50% we'd expect) - we might be seeing spinlock contention. Spinlocks take up CPU, and, in fact, can "waste" CPU if they spin too long when (if we had an omniscient scheduler) they'd be better off sleeping.
With greater certainty, if your CPU is spiking to the point that CPU usage is actually slowing you down, and the cause is excessive spins on spinlocks, your problem is spinlock contention. (Someone out there just said "well, duh - too much CPU wasted in spinlocks means not enough CPU for other tasks." Good! That's precisely the point I was making, but I try to avoid first use of "well, duh"!)
But can spinlocks cause slowdowns before we reach that point? Yes, probably. One key point is that we have to rule out a bunch before we should settle on spinlock contention as the root cause of your problems. Odds are, if you look for another problem, you'll find one, and get more "bang for your buck" from that search for slowdowns.
Okay: but I evaluate performance (and other stuff) on SQL Server for a living. And I've seen some interesting numbers. And I think - just a feeling, mind you - that we might be able to work on this idea a bit.
One of the places where I see a lot of spinlocks, with high spins/collision and lots of backing off, is in Lock Hash.
"Lock hash? Does that mean you fry up lox with potatoes and onion...?"
Hashing in IT is a different process. A "hash" is a way of condensing a piece of data into a small, fixed-size value: the same input always produces the same hash value, and different inputs almost always produce different ones. Two different inputs can collide - "1313 Mockingbird Lane NE" and "1331 Mockingbird Lane NE" might happen to land on the same hash value - but a good hash function spreads values out so that collisions like that are rare.
With a good hash algorithm, you can quickly eliminate the overwhelming majority of your possible matches for data. In fact, if you construct a "good" hash table of your data, the expected time to search that table is constant. You should be able to search for one value in a thousand roughly as fast as you can search for one value in a million. Note that a binary search algorithm is very fast - proportional to the log-base-2 of the number of values. And a B-tree search can be proportionately faster, but not much - it's still based upon the logarithm of the number of values. But the hash, for a large enough pool of values, always wins.
SQL stores its locks in a hash - it wants to be able to search for incompatible locks in a very swift manner, no matter how many locks it has.
I've evaluated a few interesting performance situations, and I've seen a few cases where I see some very high results for the Spins Per Collision on the Lock_Hash spinlock, and a great many Back Off events. Only rarely do I see something that obviously calls for http://support.microsoft.com/kb/2926217 - but I've sometimes recommended testing it anyway - note that you need trace flag 1236 turned on for it to work! Now, note that KB article, and read it very carefully.
It calls out database locks, and database locks only - on 16+ core servers, table-level locks are partitioned already, and this trace flag partitions database locks. What else have I seen?
Well - I've seen queries taking out hundreds and thousands of row or key locks, and seen huge, huge values on spins per collision growing over time, and a scary number of back off events.
What do I do when I see this? Well - I sometimes suggest people consider testing a change in their locking behavior. If a big query is going to block hundreds of other queries, even if only briefly, and it's also going to take thousands and thousands of row or index locks, we might want to hint it to take just page or table locks. Hey - it's already going to block like a ... uh, let's be polite and say "big blocky blocking thing", but page or table locks are easier to find and more efficient to create (but might take longer to obtain!).
You can also set up an index so it doesn't allow row locks, or page locks (depending on your needs). This is, obviously, a trickier thing - a hint affects one query, this affects all queries. Then again, if you find that the majority of your queries are going to affect multiple rows on a page anyway, you might just find that you're not really losing much in concurrency.
You can also use Snapshot Isolation - your read queries won't (generally) be locking the table - they'll be using a copy of the relevant values (the version store) in TempDB.
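For the record, here's roughly what those three knobs look like - the table, index, and database names are invented, and every one of these deserves testing before it goes anywhere near production:

-- 1. Hint the big, blocky query to take coarser locks up front.
SELECT o.CustomerID, COUNT(*) AS OrderCount
FROM dbo.Orders AS o WITH (PAGLOCK)   -- or TABLOCK, if it's going to block everyone anyway
GROUP BY o.CustomerID;

-- 2. Tell an index to skip row locks entirely (this affects every query that touches it).
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
    SET (ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = ON);

-- 3. Let readers use the version store in TempDB instead of taking shared locks.
ALTER DATABASE MyDB SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;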
Paul Randal sets up another interesting situation here: SOS_SCHEDULER_YIELD waits and the LOCK_HASH spinlock.
So: While I would have a very hard time proving that Lock_Hash spinlocks are the cause of (or even "a contributing factor to") a performance problem, if I start to see high spins/collision and back offs, I try to nudge people toward thinking about locking behavior. For other spinlocks? Remember: odds are, your root cause is something else. But check out the link to the white paper on resolving spinlock contention - it has some very good tips and techniques.
SQL Server and PerfMon counters - including some tricky ones!
First thing you'll learn about me: I'll talk. A lot.
So: let's try something out, which might become a tradition for this blog, based upon those "crazy kids and their intertubes" - the TL/DR synopsis.
The Too Long/Didn't Read Synopsis:
Good Perfmon Counters for SQL Diagnostics and forensics:
Logical (or Physical) Disk: Seconds Per Transfer - <10ms good; 10-20ms usually adequate; 20-50ms usually painful (at least occasionally); 50+ms usually a serious bottleneck. (This applies to MDF and LDF disks - it might also apply to the ERRORLOG disk - ERRORLOG writes synchronously! - and it definitely applies to pagefile disks. Others - you be the judge.)
Bytes Per Second: How much traffic is the disk seeing? How much should it handle? It's reasonable to see more latency at the upper end of capacity. If an 80 meg/second disk only slows down at 80 meg/second, we're not surprised. But if it's slowing down at 30 meg/second - or, heaven help us, *1* meg/second (yes, it happens!) - worry.
% Idle Time: A good disk is 90+% idle, almost always. If not, something odd is going on. (Could be getting hammered - could be misconfigured, outdated driver, or on its last legs.)
Memory: Available Megabytes - higher than 1000 and stable - stable is important! - for x64. A steady 800 is a clearer sign of health than a value bouncing between 1000 and 2500 (though both can be healthy); more isn't bad, and 5% of total RAM as a minimum for "big RAM" servers seems like a reasonable safe harbor. (So: never dropping below 6gb free on your 128gb server is a reasonable safe harbor.)
Memory: Pages Per Second - if Available RAM is changing rapidly or is obviously low, pay attention to paging. Otherwise, Memory Mapped Files can make this look scary high when it's normal.
(OS geeks and gurus have a lot more memory stuff to look at - but we're DBAs!)
Processor: %Total CPU: Less than 80% sustained - you're probably doing fine, but most healthy SQL Servers just tend to have 30% or less, most of the time. If you're growing over time, that's relatively normal - if you're spiking, worry. Sometimes looking at individual cores is useful - especially if MaxDOP is 1. But do you really have time to investigate 32 (or lord have mercy, 256?) cores' behavior?
%User time: SQL runs almost - almost - exclusively in user mode. This is the cap on (most) SQL CPU usage, except for Filestream calls.
(Again: OS gurus have more to look at - but DBAs can usually stop here.)
SQL Server:Buffer Manager:Page Life Expectancy:
Please see http://www.sqlskills.com/blogs/jonathan/finding-what-queries-in-the-plan-cache-use-a-specific-index/, section titled "What’s Wrong about Page Life Expectancy >= 300?".
I like to see about 50seconds per gb in the buffer pool as an absolute minimum - a tad bit easier than "300/4gb" = 75seconds per gb. Super-zippy disks can make lower values acceptable. See also: http://www.sqlskills.com/blogs/paul/page-life-expectancy-isnt-what-you-think/.
LazyWrites Per Second: This is buffer pool contention - SQL needed to call the lazy writer to dump some pages to make room for other pages. Some lazywrites, sometimes, is (are?) fine. If your average is meaningfully > 0, and not caused by a huge, huge spike, you've got buffer pool contention.
Buffer Cache Hit Ratio: I check this once in a while, but usually for shock value. It used to be meaningful, but buffer pools have gotten so huge these days, and read-ahead so good, I've never seen it much less than 99%. But if it is, it might be the final stone in the bucket.
SQL Server: Buffer Node: did you check Mr. Randal's advice above? Why the heck not? Paul Randal helped build the durn product! Listen to the man!
Sneaky Statistics - for forensics, mostly.
Process: SQLServr: Threads - if SQL suddenly uses a lot more threads, it's building up a backlog of connections/queries. WHY?
Process: SQLServr: Working set. Did it just drop? Bad news. Grow? Something just 'released' or SQL needs a boatload of memory. Remember: LOCKED PAGES IN THE BUFFER POOL AREN'T COUNTED!!!
SQLServer:SQL Statistics:Batch Requests/Sec: High? Low? (Dropped to 0 meaning SQL's locked?)
SQLServer:SQL Statistics:(re)Compilations/second: Compared to batch requests, if this is high, we'll see slowdowns and higher CPU (yep, compiling is often more expensive than running!)
SQLServer:Wait Statistics:Waits In Progress: I've had a hard time mapping the other wait stats to hard and fast SQLDiag collected waits - but "Waits In Progress" seems accurate. This can point to thread exhaustion, or simply give you a good starting point for digging into oddball waits you don't quite see in other areas.
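(By the way, if you'd rather pull a few of these from inside SQL Server than fire up PerfMon, most of the SQL Server counters above are exposed in sys.dm_os_performance_counters. A rough sketch - note that the "/sec" counters in that DMV are cumulative, so sample twice and diff them to get a rate, and the object_name prefix is "MSSQL$InstanceName:" rather than "SQLServer:" on a named instance:)

SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Page life expectancy',
                       N'Lazy writes/sec',
                       N'Buffer cache hit ratio',
                       N'Batch Requests/sec',
                       N'SQL Compilations/sec',
                       N'SQL Re-Compilations/sec');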
And now: I'm allowed to ramble to my heart's content! I hope you find this entertaining or informative. If not - why'd you keep reading, silly-person?
PerfMon is one of the best tools for checking performance of almost anything you're working with on the Windows platforms. There might be better methods of finding answers - some "scorecard metrics" that you can pull up more quickly and easily - but for me, there's something special and magical about diving into perfmon counters. Look - disk usage shoots up. What else happens? OH! Paging! I bet our memory is moving - sure enough, look! Megabytes Available just plummeted from 1500 to 400 - this server is in big trouble!
It's not always that obvious, of course. But while Sherlock Holmes calls out the Baker Street Irregulars, or Dr. House plays cane-ball, I open up a set of perfmon counters and pull out my favorite bits of data to let me think about things.
What are my favorites?
Well - remember, this is for SQL Server. So it's all balanced in that direction.
First I'll pull up Logical Disk - logical disks are your C: drives, your D: drives, and so forth. They might also be your L:\SQLLogs\MSSQL\ drive, if you're using Mount Points. If so, you probably have a separate L: drive - and I won't even try to guess what counters will show where. Does a write to L:\SQLLogs\MSSQL only show up there? Or does it also show in L:? I don't think so - but trust your perfmon counters here - if they're showing latency, there's latency somewhere!
The drive letter is a key difference between Logical and Physical drives - physical drives might be 0, 1, 2, 3... with no (obvious) mapping to the drive letter or directory. Sometimes the letter mappings show, but not always. So I use logical. But checking both can be useful, because if physical and logical drives show different values, it can help isolate where a performance issue is occurring. But I never learned how to determine which counter means what - I have another team to do that for me! - so I just grab logical drives. (I believe that physical is pure hardware level - physical seconds per transfer measures from the last handoff from logical IO to physical IO. But I've been wrong before.)
My best metric here is "Seconds Per Transfer" (the counter's formal name is "Avg. Disk sec/Transfer", with "/Read" and "/Write" variants). A former colleague of mine, John Gose, explained that this is a great scorecard because SQL Server really doesn't care a whole bunch about "% Disk Time" (which can go > 100% if you have a RAID array), or disk queue length ("up to 1 per spindle is acceptable" - if you have a SAN backed by 250 spindles, shared by 14 applications, what disk queue is acceptable?). If SQL asks for a read or a write, and it occurs in under 10ms, SQL is probably very happy. I've seen some sources say that 7ms is better for a log file transfer (which will almost always be a sequential write) - but 10ms is generally good enough there, too, if you don't have poor wait statistics.
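If you can't get a PerfMon capture in place, SQL Server keeps its own tally of file-level latency. This isn't the same thing as Avg. Disk sec/Transfer - it's SQL's view of its own data and log files, cumulative since startup - but it makes a decent cross-check. A minimal sketch using sys.dm_io_virtual_file_stats:

-- Average read/write latency per database file, as SQL Server has seen it since startup.
SELECT
    DB_NAME(vfs.database_id)                                    AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)   AS avg_read_ms,
    vfs.num_of_writes,
    1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)  AS avg_write_ms,
    vfs.num_of_bytes_read,
    vfs.num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;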
What if you don't have "10ms or less?" Well, that depends. The rule of thumb I heard from Microsoft CSS was "10ms or less is good; 10-20ms is bad enough that you'll notice; 20-50ms is definitely a bottleneck; 50+ms - we can stop looking for bottlenecks. This is the worst bottleneck we'll find." Ah - but that was back in the day of x86!
For most x86 servers, the biggest ordinary buffer pool you'd find was about 1500 meg; you could find more - even back in the days of Windows 2000, there were builds that could run with up to 32 gig. Well, databases were a lot smaller, sure, but data still grew, and someone always wanted "just one more database" on the server. It was pretty common for SQL Server to be disk-bound in this kind of situation. 50ms reads and writes probably put a huge bottleneck on your server in this case - you always needed to go to disk in the relatively near future.
With x64 servers, 32gig for a server isn't a bit surprising, and you can manage a 26 gig buffer pool on that. You can have a pretty darn big database before your truly hot data is bigger than 26 gigabytes. (Technically, 20-25 gigabytes - you always have some "stolen" pages used for other purposes.) 50ms disk reads and writes might not slow your server down all that much. Even 100ms latency might not kill you.
But 50+ms latency will do two things, for sure.
It'll wound the very soul of your DBA, if your DBA is good. (If you're good enough to recognize that this particular app is doing just fine with 50ms latency, you're also good enough to know how good you are, and how rare that is - so, to quote Boris Badenov, "sha up you mouth" about your unwounded soul!)
And it means that if you ever need those disks, your performance isn't just going to drop - it's going to go off a cliff.
So: seconds per transfer. Good statistic. By the way, remember that 10 ms is .010 seconds - count those decimal places carefully!
The other side of seconds per transfer is Bytes Per Second. A decent laptop hard drive should handle 20 meg/second, no problems. A good desktop hard drive should handle 60 meg/second, no problems. Obviously, this is a rough guideline - it's not just bytes, there's number and types of operations, and how big they are, and are they random access or sequential. But if your laptop can't handle 20 megabytes per second with latency generally lower than 10ms, fix your hard drive - and similarly for your desktop at 60 meg/second. And if you're using a solid state drive, and are scoffing at me, quit gloating! Sure, I have and love solid state drives, but sometimes we have to use the work-supplied laptop - 'nuff said.
The point is, if you see bad latency, but you're running at the limits of what you can reasonably expect, well, we have to balance the latency against the demand.
If you're on a decent (non-SSD) hard drive, and your latency increases when you hit 60-80meg/second on your desktop - well, that's probably okay. You might need more, or better, disks, but we expect to see some latency increase at the top end.
Scary thing: I've looked at a lot of perfmon that shows average latency of 10ms or less... unless you restrict the capture to times when the disks are running 1 meg/second or greater - and suddenly, latency shoots up to 20ms or worse.
Yes, I said "one megabyte per second" - when I said it was scary, did you think I was joking? If you're looking at PerfMon, and you see latency and Bytes Per Second jump up, always, always, always, set your PerfMon to an appropriate zoom, and see if anything looks bad when you don't have minutes and hours of idle disks being averaged into your PerfMon counters!
What do I advise when I see poor disks? Well - my advice is to talk to your hardware support team if you have one, or your hardware vendor, if you don't. Go to them with a PerfMon capture, and demand that they explain the results to you. Don't let them say "we tested it, and it all showed no problems." A SAN or a RAID controller will usually show no problems unless there's an actual failure detected. But if you know you bought a RAID controller, or a SAN, that can handle 150 meg/second of traffic, and you're seeing poor latency at 25 meg/second, you know something's wrong.
It might be BIOS on the controller or SAN. It might be firmware. It might be a driver, or a filter.
(What's a filter? Well - anything that intercepts IO calls. It might not actually "filter" IO calls in the normal sense of letting some stuff through but blocking others. For example, some IO system drivers are "filters" even though they pass everything through. Anti-virus programs might be filters - and they do act as a traditional filter, blocking certain activity.)
Or it could be hardware - but if it is, your hardware support (or vendor) can probably see an alert, and there was no need to "demand" they explain your performance.
As I mentioned, in my day job, I work with SQL Support - and if they run into a non-obvious disk problem, they'll call up the Microsoft Platforms team. If you need help proving you have storage issues, Microsoft Platforms support can help. That's the big reason why I don't need to dig too deeply into disk performance counters. But I do like to look at % Idle Time, too.
A disk should almost always be 90% or more idle. If a disk is only 50% idle, and latency is still less than 10ms, and it's pushing about as many megabytes per second as you're paying for - okay, that's good. One case I worked, I saw a disk that was 10% idle, running around 200 meg/second, and still showing no more than 8ms latency - I called this out to the customer, because they deserved to know that whoever set up their IO did a hell of a great job on it.
But more frequently, you see latency that's 10-15ms, bytes per second up around 20 meg, and you think the disks are maybe okay - and then you see that they're only showing % Idle Time of 75, or 50, or worse(!!!). This is a final danger sign that might get you to dig in more deeply before real trouble starts.
So, there's your disks - a good basic check on them.
Seconds Per Transfer: .010 or better.
Bytes Per Second: use this to judge whether your latency is acceptable. Higher latency at the upper limits of your capacity is okay.
% Idle Time: A possible canary in your coal mine - it sometimes catches incipient problems before they're more clearly visible.
What's next? Memory.
If you're old-school, like me, you used to hear about "Pages Per Second". Unfortunately, while this matters, it's tricky. Windows can use memory mapped files - and reading from a memory mapped file is "paging". And the first time you need to use a memory mapped file, well, of course it's doing a fair amount of paging! That's a feature (not a bug)!
If your server is x64 (and dear lord, I hope it is!), you should instead look at Megabytes Available. Typically, for SQL, we want to see about 1000 megabytes available, and we want it stable. I'd rather see 900 meg available (Min 850, max 950) than 1200 available (min 1000, max 1400), but both are perfectly fine - and while stability is nice, if you need to launch large-ish processes from time to time, you just can't maintain perfect stability. And if you need to launch memory hungry applications, you also need more megabytes available to run them!
Anything much less than 900, though, and we start to panic. Anything less than 500? My friend, if you told me the Titanic sank because its x64 navigation system had less than 500 meg available, I'd say "doesn't surprise me a bit!" until I realized you meant the actual Titanic, and weren't speaking metaphorically. (64 bit servers didn't arrive on the scene until well after the Titanic was provisioned - and I know it didn't undergo a retrofitting of its nav system!)
Now: if you see a lot of paging, and a big shift in Megabytes available (or in Working Set in your Process counters) - you might be running into working set trim, and your server can lock up while it's happening. And if you see a fair amount of paging, and Megabytes Available shifting, you might want to see what's happening, to see if you can smooth it out a bit. But a fair amount of paging, and stable Megabytes Available is usually (not always!) innocuous.
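For what it's worth, you can also ask SQL Server what Windows thinks of memory right now. A small sketch against sys.dm_os_sys_memory - the numbers in the comments are just the rules of thumb from the paragraphs above, not anything official:

-- How much physical memory is available, and does Windows consider it healthy?
SELECT
    total_physical_memory_kb     / 1024 AS total_mb,
    available_physical_memory_kb / 1024 AS available_mb,  -- roughly the "Megabytes Available" counter
    system_memory_state_desc                              -- e.g. 'Available physical memory is high'
FROM sys.dm_os_sys_memory;
-- Rough guide from above: ~1000 MB available and stable; start worrying well below ~900.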
What's next? Well, Processor is a good thing to check. Generally, for SQL Server, anything less than 80% sustained CPU usage is acceptable. SQL Server will use parallelism to speed up queries, and use more CPU to try to get better performance. Until you hit 80%, adding more or beefier CPUs probably won't help much. However, semi-paradoxically, most SQL Server instances tend to run on low CPU - 20% CPU or less is not at all uncommon, and I expect to see 30% or less in a healthy server - 40% isn't bad, or even a warning sign, it's just "surprisingly high" and I'll make a mental note. SQL just doesn't have a lot to do with its CPUs most of the time.
Typically, if I see really high CPU, I'll hear that the CPU is normally less than 50%, and suddenly shot up to 100% today - that's a hallmark of a really bad query plan, often one based on data that's purely contained in the buffer pool. (A huge nested loop join on data in the buffer pool can spin CPU like nobody's business.) Update stats, and see if it helps - an update with FULLSCAN isn't necessary, but it can be a valuable troubleshooting step, since it eliminates the chance that a random sampling caused the problem - and consider dumping your procedure cache, or the query plan for the problem-query if you know it.
Dumping your procedure cache is expensive - SQL queries now have to recompile. But think about it this way - if your CPU is pegged at 100%, does it really matter whether it's at 100% because queries are recompiling, or because the CPU is spinning endlessly on a bad plan? It's often - not always - worth a shot.
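And if you have a decent idea which query is the culprit, you don't have to nuke the whole cache. Here's a sketch - the table name is a placeholder, and "most total CPU" is only one heuristic for finding the offender:

-- Step 1 (if you suspect stale statistics): refresh stats on the suspect table.
-- dbo.BigTable is hypothetical - substitute your own table.
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;

-- Step 2: find the cached plans burning the most CPU.
SELECT TOP (5)
    qs.plan_handle,
    qs.total_worker_time,
    qs.execution_count,
    SUBSTRING(st.text, 1, 200) AS query_text_start
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;

-- Step 3: evict just that one plan instead of the whole cache.
-- DBCC FREEPROCCACHE (0x06000700...);  -- paste in the plan_handle you found above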
Now we get into SQL Server statistics. If CPU, Memory, and disks are all healthy, we know SQL Server is resting on a good foundation, and we can dig into some specific counters that tell us more about SQL Server.
One of the biggest, most obvious, counter sets is the Buffer Manager counters - how healthy is the SQL Server buffer pool?
Buffer pool is something that confuses a lot of people. They think of it as the place SQL puts data for faster access than from the disks. It's a bit more than that. You see, SQL Server never directly accesses data on disk - it reads data into the buffer pool first, then it might make changes (and hopefully computes the checksum - you're set to CHECKSUM page verification, right?), and then writes it back if changed. So, for example, if you update statistics on a huge table, WITH FULLSCAN, that whole table's going to have to be read into buffer pool before you finish.
The first good scorecard counter is Buffer Manager: Page Life Expectancy. This is SQL's estimate of how long a page that's accessed a single time in the buffer pool will sit (in seconds) before SQL would decide it has a better use for that page. So a page life expectancy of 300 means that if you read a (well, "the") page from an 8kb table that you use once a day, that page will probably stay in the buffer pool for 5 minutes - 5 x 60 = 300 seconds.
Another way to think of this is, your entire buffer pool could be shuffled every 5 minutes. Now, back in the x86 days, you might have a 1 gig buffer pool; that's about 3 megs of reads per second, and maybe 1-2 megs of writes per second, if you have a busy database with lots of read/write activity.
Back in those days, Page Life Expectancy was judged to be healthy if you had 900, or worst case, 300. But in https://www.sqlskills.com/blogs/jonathan/finding-what-queries-in-the-plan-cache-use-a-specific-index/, Jonathan Kehayias makes a good case for basing it on buffer pool size. He recommends about 300 per four gigs of buffer pool.
As a person who digs into SQL Server performance for a living, I'll throw out another point. The real goal is to have SQL Server be able to flush dirty pages and pull in newly needed pages "at leisure". If you have a good SAN system capable of running 300 meg/second, then for you, 50 meg/second is pretty low speed. 50 meg per second is 3gb per minute - call it 2.5 gb to account for reads and dirty page writes. You could handle about 12.5 gb in 5 minutes, never going above about 17% of your disk capacity. But I'll also point out that this requires knowing and trusting your capacities, making assumptions, and so forth.
The main point is, Page Life Expectancy is an idea about how quickly your buffer pool is churning, and this is partially determined by the speed of your disks. A damn good rule of thumb is what Mr. Kehayias suggests. If you're in that range, you know that if you have a problem with buffer pool, it's probably the disks, not the buffer pool size.
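If you want to put that rule of thumb into a query, here's a rough sketch - it assumes the 300-seconds-per-4-GB guideline, and uses the 'Database pages' counter (8 KB pages) as a stand-in for buffer pool size:

-- Current Page Life Expectancy versus a "300 seconds per 4 GB of buffer pool" target.
SELECT
    ple.cntr_value                      AS page_life_expectancy_sec,
    bp.cntr_value * 8 / 1048576         AS buffer_pool_gb,     -- 'Database pages' are 8 KB each
    (bp.cntr_value * 8 / 1048576) * 75  AS rough_target_sec    -- 300 sec per 4 GB = 75 sec per GB
FROM sys.dm_os_performance_counters AS ple
CROSS JOIN sys.dm_os_performance_counters AS bp
WHERE ple.object_name LIKE '%Buffer Manager%' AND ple.counter_name = 'Page life expectancy'
  AND bp.object_name  LIKE '%Buffer Manager%' AND bp.counter_name  = 'Database pages';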
One warning: Yes, we're suggesting that the higher the buffer pool, the higher the page life expectancy we'd like to see.
Does that mean if you're at 8gb buffer pool, you might want to drop to 7gb so you don't need so much Page Life Expectancy? Stop - laugh. Of course not; you know that. Dropping the buffer pool will reduce page life expectancy far faster, and bumping up the buffer pool will increase PLE far faster, than these targets will grow.
(Did you laugh? You should. There's a very human tendency to get wrapped up in the benchmark and forget what it means. A good laugh now at how ridiculous it would be to forget "bigger buffer pool = better PLE" might help you avoid that trap some day when you're exhausted and stressed and trying to fix the (expletive deleted) server - or when a company officer demands to know if you're suggesting a smaller buffer pool. You'll recover, of course - but a good laugh might keep you from stumbling, and laughter is a necessity in this world.)
Me, I've noticed something interesting. There's a great tendency to see small, tight buffer pools, and huge buffer pools (tens of thousands in Page Life Expectancy). This might mean there's two types of servers, under- and over-provisioned. It might also mean that the difference between "just enough" and "way plenty" is relatively small. So: don't be afraid to test out just a bit more RAM when you have the opportunity.
If you're using NUMA - Non-Uniform Memory Access, and the SQL Server log will list out your NUMA nodes if you are - check out Paul Randal's valuable advice: http://www.sqlskills.com/blogs/paul/page-life-expectancy-isnt-what-you-think/. If a NUMA node is tight on PLE, it's going to hurt. And this could happen if you have a lot of stolen pages - e.g., a large memory grant - on one node. You probably should take at least a cursory glance at the NUMA levels.
That's SQL Server:Buffer NODE: Page Life Expectancy.
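Here's a quick way to eyeball that per node - on a non-NUMA box you'll just see a single node (000):

-- Page life expectancy per NUMA node. One starved node can hurt even when the
-- instance-wide number looks comfortable.
SELECT instance_name AS numa_node, cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Node%'
  AND counter_name = 'Page life expectancy'
ORDER BY instance_name;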
Now: I mentioned that SQL doesn't work on disks, it works in the buffer pool. So if you do some work on massive amounts of data - say, UPDATE STATISTICS...WITH FULLSCAN on large tables, or a massive index rebuild, or a huge bulk insert - you'll see Page Life Expectancy drop. And then you'll see it start to grow. That's normal. Look at overall averages, obviously - but if it keeps dropping really low, that might be a problem, which leads us to our next counter.
Whatever your page life expectancy is, look for LazyWrites Per Second. The LazyWriter is usually the timekeeper in the buffer pool - when SQL has nothing better to do, it sweeps through the buffer pool, reducing TTL scores and flushing dirty pages - hence, "lazy" writer. But if SQL needs more free pages, it will call up the LazyWriter to be a bit more aggressive. If you see occasional LazyWrites, no big deal. If you see that your average LazyWrites Per Second is significantly higher than 0, that probably means you have some buffer pool contention, and might want to see what you can do about that. (Yes, I said "significantly higher than 0". If it was 1 or higher, it's probably gotten to be a bit of a problem already.)
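One wrinkle if you check this through the DMV rather than PerfMon: 'Lazy writes/sec' is stored as a cumulative count, so you need two samples to get a true per-second rate. A sketch, with an arbitrary one-minute window:

-- Average lazy writes per second over a one-minute window.
DECLARE @first BIGINT, @second BIGINT;
SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%' AND counter_name = 'Lazy writes/sec';
WAITFOR DELAY '00:01:00';
SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%' AND counter_name = 'Lazy writes/sec';
SELECT (@second - @first) / 60.0 AS avg_lazy_writes_per_sec;  -- meaningfully above 0? contention.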
If you look through all of these counters, and you see them all looking good, you've got a good, healthy server, most likely. But even good healthy servers can run into problems.
These next counters aren't always good at finding server health, but they can be very useful in one part of my job: figuring out what went wrong. I'll call this "forensics", though this might trouble some linguistics geeks.
A web application stops responding. Who's at fault? SQL Server (the web back-end) or the web server? Or you get a deadlocked, or non-yielding scheduler warning in your SQL Server log. What happened?
In PerfMon, if you collect Process counters for all processes, there are some things worth looking at for the SQL Server process (usually sqlservr.exe in the Process counters).
First is Threads. If SQL Server suddenly started using a lot more threads just before a lockup occurred, that gives you a big hint that something changed. SQL Server uses worker threads to service queries. A relatively small number of threads can handle a lot of queries, because most queries take just a few milliseconds. The worker thread accepts the query, runs the query, returns the results, and moves on to the next query. If SQL needs a lot more threads, it might mean a lot more queries are coming in, that they're all taking longer to run, or some combination of the two. Of course, SQL can also be sitting stable at its top number of worker threads and suddenly lock up, due to a change in query performance or number of queries - but I've often seen SQL jump in thread count before a problem, and this often gives us a time tick to investigate the logs on the front end server.
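You can get the SQL-side view of the thread picture, too. A small sketch - the interesting part is how close the current worker count is to max_workers_count, since running near that ceiling is the thread-exhaustion scenario:

-- How many worker threads exist right now, versus the configured ceiling?
SELECT
    (SELECT COUNT(*) FROM sys.dm_os_workers) AS current_workers,
    max_workers_count                        AS max_workers
FROM sys.dm_os_sys_info;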
Another good counter to review is Working Set. Working Set is how much RAM the process is using, but only counts RAM that's subject to paging. So: if you're using Lock Pages In Memory (so SQL Server's buffer pool is using locked memory), the RAM SQL is using is the working set, plus the buffer pool. If you're not using Lock Pages In Memory, working set (hopefully!) includes the entire buffer pool.
If Working Set changes - especially if it drops precipitously! - that can be valuable evidence that something is going on. Working Set Trim - where Windows decides that it's time to grab a bunch of Working Set memory for other purposes - is a real possibility. Also, if SQL Server working set drops, see what happens to the Total working set counter - does it mirror SQL? And here's a time where we should see what's happening in Memory: Available Megabytes and Pages Per Second.
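From the SQL side, sys.dm_os_process_memory shows both the physical memory the process is using and the locked-page allocations the Working Set counter can't see. A quick sketch:

-- SQL Server's own memory picture, including locked pages.
SELECT
    physical_memory_in_use_kb  / 1024 AS physical_memory_in_use_mb,
    locked_page_allocations_kb / 1024 AS locked_pages_mb,   -- Lock Pages in Memory allocations
    large_page_allocations_kb  / 1024 AS large_pages_mb,
    process_physical_memory_low,                            -- 1 = Windows has signaled low memory
    process_virtual_memory_low
FROM sys.dm_os_process_memory;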
Note that these won't provide any hard and fast answers - these are just giving you ideas of what could be going on, and might help you figure out when things started to go south.
Another fascinating counter I found a use for recently was SQL Server:Wait Statistics:Waits in Progress.
This counter family seems super-valuable; checking SQL waits is one of the top methods for improving SQL Server performance. The trouble is, when I've tried to cross check them against a SQLDiag report, I've often had troubles connecting them - even the seemingly obvious, like "Cumulative wait time (ms) per second". But I did find that "Waits In Progress" can be an awfully useful indicator of what might be happening. If you keep seeing more than a few waits for any wait type showing up, you can get a sense for contention. Page IO Latch waits continuously spiking? You probably know the answer to that - disks, or buffer pool, or some combination. Page Latches? Start digging into what kind of pages, what kinds of latches, and where? Lock waits? Blocking. And the neat thing is, you can see how many of these collect at any one time, and how far apart they're spread. Lots of page IO latch waits, clustered near a lot of lock waits? This suggests disk insufficiency (or a very tight buffer pool) causing extended blocking, whereas just the lock waits, without any others, suggests a common query being blocked for an extended time.
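The closest thing to "Waits In Progress" from inside SQL Server is sys.dm_os_waiting_tasks. A minimal sketch that snapshots what's waiting right now, grouped by wait type - run it a few times, since a single snapshot proves very little:

-- What's waiting right now, and on what?
SELECT
    wait_type,
    COUNT(*)              AS tasks_waiting,
    MAX(wait_duration_ms) AS longest_wait_ms
FROM sys.dm_os_waiting_tasks
WHERE session_id > 50   -- rough filter for user sessions; not bulletproof
GROUP BY wait_type
ORDER BY tasks_waiting DESC, longest_wait_ms DESC;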
None of these are as useful as a live capture of a SQLDiag report (or PSSDiag if you work with Microsoft SQL Support). Ah, but what if you can't get authorized to run a full SQLDiag (even without a trace or Xevents collection)? PerfMon is often easier to get permission for; Windows is always collecting PerfMon counters; you're just asking to write a set to disk. And they can give you some ideas on what to dig into more deeply - they can help give you an overall picture of the state of the system.
An introduction and some ground rules...
Hello, everyone. I used to do some political blogging, and plan to do some other blogging (about shamanism) under the nom du blog LongHairedWeirdo. Well - in my day job, I'm a SQL Server troubleshooter and DBA. I have done a fair amount of work with Microsoft SQL Support.
Do I work "for" Microsoft SQL Support? Heh. Well, if I was, someone might suggest that what I say is Official Microsoft Best Practices, or that I'm writing "for" Microsoft, or somesuch. So, let's set a ground rule here: I work "with" Microsoft SQL Support. And I know a few bits some of the insiders know. None of what I say is Microsoft documentation (except when I call it out - and provide a link!), nor does Microsoft take any responsibility forany damn-fool thing I say what I post in this blog.
At the same time, I'm good at my job, and my job includes diving into some godawful messes... I can be wrong, sure. I have been wrong. For example, just last week when I thought I'd made a mistake... no, never mind. Old joke, and I've never been able to get it to flow right.
In honesty: I have been wrong, just like anyone else. Sometimes, I'm stone-cold stupidly wrong. Everyone is an idiot, once in a while - it's the experienced idiots who manage to put together a string of idiocies in a row. But more frequently, I'll be subtly wrong, and in some golden moments, I'm wrong on a level at which the average schmoe can only dream of being wrong!
But I've been doing SQL for over a dozen years now, I've learned a heck of a lot, and while I can be wrong, far, far more often, I'm right.
I hope to post a mixture of things here - some dead-level basic, and some more advanced - and sometimes, I hope to manage to put together a combination of the two. Sometimes the most brilliant ideas are nothing but a combination of the things that are perfectly obvious, looked at from another perspective.
If you ask me questions here - well, I might try to help. I like helping people. But please keep in mind that I have only so much time and energy, and if I can't help, well, I can't. And if I won't help, I won't - sorry. It's not that I don't love you, it's that I have to choose how to spend my energies.
Second ground rule: I'm allowed to be weird and say stupid things like "...it's not that I don't love you...". I'm also the only person allowed to mock me over this, unless you and I become close, trusted friends.
But I should warn you: I have serious problems with mental fatigue, for medical reasons. That means that if I fail to respond to a question or a comment for a long time, or at all - well, it might not be anything personal. Have you ever driven and suffered from highway hypnosis, where you can pound your fist into your upper thigh as hard as you can, and it only wakes you up for a moment? If you have - imagine spending an entire day like that. Now imagine a week.
That's not how I am every day - but I have spent a week in that state; I've survived a month in that state. And it's only been recently that I've been learning what causes it. So, you see, when I say I might not be willing or able to help you out, it's serious. Many's the day I work, eat, sleep... and that's about it. But if you commented, I read it - and I'll try to answer, if I have the energy.
I'm not talking about my problems to gin up sympathy, nor do I intend to say "That's right, I'm super-tired a lot of the time and I'm still really, really damn good at what I do" (though on my worst days, that is one of my points of pride).
But that does touch on the title of the blog a bit. Sometimes, if you touch the spirit of something, you have a different kind of insight about it - one where you don't always need the full measure of raw brainpower and energy needed to get to the bottom of a tricky problem. That's what I try to develop in me; that's part of what I want my readers to pick up from this blog if I'm doing my job as well as I'd like.
Most of what I post here isn't necessarily going to be original. I do intend to cite anyone who's done something first, especially if I'm copying their work directly - so, please let me know if I forget to link back to your site! But I've come to realize over my career that original thoughts aren't necessarily required for a good blog post. Have you ever heard something a dozen different times, and only on the lucky 13th (or 14th, or...), realized that you finally understood it? We all have different mind maps, and sometimes a different person's view of things will be the one that finally sinks in.
So, I'm not going to try to be either bold or original. And heck, this blog will probably include some notes to myself - where I finally describe something in a way that I finally get, or where I transcribe a technique that works for me.
Final ground rule for now: back when I was mouthing off politically, one did not make blog posts vanish, or edit them, without calling it out. Otherwise, people would viciously mock one for cowardice, and post screencaps of the earlier version, etc.. I do hope to keep "honest" in my future posts, calling out changes and edits, for anything other than typos or obvious clarity enhancement. But this will be open to being added to or altered, as I wish. This is, after all, my own self portrait, and that means it's mine to fiddle with.
Do I work "for" Microsoft SQL Support? Heh. Well, if I was, someone might suggest that what I say is Official Microsoft Best Practices, or that I'm writing "for" Microsoft, or somesuch. So, let's set a ground rule here: I work "with" Microsoft SQL Support. And I know a few bits some of the insiders know. None of what I say is Microsoft documentation (except when I call it out - and provide a link!), nor does Microsoft take any responsibility for
At the same time, I'm good at my job, and my job includes diving into some godawful messes... I can be wrong, sure. I have been wrong. For example, just last week when I thought I'd made a mistake... no, never mind. Old joke, and I've never been able to get it to flow right.
In honesty: I have been wrong, just like anyone else. Sometimes, I'm stone-cold stupidly wrong. Everyone is an idiot, once in a while - it's the experienced idiots who manage to put together a string of idiocies in a row. But more frequently, I'll be subtly wrong, and in some golden moments, I'm wrong on a level at which the average schmoe can only dream of being wrong!
But I've been doing SQL for over a dozen years now, I've learned a heck of a lot, and while I can be wrong, far, far more often, I'm right.
I hope to post a mixture of things here - some dead-level basic, and some more advanced - and sometimes, I hope to manage to put together a combination of the two. Sometimes the most brilliant ideas are nothing but a combination of the things that are perfectly obvious, looked at from another perspective.
If you ask me questions here - well, I might try to help. I like helping people. But please keep in mind that I have only so much time and energy, and if I can't help, well, I can't. And if I won't help, I won't - sorry. It's not that I don't love you, it's that I have to choose how to spend my energies.
Second ground rule: I'm allowed to be weird and say stupid things like "...it's not that I don't love you...". I'm also the only person allowed to mock me over this, unless you and I become close, trusted friends.
But I should warn you: I have serious problems with mental fatigue, for medical reasons. That means that if I fail to respond to a question or a comment for a long time, or at all - well, it might not be anything personal. Have you ever driven and suffered from highway hypnosis, where you can pound your fist into your upper thigh as hard as you can, and it only wakes you up for a moment? If you have - imagine spending an entire day like that. Now imagine a week.
That's not how I am every day - but I have spent a week in that state; I've survived a month in that state. And it's only been recently that I've been learning what causes it. So, you see, when I say I might not be willing or able to help you out, it's serious. Many's the day I work, eat, sleep... and that's about it. But if you commented, I read it - and I'll try to answer, if I have the energy.
I'm not talking about my problems to gin up sympathy, nor do I intend to say "That's right, I'm super-tired a lot of the time and I'm still really, really damn good at what I do" (though on my worst days, that is one of my points of pride).
But that does touch on the title of the blog a bit. Sometimes, if you touch the spirit of something, you have a different kind of insight about it - one where you don't always need the full measure of raw brainpower and energy needed to get to the bottom of a tricky problem. That's what I try to develop in me; that's part of what I want my readers to pick up from this blog if I'm doing my job as well as I'd like.
Most of what I post here isn't necessarily going to be original. I do intend to cite anyone who's done something first, especially if I'm copying their work directly - so, please let me know if I forget to link back to your site! But I've come to realize over my career that original thoughts aren't necessarily required for a good blog post. Have you ever heard something a dozen different times, and only on the the lucky-13th (or 14th, or...), realized that you finally understood it? We all have different mind maps, and sometimes a different person's view of things will be the one that finally sinks in.
So, I'm not going to try to be either bold or original. And heck, this blog will probably include some notes to myself - where I finally describe something in a way that I finally get, or where I transcribe a technique that works for me.
Final ground rule for now: back when I was mouthing off politically, one did not make blog posts vanish, or edit them, without calling it out. Otherwise, people would viciously mock one for cowardice, and post screencaps of the earlier version, etc.. I do hope to keep "honest" in my future posts, calling out changes and edits, for anything other than typos or obvious clarity enhancement. But this will be open to being added to or altered, as I wish. This is, after all, my own self portrait, and that means it's mine to fiddle with.