Friday, June 5, 2015

TempDB MDFs and spinlocks - and other time sinks

I don't want to start any great blog wars, but I saw a blog post today that made me wanna say WTF[1]. It wasn't that it was wrong or bad, but it was pushing some ideas that you shouldn't push too hard, because they can waste your time.

First tip I saw was "one TempDB MDF file per processor core, every time." Hey, I'll be the last person to scoff at the possible problems that TempDB contention can cause, but....

The problem started in SQL Server 2000. Sometime around SP3... or was it 2? 4? Yanno what? It's 15 years old! Do we care precisely when?

Anyway: The SQL product group set it so that SQL would go through TempDB MDF files in a round robin fashion, to avoid the worst aspects of contention in TempDB. And here, they recommended that you create extra TempDB files to go through in a round-robin fashion, to avoid contention, one per CPU, up to 8, and then stop - unless you needed more, then add 2-4 at a time until you saw contention stop.

The thing is, in 2005, a new feature was added where TempDB could cache table structure (unless you did certain things to those tables - this looks right,re: the things you can and can't do but I can't vouch for it.[3] This means that there's already greatly reduced contention on those TempDB files. I'll grant you: when talking to non-technical bosses, pointy-haired or not, "One file per core" is better than the more complicated "40 cores, so we'll start at 8, no, make it 12, and add more as needed". But this fine blogger was suggesting this as an absolute rule. And it's not.

The other thing that was discussed was tracking spinlock stats. Me? I like spinlocks. Just the name is cool. "Spinlock". And dropping it in to the conversation seems crazy-smart: "While your CPUs are over burdened, I think the bigger reduction in throughput might be caused by spinlock contention." Won't you just sound like such a Brainiac your skin might turn green? (Well, for the classical Braniac(tm), at least.)

It isn't a bad idea to track spinlock stats. It really isn't. But I got to see a fascinating case recently and since the subject came up (don't be unkind and mention that I brought it up!) let me tell you something I got to see.

Problem was CPU usage. CPU was 80% or more on a lot of CPU cores. And this wasn't normal on this server - they'd run before at 30, 40% CPU. Well, unexpectedly high CPU means "check statistics, and chase down problems in query plans." That was done. CPU remained high. So more digging was done. I felt oh-so-proud when I checked spinlocks and saw an amazing 1000-2000 spins, per core, per millisecond. That had to be a big, big deal, right? Right?

I'll skip to the punchline. Later on, I saw that this same server was running at 40% CPU... with the same 1000-2000 spins, per core, per millisecond. I was surprised. After all 1500 spins per millisecond is 1.5 million spins per second, which is ... um... order of magnitude, 0.1% of the speed of a 1.5ghz processor. Ouch.

This might still have been a point of contention - it might be that if we could reduce this contention we'd see throughput increase, and without needless spins, maybe we'd see CPU drop to the healthier 25-30% I like to see.

(Um. Let me explain. When I look at a healthy SQL Server, CPU tends to be less than 30%. 35% isn't a problem, heck, 50% isn't a problem. 50% busy is 50% idle, so while some people see the CPU as half busy, and some see it as half idle, I've moved on to other performance counters rather than making up a punchline to this joke. More seriously: if I see CPU > 30%, I'm interested - but not concerned. "Interesting" things are the things you keep in your mind while troubleshooting - but there's a vast difference between "interesting" and "important.")

But even 2000 spins, per core, per millisecond, wasn't causing serious problems on the server as far as I could tell. Sure, keep an eye on spinlocks, spins per core per ms, and especially on backoffs. But keep in mind you can have a pretty darn high set of spins per core per ms, and still not seriously affect your server.

I'm not linking back to the site I found, because it had some darn good advice, and, as I said, I don't want to get into back and forth unkindness. But there are always caveats and there is always care to be taken.



[1] As you might guess, that means "Why... The Frilly-panties-of-Venus!". She left them on my chandelier. I suggested that might make people thought I viewed her as a conquest, and she quite reasonably pointed out that she was one of my conquests.... [2]

[2] Chess. Best out of three games. She's weak on queen side openings. There's probably a joke there, but I ain't going going for it, or she might get miffed. Miffed goddesses are never a good thing.

[3] Edited with strike through - in my day job, I do sometimes have to vouch for knowledge (or stress that I can't), but that doesn't belong here, especially not when I'm linking to someone more knowledgeable on a subject than I. I humbly apologize.

No comments:

Post a Comment