Tuesday, October 13, 2015

XML and SQL Server - with some actual Extended Events content, I promise!!!

As promised, we're going to start in on Extended Events now - kind of. We're going to do a bit of looking at Extended Events around the edges, and I'm going to give you some examples of working with XML in SQL Server - because trust me, manipulating XML in SQL is a bit of an art. Now: Extended Events are intended to replace most of what Profiler does - not everything that Profiler does, but a lot of it. And when you created a data collector in Profiler it was called a Trace. In Extended Events, it's called a "session". I like to think that this terminology made it far more excusable for me to fail to put Extended Events together for a bit!

Now, a very nice thing about extended events is that you don't just have to collect them in a trace, either in memory or in a file. You can have multiple nifty kinds of targets. You can send them to a file; the approved extension is XEL. In SQL Server 2008, there was also an XEM file, a format file - but those aren't used in SQL Server 2012 and later. You can also send extended events to a ring buffer - that's a data structure that wraps around and overwrites older entries with newer entries when it fills up. And you can use some other, special purpose targets that can tally up data for you in a variety of ways, or even collect unpaired events (for example, Batch Starting with no Batch Completed). Those other targets are valuable, but we did all of this XML so we could start by looking at files and ring buffers. To do that, let's take a look at two views: sys.dm_xe_sessions and sys.dm_xe_session_targets:
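The join between them looks something like this (I'm only pulling the columns that matter for this discussion):

-- one row per (session, target) pair; target_data is where the interesting stuff lives
select xs.name, xs.address, xst.target_name, xst.event_session_address, xst.target_data
from sys.dm_xe_sessions xs
inner join sys.dm_xe_session_targets xst
    on xs.address = xst.event_session_address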



I know... I cut off the output from dm_xe_sessions - there's an entire Books Online page for it you can review if you want! But I wanted to show you a few obvious bits. First, you can see that the join column is the address column of sys.dm_xe_sessions and the event_session_address of dm_xe_session_targets. For files, target_data contains file information - in the event_file row for event_session_address 0x00000003ED4D10C1, I copied the XML text from the target_data column to the query window. You see that it shows you information about the file.

But for the ring_buffer target, the target_data column is the entire XML representation of the ring buffer. Yes, I said "XML representation". Time for a bit of pedantry: the ring buffer data isn't actually stored as XML, which is a pretty inefficient method of storing data. But when we query it, we'll see it as XML. And, if we query it, and cast the result as XML, we can throw it into an XML variable, or column, and draw information from it as needed. So, here's a query you could use to get that XML data:
declare @QueryProcessingXML xml

select @QueryProcessingXML = cast(xst.target_data as xml)
from sys.dm_xe_sessions xs
inner join sys.dm_xe_session_targets xst
    on xs.address = xst.event_session_address
where xs.name = 'system_health'
  and xst.target_name = 'ring_buffer'


We're not going to work with that - a 4meg ring buffer is a bit too much to display on a blog entry. So, let's take a look at XEL files instead! To draw information from the file, we can use fn_xe_file_target_read_file. This reads an Extended Event XEL file. I'm working from SQL Server 2012, so I'm not worrying about XEM files, and I'm able to show off some slightly more interesting data from the system_health extended event session.

Just like sys.dm_xe_session_targets, fn_xe_file_target_read_file has an "event_data" column, and you can also cast this as XML. You can pull in one row per event captured, with an XML representation of all data captured in the event_data column. I did this while creating this blog post, and pulled a column from one of the Query_Processing diagnostic results, so I could demonstrate how we're going to slice and dice our XML data from extended events.
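If you want to follow along, the pattern looks roughly like this - the file name pattern is an assumption, so point it at wherever your XEL files actually live:

-- one row per captured event; event_data comes back as nvarchar, so cast it to XML
select cast(event_data as xml) as event_xml
from sys.fn_xe_file_target_read_file('system_health*.xel', null, null, null)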

I then did a DECLARE @QueryProcessingXML XML; SELECT @QueryProcessingXML= '
...and then my blob of XML data, and what I put in my half-quotes was this:


<event name="sp_server_diagnostics_component_result" package="sqlserver" timestamp="2015-10-04T21:11:43.193Z">
  <data name="component">
    <value>2</value>
    <text>QUERY_PROCESSING</text>
  </data>
  <data name="state">
    <value>1</value>
    <text>CLEAN</text>
  </data>
  <data name="data">
    <value>
      <queryProcessing maxWorkers="544" workersCreated="57" workersIdle="17" tasksCompletedWithinInterval="1537" pendingTasks="0" oldestPendingTaskWaitingTime="0" hasUnresolvableDeadlockOccurred="0" hasDeadlockedSchedulersOccurred="0" trackingNonYieldingScheduler="0x0">
        <topWaits>
          <nonPreemptive>
            <byCount>
              <wait waitType="HADR_FILESTREAM_IOMGR_IOCOMPLETION" waits="287139" averageWaitTime="825" maxWaitTime="89175239" />
              <wait waitType="IO_COMPLETION" waits="17158" averageWaitTime="3" maxWaitTime="1605" />
              <wait waitType="WRITE_COMPLETION" waits="16468" averageWaitTime="0" maxWaitTime="12" />
              <wait waitType="THREADPOOL" waits="5068" averageWaitTime="0" maxWaitTime="41" />
              <wait waitType="CLR_AUTO_EVENT" waits="3192" averageWaitTime="148586" maxWaitTime="89227233" />
              <wait waitType="QDS_PERSIST_TASK_MAIN_LOOP_SLEEP" waits="2454" averageWaitTime="96626" maxWaitTime="89232595" />
              <wait waitType="PAGEIOLATCH_SH" waits="946" averageWaitTime="99" maxWaitTime="5705" />
              <wait waitType="PAGEIOLATCH_EX" waits="464" averageWaitTime="6" maxWaitTime="21" />
              <wait waitType="CMEMTHREAD" waits="272" averageWaitTime="0" maxWaitTime="2" />
              <wait waitType="CLR_MANUAL_EVENT" waits="222" averageWaitTime="9" maxWaitTime="115" />
            </byCount>
            <byDuration>
              <wait waitType="CLR_AUTO_EVENT" waits="3192" averageWaitTime="148586" maxWaitTime="89227233" />
              <wait waitType="HADR_FILESTREAM_IOMGR_IOCOMPLETION" waits="287139" averageWaitTime="825" maxWaitTime="89175239" />
              <wait waitType="QDS_PERSIST_TASK_MAIN_LOOP_SLEEP" waits="2454" averageWaitTime="96626" maxWaitTime="89232595" />
              <wait waitType="LCK_M_S" waits="26" averageWaitTime="4808" maxWaitTime="11197" />
              <wait waitType="PAGEIOLATCH_SH" waits="946" averageWaitTime="99" maxWaitTime="5705" />
              <wait waitType="IO_COMPLETION" waits="17158" averageWaitTime="3" maxWaitTime="1605" />
              <wait waitType="FFT_RECOVERY" waits="30" averageWaitTime="929" maxWaitTime="2329" />
              <wait waitType="FT_IFTSHC_MUTEX" waits="9" averageWaitTime="2273" maxWaitTime="17228" />
              <wait waitType="SLEEP_DBSTARTUP" waits="153" averageWaitTime="100" maxWaitTime="112" />
              <wait waitType="PWAIT_ALL_COMPONENTS_INITIALIZED" waits="3" averageWaitTime="2316" maxWaitTime="2326" />
            </byDuration>
          </nonPreemptive>
          <preemptive>LHW:Preemptive waits deleted for space - but they normally list both count/duration as well</preemptive>
        </topWaits>
        <cpuIntensiveRequests />
        <pendingTasks />
        <blockingTasks />
      </queryProcessing>
    </value>
  </data>
</event>' -- that's my closing half-quote from above!


Pictures, and a scrollbox - I've never done either of these before, and I just know this post is going to look awful. Give me a mysterious performance issue any day of the week over trying to make a nice looking blog entry.

If I just wanted to know a single fact - e.g.: do I have a lot of threadpool waits by duration? Any signs of IO issues? - this would be fine. But the odds are, I want to do some more in-depth analysis. Do I see my top waits by count or duration changing? How many workers do I have, with how many idle, over time? How often do we have blocking tasks included in this report?

For this, we will use a few methods that SQL Server has built in.

Time for a quick vocabulary lesson: "Methods" is a term from object oriented programming. It's really just a name for a function - but a function that does something to an object. The goal of OOP is to have your objects do things (or have things done to them). Well, how do you do things to an object in a consistent way? You have a set of steps - a method, if you will - of doing the same thing to different objects. And voila! we have a name for this particular type of function. So, a method's just a function, but usually one attached to an object. In this case, we might have a method like @QueryProcessingXML.SomeMethod() - and that's the key difference between a method and a function. A method is written x.f(), whereas we normally think of a function as being f(x) - because in OOP, x, the object, is what matters.

The two methods I want to play with right now are the query method, and the nodes method. We'll run those off of the @QueryProcessingXML that I created earlier. Soon, we'll add in the value method ... but first, we're going to make sure we can find the data whose values we want. And that is why I had to write up a bit about XPath first.

Query is an amazing method - you'll see it does a lot of work for us. For right now, we're going to use it just to review our XPath information. For example:
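Something along these lines (this particular XPath just grabs the component data node, to keep the output small):

-- nodes() shreds the XML into rows; query('.') hands each row back as a small XML block
select xcol.query('.')
from @QueryProcessingXML.nodes('/event/data[@name="component"]') as tbl(xcol)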

Things to note here:
  1. In each case, we have a something dot method-name: xcol.query(), and @QueryProcessingXML.nodes(). We'll explain the names in just a moment :-)
  2. The "nodes" method returns raw data with no table structure, so SQL Server demands that it must be aliased. We pretend (because SQL Server makes us) that it creates a table, with a named XML column. Our table name in this example is tbl, and our XML column is xcol. See? That's why it's xcol.query() above.
  3. This would point to all data nodes with a name attribute of "component" that are children of event, the root node - we know we only have one here, but you should be aware of this - for example, if /event had a thousand data children with a name of "component" we'd get a thousand XML blocks here.
  4. The output of this is a set of XML blocks, generally smaller ones. While I'm not sure, I believe that complicated queries are faster against small blocks of XML - and this is why so many people use this to slice-n-dice their XML into smaller blocks
  5. Another advantage of slicing off blocks of XML is that there's more freedom to use those XML blocks in subsequent queries, with no fear that they might contain multiple data[@name="component"] further down the line
  6. You've just been handed a toy with which to futz around with the XPath skills that you picked up in the last post - go use it!
I'm only half-kidding on the last point. I'm going to show you a few more examples of the kinds of things we can do with the query and nodes tools, but if you're like me, you need to practice, practice, practice with these things, because otherwise every time you think about digging into XML, you'll freeze up because you don't want to have to re-review all that XPath stuff.

So:

We could pull in the waits by duration using this:
select xcol.query('.') from @QueryProcessingXML.nodes('/event/data[@name="data"]/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait') as tbl(xcol)

That's an absolute path description.

Or if we thought it was a terrible drag to type in all those values, and we only wanted the first 8 values, we could use this:
select xcol.query('.') from @QueryProcessingXML.nodes('//byDuration/wait[position()<=8]') as tbl(xcol)
IMPORTANT NOTE: As you'll see if you read through the XML above, there are both nonPreemptive and preemptive waits in these results - so on a normal QUERY_PROCESSING block, you'll get 16 results. That's the downside of zipping through XML with a too-broad XPath. I've grown to love the // XPath operator, but it can quickly cause more problems than it solves, if you're not careful.

Or for a more reasonable predicate, we could look only at the byCount waits whose average wait time is higher than 0:
select xcol.query('.') from @QueryProcessingXML.nodes('//byCount/wait[@averageWaitTime>0]') as tbl(xcol)

It might be really annoying that Extended Events dump data into XML, rather than nice rowsets, but doesn't that last example show off some of the flexibility of using XML?

As I noted earlier, with these last two examples, you're taking a bit of a risk. You might think that "byCount/wait" isn't going to be re-used for any other block of data, but you can't be sure. You'd probably be better off making sure you're in a QUERY_PROCESSING component block. How could you do that?

As you'll find out a bit further down, you don't generally need to. But let's create a more complicated XPath expression, just for fun. We want precision, here, and we know that we want an event. But that event has to have a special property, so that will be event[]. What property is that? Well - it has a data child, with a name of "component". So, that would be event[data[@name="component"]]. But that's not enough - the "component" must have text of "QUERY_PROCESSING".

Well, then: that's event[data[@name="component"]/text="QUERY_PROCESSING"]

So far so good. We then want the data child of this event, that has a name attribute equal to "data". And under that, we know there's a byCount set of waits. Ready?
select xcol.query('.') from @QueryProcessingXML.nodes('event[data[@name="component"]/text="QUERY_PROCESSING"]/data[@name="data"]//byCount/wait') as tbl(xcol)

That's: Give me the event, having a data node with a name attribute="component" - but that data node must also have a text child equal to "QUERY_PROCESSING". Once we have that event, we look at its child node data (name attribute="data" as well), and then, under that, we look for any and all "byCount" nodes with "wait" children. And if we decide that we don't want the preemptive waits, we can just change this to:

select xcol.query('.') from @QueryProcessingXML.nodes('event[data[@name="component"]/text="QUERY_PROCESSING"]/data[@name="data"]//nonPreemptive/byCount/wait') as tbl(xcol)

This is still pretty darn wordy, but it might save us a bit of trouble if we're digging through a larger XML document, where there might be a lot of similarly named node paths. Conceivably.

"Conceivably?" you ask? I'm afraid so. Believe it or not, while that was a good exercise in developing an XPath descriptor from the data we have, and increasing our understanding of predicates... we didn't need to do that. In this particular case,
@QueryProcessingXML.nodes('event/data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait')

will give you the same set of nodes. We don't need to specify the attributes in this case, because the path alone uniquely takes us to the values we need.

So, now that I led you down one path to a red herring, let's give you something you actually want. How do we pull values from this? That's done with the value() method. The value method has two arguments - the XPath(/XQuery) that isolates a single value, and then a data type. So:

select @QueryProcessingXML.value('(event/data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait/@waitType)[1]','nvarchar(50)')

... will pull out the first wait type. We need the parentheses and the [1] to guarantee we get just the first - value() insists on a single value. As you can see we would need to do a lot of writing if we wanted to pull out all 20 waits here with all 4 attributes for each wait - and I've cut out the preemptive waits, so we'd normally have 40! That leads us to one of my primary annoyances with this particular capture.

As you can imagine, it's really easy to pull out all the wait attributes:

select n.value('(wait/@waitType)[1]','varchar(50)') [Wait Type],
       n.value('(wait/@waits)[1]','bigint') [Wait Count],
       n.value('(wait/@averageWaitTime)[1]','bigint') [Avg Wait],
       n.value('(wait/@maxWaitTime)[1]','bigint') [Max Wait]
from
(
    select xcol.query('.') as n
    from @QueryProcessingXML.nodes('//wait') as tbl(xcol)
) as tbl_outer

The problem we run into here is that, while we've pulled out the individual waits, we lose the upper attributes - like whether the wait is top ten byCount or byDuration, preemptive or nonPreemptive, or, more importantly, the timestamp of the capture. The timestamp is an attribute of the top level event in our example. In case you're curious:



I think that's the great-great-great-great-great-grandparent. So, sometimes, XML can be a pain. Is this a serious problem? Not really. Since I analyze SQL Server performance as part of my job, I yearn for the ability to review the top ten waits for each time frame, mostly byDuration (but byCount might matter too). Well, we can get all of that, but when we use this "slice and dice" method, we get one row per node - if we run through on the event level, it's hard to grab each set of ten waits out in one fell swoop. So how do we fix this?

First, obviously, you could just select the XML bits into XML columns in a table. Do you want one column with all waits, preemptive and non? Or two, or even four columns - preemptive and non, byCount and byDuration? If you've been following along and playing with XML, it should be starting to lose any intimidation it may have had, so for raw data capture, storage, and display, this idea has a certain appeal.
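Here's a minimal sketch of that idea - I'm reading the system_health files again, so the file pattern (and my choice of columns) is just an assumption to adapt:

-- one row per QUERY_PROCESSING capture, with the timestamp kept next to the waits
select
    x.event_xml.value('(event/@timestamp)[1]', 'datetime2')  as capture_time,
    x.event_xml.query('//nonPreemptive/byCount/wait')        as waits_by_count,
    x.event_xml.query('//nonPreemptive/byDuration/wait')     as waits_by_duration
into #QueryProcessingWaits
from
(
    select cast(event_data as xml) as event_xml
    from sys.fn_xe_file_target_read_file('system_health*.xel', null, null, null)
) as x
where x.event_xml.value('(event/data[@name="component"]/text)[1]', 'varchar(50)') = 'QUERY_PROCESSING'

One row per capture, timestamp right next to the waits - which is exactly what we were missing above.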

However, in accordance with an old Internet Maxim[1], the obvious, best solution is to dump the timestamp and the XML into a temporary table, and then cursor your way through it, running 4x10 selects to pull out the separate rows, then pick your dates, and see how your waits changed... which we can do almost as easily if we've saved the waits as XML. And we'd lose a lot of the flexibility, and we'd have ten rows with the same timestamp, and.... yeah, let's go with the XML.

So: let's review. Over the past three blog posts, we've reviewed the very basics of XML, and then a bit about how XPath works - hopefully enough so you can see how these examples worked. But honestly? My real hope is that you can start to read through something like Jonathan Kehayias's excellent XEvent A Day series. Not because that's the be-all and end-all of Extended Events but because when I started to read through it, I found that my XML difficulties kept me from following along. I could kind-of understand what he was doing, but not being able to read the XPath information was driving me crazy.

Well, in life, it's good to be the change you want to see - so, if you want a world where there's a quick-start set of blog posts on XML, XPath, and SQL XML methods, and how they relate to Extended Events - sometimes the best option is to try to create one. So here's my effort. Don't worry (or do worry?) - I don't plan to stop blogging about Extended Events yet. But I feel that this set of posts gets over the big XML hump, and I hope they help someone out there. Or, I hope they're so terrible they convince someone else to do them better - it's good to be an inspiration, after all....

[1] That maxim states: "The best way to learn something is not to ask a question; it's to post something wrong."[2]
[2]Worked pretty quickly this time, didn't it?

Sunday, September 27, 2015

Brief intro to XPath, pursuant to Extended Events discussion

Okay, as I mentioned earlier, if you want to work with Extended Events in SQL Server, you'll be best served if you know enough XML manipulation to work with the output. And that means you'll want to be able to manipulate XML enough to find, and present, the data that matters to you. For example, if you pull the data from a ring buffer, you'll end up with a big XML document containing everything in the ring buffer; with a file, you can usually pull in smaller XML documents, one per event, but there may be a fair amount of information in each - and it will vary, depending on what you decided to capture.

So you'll need a way to identify, and obtain data from a mess of XML. XPath is the first part of being able to do this. Earlier, I mentioned XQuery - but it turns out that all examples I've seen use XPath alone - XPath has query-like properties, because it describes parts of the XML document. An XPath construction, by itself, points to data, and you need only use a few tools to extract what you need from what you've pointed to.

Terminology first. XML has "elements". These are like <AnElement AnAttribute="AttributesAreAlwaysInQuotes">Some Text</AnElement>. Everything - open tag, all attributes, all text, closing tag. XPath also talks about "nodes" - these can be an element, but they can also be an attribute, or the text (and some other things, but we don't need to worry about those). These can be part of the XPath expression. Although we won't use this much, it helps to realize that XPath uses familial terminology - a tag that encloses another is a "parent" tag, and parents, plus "grandparents", etc., are all "ancestors"; similarly, the tag enclosed by the parent is a "child". Fellow children of the same parent are siblings, and children, plus "grandchildren" and so forth, are descendants.

Okay: since XML has a root element, and all new tags must close within their parent tags, we can think of an XML document as kind of like a Unix directory. If you've never used Unix (or not knowingly), the top level directory of a Unix file system is /. You then have a subdirectory (which might be a disk mapped to that directory name). You could easily represent the Windows file system this way by defining / as the root, /c as the C: drive, /d for the D: drive, etc. There are a fair number of similarities. For example, XPath uses both absolute and relative paths. Here's some sample XML: <Vehicle> <Car Make="Chevy" Model = "Sprint"> <Car_Data>My first car...</Car_Data> </Car> </Vehicle>. An absolute path there would be /Vehicle/Car/Car_Data - or even /Vehicle/Car/Car_Data/text(). You can also use relative paths. If Vehicle is not the root of our XML document (maybe it's part of "Assets"), then we can think of a relative path: Vehicle/Car/Car_Data, or Car/Car_Data. For relative paths, especially, there's also another similarity to file systems: . (a single dot/period) refers to the current location/node, and .. (two dots) refers to the parent location/node. This is the basic "bread and butter" of a search through an XML document. But you quickly run into problems with this.

First: look at my XML - what if I expand this document to a list of all cars I have ever owned? Then, /Vehicle/Car won't be unique, right?

This is actually one of the wonderful things about XPath. /Vehicle/Car can be non-existent, unique, or have hundreds of values. XPath is sort of like an auto-select; "this is the path to all Root-Level Vehicle, child Car, elements." This is really nice, and really slick, when you think about it - it's like a path, and a select, all in one! But a select without a where clause is like a banana split without the banana. It's nice, but there's something critical missing! Fortunately, XPath has all kinds of where clauses.

A simple one is an array-like quantifier. /Vehicle/Car[1] means the first Car that's a child of Vehicle. There are also some functions you can use in the square brackets: last() will get you the last item, so last()-1 will get you the penultimate, and last()-2 will get you the antepenultimate. These are really useful when you need to deterministically return a singular item (rather than a list). Remember, /Vehicle/Car is all Car values that are children of the root Vehicle - so /Vehicle/Car[1] means "the first (and in our case, only) Car". Another neat example for square brackets, one that's a good reminder that XPath is path, plus select, is how you can point to up to the first three cars that are siblings (same parent) with /Vehicle/Car[position()<=3]. Obviously, that last one won't allow you to return a single item/value! Still, if Microsoft had a common event for QUERY_PROCESSING health, and it included wait information, and you wanted the top 5 waits... oops, I'm getting ahead of myself.
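If you'd like to test these as you read, SQL Server's query() method (which I'll cover properly when we get back to Extended Events) makes a handy sandbox - the extra cars here are invented for the example:

declare @Vehicles xml = '
<Vehicle>
  <Car Make="Chevy" Model="Sprint"><Car_Data>My first car...</Car_Data></Car>
  <Car Make="Ford" Model="Escort"><Car_Data>My second car...</Car_Data></Car>
  <Car Make="Dodge" Model="Neon"><Car_Data>My third car...</Car_Data></Car>
</Vehicle>'

select @Vehicles.query('/Vehicle/Car[1]')              -- the first Car
select @Vehicles.query('/Vehicle/Car[last()]')         -- the last Car
select @Vehicles.query('/Vehicle/Car[position()<=2]')  -- up to the first two Cars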

Also, remember how we said that attributes, and text, are nodes as well? They are - but when you want an absolute path to an attribute, you need to preface the attribute name with @. So, the path to the Car Make would be /Vehicle/Car/@Make. Pause here for a moment - while this is a path, it's a path to an attribute value, while /Vehicle/Car is a path to an element. This will matter - the "return value" is a bit different, right? But we'll get to that later.

For the text data in an element, you need to point to the text() operator. So we'd want /Vehicle/Car/Car_Data/text() to point to "My first car...".

When you start with a / and the root element, you're using an absolute path. Per all XPath information I've found, a relative path, like "Car/Car_Data", should be a way to find any Car_Data child of a Car node. Well-documented as that may be, it doesn't seem to help in the Microsoft XML methods I'm going to be demonstrating later. But it's a good way to introduce another operator: //, a double slash - this is an operator that tells XPath that you don't care where something occurs, so long as it does occur, and this gives us a way to use those relative paths. So, //data will pull any data node, and //data/value will point to any value node that's a child of a data node - or, //Car/Car_Data will find the values we hoped for earlier.

For additional searching fun, /event//data will find any data node that's a descendant of an event node. Or, /DataCollection//event//data will point to all data nodes that are descendants of an event node that are part of the DataCollection XML document, however deeply either the event, or the data, is nested. You can combine these with the array-quantifiers above, but be careful. /DataCollection//event//data[1] means the first data node in every set of /DataCollection//event//data elements - if you have 10 events, each with data as a descendant, you'll get 10 values. If you want just the first value in the document, you want to use parentheses to clarify this. So, (/DataCollection//event//data)[1] means the first match in the document. Similarly, /DataCollection//event//data[position()<20] would point to the first 19 data siblings in each collection of data-siblings; (/DataCollection//event//data)[position()<20] grabs the first 19 in the entire document. So if you have three sets of 10 data nodes, the first gets all of the data nodes (up to 19 siblings of each group of 10) whereas the second doesn't even quite collect the first two groups (all 10 siblings in the first group, and only 9 slots remain for 9 of the 10 siblings in the second group).
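That difference is easier to see than to describe, so here's a tiny, made-up document to try it on:

declare @DataCollection xml = '
<DataCollection>
  <event><data>1</data><data>2</data><data>3</data></event>
  <event><data>4</data><data>5</data><data>6</data></event>
</DataCollection>'

select @DataCollection.query('/DataCollection//event//data[1]')    -- first data per event: 1 and 4
select @DataCollection.query('(/DataCollection//event//data)[1]')  -- first data in the document: just 1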

Okay, stop for a moment, and think about what we've discussed. You can find all nodes of a particular absolute path, and you can find all occurrences of certain relative paths. You can trim these to the first n items - you actually have all the knowledge you need to create an XPath to any single or top-n group of items in a simple XML document whose structure you know beforehand. If XPath stopped there, you'd already be able to do some relatively useful things.

But XPath does more than that. Earlier, I said that the square bracket operator was array-like. It isn't. You can use the powerful selection criteria allowed within the square brackets for something as simple as position in a list. But you can put a great many different selection criteria inside those brackets. For example, if we wanted to be sure we looked only at Chevys in our Vehicle list, we could use the absolute path /Vehicle/Car[@Make="Chevy"].

That's kind of cool, isn't it? But let's continue to pretend we have a lot of cars here, okay? This is way cool - I promise. You can select /Vehicle/Car[Car_Data="My first car..."]. You can say "give me Cars, but only if they have a Car_Data child with a specific text value."

But it gets even cooler. Let's redo our XML a bit:

<Vehicle>
<Car Make="Chevy" Model = "Sprint">
<Car_Data>My first car...<Contents>My_Keys</Contents>
</Car_Data>
</Car>
</Vehicle>

We could use /Vehicle/Car//Contents[.="My_Keys"] to find the car that has my keys in it. Interesting note here: I have Contents tagged as it is because while HTML is mostly whitespace agnostic, XML is funny. In text, whitespace is preserved (but all newlines are converted to line feeds). The text of Contents starts the character after the > and continues until the next tag definition, or the closing tag of Contents. So I closed up Contents to avoid having whitespace matching, or line feed aliasing, issues.

Also, just to make sure it's clear, although the predicate in the brackets is often at the last item of the list, it doesn't have to be. /Vehicle/Car[@Make="Chevy"]//Contents[.="My_Keys"] is perfectly acceptable - but if I use that, I better hope I didn't leave my keys in a Ford! As I'm sure you can guess, when you do this, all predicates must be satisfied - it's an AND combination, not an OR.

Okay: now, if I've done my job well, after reading this, you have an idea how to read some simple XPath statements. You can follow an absolute path, or a relative path; you can make decisions based upon attributes or child values, or even descendant values. If I threw a blob of XML at you, and showed you how to extract information using SQL Server's XML methods, you could piece together what I'm doing. And that's my next step.

Wednesday, September 23, 2015

Very brief XML introduction - pursuant to Extended Events discussions

I haven't been blogging nearly as much as I hoped - and I hadn't hoped for much more than a single post a month, which means blogging has been pretty damn skimpy!

But I decided that the world needed a bit more information about Extended Events in SQL Server, and, since I was studying them for my own professional development, it was time to try to share them with others.

Problem: if you want to do much of anything useful with Extended Events, you need to know a bit about XML. Talk about a pain!

XML is one of those things, you know? On the face of it, it's stone-cold simple. Enclose text in tags, sometimes even nested tags. You can also have attributes within the tags. And it's extensible, because you can add new attributes, or new tags, and you won't (necessarily) break something that doesn't know to look for the new tags - just like with old web browsers, if you don't know what a tag means, ignore it!

Ah, but XML in SQL Server is a brand new beast. Sometimes you want to check a few things - are they there, or not? - and sometimes you want to extract a few values, from text, or attributes. Sometimes you want to pull out some XML, sometimes you need to poke around in different places... you can do all of this, and because you can do all of this, you can process Extended Events results a bit easier than otherwise.

Step one: we need to discuss a bit about XML so you understand enough to know how it works.

There are many, many places to learn about XML out on the web. I'm not going to try to re-invent the wheel, so I'm going to spell out the most important parts here:

  1. XML is eXtensible Markup Language - the "markup" is done by tagging, putting opening and closing tags around a section of text.
  2. There must be a root tag to the document - all other tags will open and close within this.
  3. XML is case sensitive! Never forget this, it will save you a lot of grief if you remember. But you'll probably forget, as I do, far too often.
  4. Tagging is atomic - you can't cross your tags. You can have <Car><Make>Ford<Model>Fiesta</Model></Make></Car> - your tags are all nested - but you can't have <Car><Make>Ford<Model>Fiesta</Make></Model></Car>. If Model starts in Make, it must end there too.
  5. Tags must be closed - you can create a self-closing tag by adding a slash near the end. We could have, for example, <Model Name="Fiesta" />
  6. Tags can have text data - the text enclosed in the tags - or attributes, like the "Name=" in the previous point.
To expand upon this:
<?xml version="1.0"?>
<Memorobilia>
  <B5>
    <Weapon>
      <PPG type="resin_sculpture">
        <Quantity>4</Quantity>
      </PPG>
    </Weapon>
    <Souvenir>
      <Bear type="Teddy">
        <Desc>JS embroidered shirt</Desc>
        <Quantity>2</Quantity>
      </Bear>
      <Londo_Mollari_Figure>
        <Quantity>1</Quantity>
        <Desc>Posable 'action' figure </Desc>
      </Londo_Mollari_Figure>
    </Souvenir>
  </B5>
</Memorobilia>
Notice that we start and end with "Memorobilia", and that everything we've collected is categorized as B5 (for Babylon 5, a darn good science fiction TV series). We could have eliminated either tag - but by enclosing the B5 tags in Memorobilia tags, we have room to add other forms of memorobilia. Sorry: Memorobilia. XML is case sensitive! And if we want to add this to other collectibles that aren't exactly Memorobilia, we can enclose this entire structure within a new set of tags. That's one of the neat things about XML.

This isn't the best, most well designed XML schema, but it's enough to build some basic understanding. Notice, interestingly, that there's little actual text data in here! But that's perfectly reasonable - as with the car example above, we don't necessarily need text if we have the right categorizations.

You can also see that some of the data is included, not by tagging/categorization, but by attributes - we can mention that it's a "Teddy" bear by throwing in an attribute on the Bear (in case there's a posable bear, or a statue of a bear), while noting that the Londo figure, of all the Souvenir "children", has no attributes.

I mentioned "children"; that will be the subject of my next post, when I'll discuss XPath (and maybe XQuery).

Interesting issue with log file shrinks...

So, I was looking at this fine blog entry from Lisa Gardner, one of Microsoft's PFEs. And I decided to do a bit of futzing around with it.

If you're too lazy to follow the link (or if you can't see where the link goes and won't follow a link blindly - GOOD FOR YOU!), there's an undocumented command called DBCC LOGINFO in SQL Server. It will show you how many VLFs you have, and show you a nominal status for each of them - are they free, or in use?

I hope you all know that you don't want too many VLFs - less than a hundred is good, and more than a hundred only if you absolutely, positively have no other reasonable choice. Too many VLFs can slow down recovery (which includes database startup/availability when SQL starts!), transactional replication, mirroring/Availability Groups, and probably more that I can't think of. For all that it's undocumented, it's a good command.
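If you'd like to count your own VLFs, something like this does the trick - DBCC LOGINFO is undocumented, so the column list below is an assumption that matches SQL Server 2012 (drop RecoveryUnitId for 2008 R2 and earlier):

create table #loginfo
(
    RecoveryUnitId int,        -- SQL 2012 and later only
    FileId int,
    FileSize bigint,
    StartOffset bigint,
    FSeqNo int,
    [Status] int,              -- 0 = reusable, 2 = in use
    Parity int,
    CreateLSN numeric(25,0)
);

insert into #loginfo exec('DBCC LOGINFO');

select count(*) as VLF_count,
       sum(case when [Status] = 2 then 1 else 0 end) as VLFs_in_use
from #loginfo;

drop table #loginfo;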

Well, as Lisa points out, when you're using an availability group, DBCC LOGINFO has an interesting quirk.

Normally, when you run a log backup, DBCC LOGINFO will show all of the truncated VLFs as having a status of 0 - not in use. Lisa pointed out that if the server is in an availability group, while the log might be truncated (i.e., the VLFs are available for re-use) DBCC LOGINFO might still show them as having a status of 2 (in use).

Let's step back a bit to make sure we're clear on this. The Transaction Log File of a database is like a whiteboard. Each VLF is its own discrete section of whiteboard. When you "truncate" the log, all VLFs that are freed up by the process are now erased - clean, and clear, and ready to accept new data. These VLFs are still there; the log file is still the same size. But now the log file has more empty space (in the form of VLFs) to use. (This is why I use the whiteboard metaphor - erasing a section of whiteboard doesn't make the whiteboard smaller, it makes the blank space bigger.)

DBCC LOGINFO normally tells you which VLFs are at least partially in use, and which ones are blank.

But when the database is in an availability group, it may not be honest. You may have space (re-usable VLFs) but DBCC LOGINFO might show them as in use, status of 2.

Well, I ran into an oddball problem. On an availability group, I tried to shrink a log file - and it wouldn't. (Yes, I know. It had been bloated by one-time operations.) It had been backed up - I even triggered the backup 2-3 times. And DBCC LOGINFO showed all VLFs in use. DBCC SQLPERF(LOGSPACE) (which shows all databases, log file sizes, and percentage of log files in use) showed that nearly nothing - less than 1% - of the log file was in use. So, I'm comfortable that I ran into this DBCC LOGINFO display strangeness.

When this happens, the usual trick is to run the log backup once or twice to truncate just a bit more data. I did that - and I still couldn't shrink the log, and DBCC LOGINFO still showed all VLFs in use.

In this case, I figured out what was happening. My log backups were all occurring on the secondary replica of the database. As soon as I took a log backup on the primary replica, DBCC LOGINFO showed almost all VLFs clear, and I could shrink the log file back to its normal size.

What's going on here? I'm not sure, but it would make sense if DBCC SHRINKFILE uses DBCC LOGINFO to find out which VLFs are not in use. When you shrink a log file, it must shrink on VLF boundaries. If it's using DBCC LOGINFO to find inactive VLFs, it might not be finding any. And that might be causing the problem.

It looks like a backup on the primary replica might fix this. I also saw someone fix it with several log backups, and several checkpoints - the log backups were on the secondary replica, and, of course, the checkpoints were issued on the primary. Some combination of Secondary Replica log backups, and Primary replica checkpoints cleared the DBCC LOGINFO VLF status values - and allowed the file to shrink.

One warning: if you do choose to take a log backup on your primary replica, when you normally run log backups on your secondary, I very strongly recommend that you move or copy the log backup to the same location as your normal log backups. When you have to restore log file backups, you don't want to have to dig around in multiple locations - you want them all in the same place!

Friday, June 5, 2015

TempDB MDFs and spinlocks - and other time sinks

I don't want to start any great blog wars, but I saw a blog post today that made me wanna say WTF[1]. It wasn't that it was wrong or bad, but it was pushing some ideas that you shouldn't push too hard, because they can waste your time.

First tip I saw was "one TempDB MDF file per processor core, every time." Hey, I'll be the last person to scoff at the possible problems that TempDB contention can cause, but....

The problem started in SQL Server 2000. Sometime around SP3... or was it 2? 4? Yanno what? It's 15 years old! Do we care precisely when?

Anyway: The SQL product group set it so that SQL would go through TempDB MDF files in a round robin fashion, to avoid the worst aspects of contention in TempDB. And the recommendation that went with it was to create extra TempDB files for that round-robin to go through - one per CPU, up to 8, and then stop - unless you needed more, in which case you add 2-4 at a time until you see the contention stop.
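(For reference, adding a file is a quick ALTER DATABASE - the path and sizes here are invented, and you'd want any new files sized to match your existing ones:)

-- hypothetical file name, path, and sizes; keep all tempdb data files the same size
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);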

The thing is, in 2005, a new feature was added where TempDB could cache table structure (unless you did certain things to those tables - this looks right, re: the things you can and can't do, but I can't vouch for it).[3] This means that there's already greatly reduced contention on those TempDB files. I'll grant you: when talking to non-technical bosses, pointy-haired or not, "One file per core" is better than the more complicated "40 cores, so we'll start at 8, no, make it 12, and add more as needed". But this fine blogger was suggesting this as an absolute rule. And it's not.

The other thing that was discussed was tracking spinlock stats. Me? I like spinlocks. Just the name is cool. "Spinlock". And dropping it into the conversation seems crazy-smart: "While your CPUs are over burdened, I think the bigger reduction in throughput might be caused by spinlock contention." Won't you just sound like such a Brainiac your skin might turn green? (Well, for the classical Brainiac(tm), at least.)

It isn't a bad idea to track spinlock stats. It really isn't. But I got to see a fascinating case recently and since the subject came up (don't be unkind and mention that I brought it up!) let me tell you something I got to see.

Problem was CPU usage. CPU was 80% or more on a lot of CPU cores. And this wasn't normal on this server - they'd run before at 30, 40% CPU. Well, unexpectedly high CPU means "check statistics, and chase down problems in query plans." That was done. CPU remained high. So more digging was done. I felt oh-so-proud when I checked spinlocks and saw an amazing 1000-2000 spins, per core, per millisecond. That had to be a big, big deal, right? Right?

I'll skip to the punchline. Later on, I saw that this same server was running at 40% CPU... with the same 1000-2000 spins, per core, per millisecond. I was surprised. After all 1500 spins per millisecond is 1.5 million spins per second, which is ... um... order of magnitude, 0.1% of the speed of a 1.5ghz processor. Ouch.

This might still have been a point of contention - it might be that if we could reduce this contention we'd see throughput increase, and without needless spins, maybe we'd see CPU drop to the healthier 25-30% I like to see.

(Um. Let me explain. When I look at a healthy SQL Server, CPU tends to be less than 30%. 35% isn't a problem, heck, 50% isn't a problem. 50% busy is 50% idle, so while some people see the CPU as half busy, and some see it as half idle, I've moved on to other performance counters rather than making up a punchline to this joke. More seriously: if I see CPU > 30%, I'm interested - but not concerned. "Interesting" things are the things you keep in your mind while troubleshooting - but there's a vast difference between "interesting" and "important.")

But even 2000 spins, per core, per millisecond, wasn't causing serious problems on the server as far as I could tell. Sure, keep an eye on spinlocks, spins per core per ms, and especially on backoffs. But keep in mind you can have a pretty darn high set of spins per core per ms, and still not seriously affect your server.
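If you do want to keep that eye on them, the numbers live in a DMV - they're cumulative since startup, so take two snapshots and diff them to get spins per core per millisecond:

-- top spinlocks by raw spin count; backoffs are the column that tends to matter most
select name, collisions, spins, spins_per_collision, backoffs
from sys.dm_os_spinlock_stats
order by spins desc;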

I'm not linking back to the site I found, because it had some darn good advice, and, as I said, I don't want to get into back and forth unkindness. But there are always caveats and there is always care to be taken.



[1] As you might guess, that means "Why... The Frilly-panties-of-Venus!". She left them on my chandelier. I suggested that might make people think I viewed her as a conquest, and she quite reasonably pointed out that she was one of my conquests.... [2]

[2] Chess. Best out of three games. She's weak on queen side openings. There's probably a joke there, but I ain't going for it, or she might get miffed. Miffed goddesses are never a good thing.

[3] Edited with strike through - in my day job, I do sometimes have to vouch for knowledge (or stress that I can't), but that doesn't belong here, especially not when I'm linking to someone more knowledgeable on a subject than I. I humbly apologize.

Wednesday, May 13, 2015

Thoughts about checksums and changes

So, I'm looking at this server, and I'm thinking "we don't have to worry about data corruption causing X_Issue, if the Page Verify option is set to Checksum in database properties." (See https://technet.microsoft.com/en-us/library/ms190249%28v=sql.105%29.aspx, recovery options.) Trust me, you want checksum set - in return for a tiny, tiny, tiny amount of CPU during a write to disk, you get verification that the write went to disk, and a subsequent read came back from the disk, without any damage occurring to your precious data.

"Precious..." you say doing your best Gollum imitation? Hey, Gollum's got nothing on a DBA guarding a database! For data integrity, a good DBA will even outdo a sysadmin. Believe it.

Checksum's the default - you need to do a bit of work to get it turned off - or, you need a database that was birthed in SQL Server 2000 or earlier. Checksum was introduced in SQL Server 2005, and it's been the default ever since.
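Checking is a one-liner against sys.databases, and fixing it is almost as quick - the database name here is a placeholder:

-- which databases are not protected by checksum?
select name, page_verify_option_desc
from sys.databases
where page_verify_option_desc <> 'CHECKSUM';

-- turn it on (setting the option also triggers a checkpoint, as discussed below)
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM WITH NO_WAIT;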

As I was pondering this, I thought about changing a database from page verify = NONE to checksum, and then I thought, hey, what if a database is damaged? Oh my lord, what if we set checksum and suddenly we start flagging error 824! Will people think the checksum corrupted the database?

At that point, I realized I was tired. Really tired. Because that's one of the silliest thoughts that could come into my head - I know better! And if you know your checksums, you know better too - even if you don't realize it yet.

Checksums are computed at writes. If we had a thousand damaged pages already, and turned on checksum, how many 824 errors would be caused by that event alone? None. Even if SQL Server went out and computed a checksum on each and every page in the database (and it does not!!! - see the "myth" link below), it would be running a checksum on the already damaged data. Checksums aren't magical. A checksum is simply set up so that if there are small differences in the data, the correct checksum for that data will be different. So: if you compute a checksum, and send a data page to disk, and then, later, recompute the checksum when it's read back into memory, if the checksums match, you're 99+% certain that the data hasn't been damaged in transit or at rest. If they don't match, you know something terrible has happened.

But if data's already damaged, and you compute a checksum, all you are doing is making sure that if it gets damaged again, you'll notice.

If you turn on checksum verification, only the dirty pages in the buffer pool will automatically get checksums computed. Hey, here's an interesting thought: what if we cut power right after we set the checksum option? Could we still say all dirty pages in the buffer pool would get a checksum? Well, then it would be a race - but if you follow that Technet link above, you'll see that setting an option runs a checkpoint on the database - so all the dirty pages should immediately get flushed, and hence, checksummed, as long as the checkpoint completes. So, a new blog post, and the basis for a kinda-cool SQL Server trivia question. Not bad for a Wednesday night!

For a bonus trivia question, is it possible that turning on checksum page verification could reveal a bunch of damaged pages? Yes, but it requires a contrived situation to occur.

Stop and think a moment to see if you can get the answer - it's kind of cool once you figure it out.

Okay: if you turn off Checksum page verification, SQL stops checking checksums, and stops computing them thereafter. But remember, when you change the page verification option, SQL doesn't immediately add (or remove!) checksum information until the page is read into memory, changed, then written back to disk. For maximum horror value, let's back up this database, and restore it somewhere. (You do know that corruption in a database usually transmits perfectly via a backup, right?) Now, when someone restores this database, they might notice that checksum is turned off, and turn it back on. We probably still have pages with checksums computed on them - it'd be awfully strange if every single data page had been read and changed and written back to disk! So, if there were some IO errors leading to some incorrect checksum values (technically, the checksum is correct, it's the data that's incorrect) SQL would resume computing and comparing checksum values, and could suddenly uncover a bunch of errors it would have detected if we hadn't turned off checksum in the first place. And yes, good luck explaining that turning on checksum didn't cause the corruption.

So: the moral of this story is, never turn off checksum page verification, unless you need to construct an absurd hypothetical. In fact, don't even turn it off then - make it a thought experiment.

For more on checksum, and the myths about it, please see this page from Paul Randal's "myth a day" series.

Thursday, May 7, 2015

Interesting changes in indexes...

Wow! While digging into another issue, I found some significant changes in index maintenance on the MSDN Alter Index Page for SQL 2012.

First: usually, if you rebuild an index, you get a new set of statistics on the indexed column(s) for free. It'd be silly to read every value, sort it all, and then not use that data to build a new statistics histogram. That's not true if you rebuild a partitioned index. Quoted from the above:

In SQL Server 2012, statistics are not created by scanning all the rows in the table when a partitioned index is created or rebuilt. Instead, the query optimizer uses the default sampling algorithm to generate statistics. To obtain statistics on partitioned indexes by scanning all the rows in the table, use CREATE STATISTICS or UPDATE STATISTICS with the FULLSCAN clause.


That's interesting, but the shocker to me was another tidbit.

When dealing with database integrity errors, I hold my breath while looking at the index ID for problems in an index. Why? Because: Index ID 0 or 1 is raw table data. Index ID 0 is the heap; index ID 1 is the clustered index. (You only ever have one - Index ID of 0 rules out an Index ID of 1, and vice versa.) Index IDs of 2 and higher? They're "just" indexes that can be rebuilt without any risk of data loss.

Ah, but there's a tricky bit now:

In earlier versions of SQL Server, you could sometimes rebuild a nonclustered index to correct inconsistencies caused by hardware failures. In SQL Server 2008 and later, you may still be able to repair such inconsistencies between the index and the clustered index by rebuilding a nonclustered index offline. However, you cannot repair nonclustered index inconsistencies by rebuilding the index online, because the online rebuild mechanism will use the existing nonclustered index as the basis for the rebuild and thus persist the inconsistency. Rebuilding the index offline, by contrast, will force a scan of the clustered index (or heap) and so remove the inconsistency.


So: if you have a damaged index of ID 2 or greater, you have to take it offline to repair it. But don't think about what you're losing - think about what you're gaining! The other option is putting your database in Single User Mode, and running DBCC CHECKDB with REPAIR_REBUILD. So here, you get to take an index offline, instead of taking your whole database offline. It's a fair trade!

Most of the rest of indexing maintenance hasn't changed. If your indexes are under 10% fragmented, great - don't worry about it. Seriously - you can't always get rid of that last bit of fragmentation. For small tables, it might be literally impossible, and for larger ones, there are still probably better places to spend your energy. 10-30% fragmentation is a good time for a defragment/reorganize. If you see more than 30% fragmentation, a rebuild is your best option - but unless you're on Enterprise edition, you can't do an online rebuild so you might want to try to hold off by using a reorganize until you can get a maintenance window.
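In query form, that advice looks something like this - the index and table names are placeholders, and you'd want to sanity-check page counts before bothering with tiny indexes:

-- how fragmented are we? (LIMITED mode is the cheap scan)
select object_name(ips.object_id) as table_name,
       i.name as index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
from sys.dm_db_index_physical_stats(db_id(), null, null, null, 'LIMITED') as ips
inner join sys.indexes i
    on i.object_id = ips.object_id and i.index_id = ips.index_id
where ips.avg_fragmentation_in_percent >= 10;

-- 10-30%: reorganize; over 30%: rebuild (ONLINE = ON needs Enterprise edition)
ALTER INDEX IX_SomeIndex ON dbo.SomeTable REORGANIZE;
ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD WITH (ONLINE = ON);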

Oh, one other thing: most of the time you want to leave Fill Factor at the default of 0. A fill factor of 80 means leaving your pages only 80% full. This sounds like a reasonable idea - why pack them full, so that we have page splits as soon as we add a few rows? Why not leave a bit of space so they can fill up over time? It's not a terrible idea, but remember that data pages that are 80% full require 25% more disk space and buffer pool memory to hold them - and thus, 25% more reads and writes to deal with them. If you're having lots of page splits that are slowing you down, sure, consider trying a fill factor on the busiest indexes. But most of the time, you're better off packing them tight, to minimize disk and memory usage, and to reduce the number of reads and writes you need to perform.

Wednesday, April 22, 2015

SHRINK THE LOGFILE - to make your support engineer turn all kinds of interesting colors....

"Hey, weirdo," one of my co-workers said to me.

"Huh. I didn't know I'd discussed my blog with you," I said, surprised at the easy way he'd added my nom du blog to his vocabulary.

"You have a blog? Anyway - you've got that look on your face... no, not that one, the other one," he continued, because my expression had changed to one of exasperation.

"You know, back in the 20th century, some other weirdo came up with the notion that 'I have no head'...". This is true - and it's a fascinating point. We never see our own faces - we see evidence of our faces, in mirrors and photographs, but in a real sense, we're in our brains, a point looking out, and... oh, never mind.

My co-worker, who was fictionally quick on the uptake, said "The expression you get on your face when people talk about shrinking transaction logs!"

"Oh, that! A combination of hopeless despair and frustration with the state of humanity...".

He was right. I did have "that look" on my face. See, I'd been discussing transaction log sizing with an equally fictional customer. His transaction logs were supposed to be 10gb, you see. And they grew to 15gb, so he shrunk 'em back to 10, and they grew to 18, so he shrunk 'em back to 10, and they grew to 17, so he shrunk 'em back to 10, and then they grew to 21gigabytes, so this time, he wasn't doing any more fooling around. He shrank them back to 5 gigabytes.

Some of you are boggled, I'm sure - but yes, this does happen. Some people see their log files grow, and assume the answer is to shrink 'em back.

Now, I hope you all know that you never want to shrink an MDF file if you can possibly avoid it. If you have to remove an MDF file, a SHRINKFILE with the EMPTYFILE option is fine, but the damage a file shrink can do to your indexes could bring tears to your eyes - or maybe it's the user complaints about the performance hit from the massive disk activity plus the index fragmentation.... Anyway, sure, if you have no choice, shrink the MDF, but rebuild those indexes ASAP.

Shrinking a transaction log can take on a whole new dimension of damnfoolishness - er, potentially sub-optimal behavior. You see, since SQL Server 2005, SQL can do instant file initialization for MDFs (and NDFs, but just between us, they're the same thing!). It allocates the disk space, and *poof* - database is bigger. You have to make sure SQL has the Perform Volume Maintenance Tasks permission, but other than that, MDF growth is super fast. There's a tiny security risk, of course - a previously deleted file could live on the disk space you just allocated - and now a tricky DBA could find out what used to be in that file. I haven't met anyone who finds this to be a concern, though I also haven't worked for any three-letter government agencies.

Okay: but that's data files. Log files have to be initialized. SQL has to write the correct headers and bit patterns to the new transaction log. If you want your log to grow by a gigabyte, you need to do a full gigabyte of writes to disk. And let's face it: far too often, disks suck - er, file writes are a significant bottleneck to server activity[1]. On top of that, your log file is growing because you're doing a lot of writes to it already. So, just when your log disk is really busy, you're making it even busier, and maybe making your entire database put all changes on hold until there's fresh log space.

You really want to avoid growing log files in the middle of ordinary server operations. Oh, you want to have file growth enabled - a big slowdown while a log file grows is better than the log file filling and everything coming to a halt! But you want to get ahead of the game, and prevent it from happening (except during maintenance) if you can.

And so, if your log file has grown, you really need to stop and think before you shrink. If your log file has grown to 20 gigabytes, that means there was at least one brief moment where SQL needed somewhere in the neighborhood of 20 gigabytes of log space for smooth functioning. So, sometimes, your server needs that much log space. Is there truly some reason you think it will never, ever need that much space again?

Did your log backups fail for an extended period of time? Okay - if that happened, go ahead and consider shrinking them back. Similarly, if you only took log backups once in a heck of a long time, and are now taking them far more frequently, go ahead, consider shrinking. Heaven help us, if you decided you're always throwing away your log backups, and changed to SIMPLE recovery model because it suits your needs, again, feel free to shrink 'em. (SIMPLE recovery model doesn't get enough love, so I like sending it some; when it's the right tool, use it!)

Were you performing something huge, like a fully logged index rebuild (or more than one), or a bulk insert (if so - why didn't you set to Bulk_Logged?[2]), neither of which you'll have to do for a long, long time? Sure, go ahead and shrink them back.

But in a great many cases, if your logs grow to 20gb, your database needs 20gb. And you should leave them at 20gb... eventually at least.

"Eventually?" you ask. Thank you! It's nice to have helpful, questioning readers! Yes, if your log files have grown in lots of tiny increments[3], you have a lot of VLFs - virtual log files. For some reason[4], SQL Server doesn't like to have more than a few dozen VLFs. It can handle even a few hundred, but once you hit about a thousand VLFs, Bad Things May Happen. You may see an extremely long recovery on startup, when SQL has to enumerate and read all of those VLFs. You may see poor performance in log backups, log restores, or the Log Reader Agent if you use transactional replication. You may see poor performance in mirroring, or with availability groups. And I'll guarantee you, although I've listed all the possible problems that I can recall off the top of my head, there's yet another lurking danger ready to do a SERIALIZABLE SELECT right around your short and curlies....

Okay, maybe I'm getting a bit silly here. Seriously: if your log files have grown more than, say, a dozen times (yes, ever!!!) you probably want to shrink them as small as you can, and then grow them in relatively big chunks - see Kimberly Tripp's canonical article for how to grow 'em best - until they're at a size where they won't have to grow ("at all" if you can manage it; "more than a bit" if you can't). This isn't the most critical thing you'll do today - too many VLFs won't crash your server or slow down most normal operations, most of the time. But they will sneak up on you when you least expect it - so keep them under control.
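
Here's roughly what that looks like - the names and chunk sizes below are just illustrations, so see Kimberly's article for the real guidance on growth sizes:

USE MyDatabase;
DBCC LOGINFO;   -- one row per VLF; a huge rowcount here is the symptom
-- shrink the log as small as it will go (take a log backup first if you're in Full recovery)
DBCC SHRINKFILE (N'MyDatabase_log', 0);
-- then grow it back out in a few large chunks, rather than one giant chunk or many tiny ones
ALTER DATABASE MyDatabase MODIFY FILE (NAME = N'MyDatabase_log', SIZE = 8000MB);
ALTER DATABASE MyDatabase MODIFY FILE (NAME = N'MyDatabase_log', SIZE = 16000MB);
ALTER DATABASE MyDatabase MODIFY FILE (NAME = N'MyDatabase_log', SIZE = 20000MB);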

Get your log files to a good size. Leave them there. And keep them from growing more than once in a while.


[1] Seriously. You might be surprised or appalled at how often you'll see poorly performing disks "in the wild", plus how poorly they might perform.
[2] Yes, there are several good reasons not to go bulk-logged. I'm just checking to see if your answer is one of the good ones!
[3] Or lots of large increments, if your log file's really, really huge - really, it's the number, more than the size, that matters.
[4] "for some reason," as in, there's a reason, but I'm darned if I know precisely what that reason is.

Wednesday, March 11, 2015

CXPacket waits - they're not evil... they're just misunderstood!

So, everyone seems to know about poor, put-upon CXPacket waits. "That's the parallelism wait!" people will say. "You've set Max Degree of Parallelism too high!" Well... maybe. Or maybe you've set Cost Threshold of Parallelism too low. Or... maybe nothing at all.

It's true, CXPacket is the parallelism wait, but parallelism is a powerful tool - it lets SQL Server break certain query operations down into multiple pieces, so that they can run faster. The goal isn't to eliminate CXPacket waits - it's to find the balance point where the benefit you gain from parallelism outweighs what you lose to the waits. (And as you'll see, you don't always lose anything!)

The Too Long, Didn't Read Summary:
Don't assume you're smarter than http://support.microsoft.com/kb/2806535; if you are, prove your genius with a test of your application.
CXPacket waits that accumulate at a reasonably steady rate, and stay below (number of cores, divided by Max Degree of Parallelism) x 1000 ms/second, aren't generally a bad thing. Spiky waits might be, and even steady ones could be, but it'll be hard to prove. Consider other performance troubleshooting first.
If you get much higher than that Cores/MaxDOP * 1000ms/second value, yeah, you need to figure out how to drop parallelism. This can be done by dropping Max Degree of Parallelism, increasing Cost Threshold of Parallelism, or maybe by tuning your queries (!!!).

On to the main discussion:
Since CXPacket is the parallelism wait, I want to talk about parallelism for a moment. Here's the lowdown on how SQL decides if it should use a parallel plan. Once your query is nicely pulled apart into component bits, SQL Server starts to optimize it. First it looks for trivial plans, where there's only one way to do something, like "update one row in a table with a clustered index". If it finds one, it produces it, and stops. If it didn't find a trivial plan, it starts digging into other optimizations, trying to balance the anticipated cost of the query against the time it's spending on optimization. It doesn't try to find a perfect plan (there might be billions of possible plans!); it tries to find one close enough to perfection that it would waste time to look for a still-better one. If it finds that good-enough plan, it stops and delivers that plan. But if it doesn't, at the start of the next step it makes a decision. Should we try a parallel query plan or not?

If the cost of the query is higher than the Cost Threshold of Parallelism (from SP_Configure), and if parallelism is allowed for this query, it will consider both parallel and serial plans. (Parallelism can be limited by Max Degree of Parallelism, query hints, and Resource Governor - and possibly other ways that I can't remember!)

What's the cost of a query? Well - it's an estimated time to run the query, in seconds, based upon a standard set of hardware. The default Cost Threshold for Parallelism is 5 - meaning a query that would have taken 5 seconds on a single-processor server with a single SCSI disk - or that's what I was told, a few years back. As always, I know that sometimes the best way to learn something on the internet is not to post a question, but to post something wrong. If that information is correct, it means that today, that "cost" is much smaller - a query that took 5 seconds on that standardized server of 16 years ago would probably run 20-30 times faster today, unless it was disk bound (and even then it might be much faster, given SSDs, SANs, and even plain-old spinning platters that are nevertheless much faster than back then). So I don't think anyone should be shy about bumping up Cost Threshold for Parallelism. Let the relatively cheap queries run serially; they'll still be fast.

What about Max Degree of Parallelism? The official Microsoft recommendations are here: http://support.microsoft.com/kb/2806535. They come down to this:

1) If you're using Hyperthreading, ignore it. Count only your physical cores, not your logical processors.
2) Max DOP of 0, or N, if you have N processor cores and N is less than 8. (This assumes you have only one NUMA node - which is likely, but not certain!)
3) Max DOP of 8, even if you have more than 8 processors - unless you've tested it, and found that more than 8 works better for you. Except:
4) No more than the number of processors in one NUMA node.
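
If you decide to apply those recommendations, both knobs live in sp_configure; the values below are examples only, not one-size-fits-all answers:

-- a sketch: both settings are advanced options, so 'show advanced options' has to be on
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;       -- example value only - see the KB above
EXEC sp_configure 'cost threshold for parallelism', 25; -- example value only - test for your workload
RECONFIGURE;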

The basic idea goes like this: splitting query operations has diminishing returns and comes with more overhead, and greater use of resources. Split an operation into too many pieces, and you can actually waste time dealing with the extra overhead. More importantly, when an operation gets split into many pieces, each piece has to use a worker thread. If too many of your queries are using parallel operations, with too many worker threads, you can run into a worker thread shortage. That would be bad. Think Egon-level bad.

Experience shows that you usually get the most bang for your buck up to about 8 threads working on a query operation at once. After that, you can start to end up in situations where you're losing more than you gain. Not always - but frequently. So frequently that 8 really is a pretty good starting maximum; it isn't a hard cap, but it's considered a very good practice. As for NUMA - well, NUMA is "Non-Uniform Memory Access": each CPU core has some memory "near" it (faster to access) and some "far" (slower to access). So Microsoft recommends against guaranteeing you'll need both near and far memory in the same query. I've seen people disagree with this - people who think the extra boost of a few more processors working the query will overcome the disadvantage of mixing memory. They might be right! But test your application before deciding you know best - all the best instincts in the world don't beat a good load test.

Okay: enough introductory stuff: on to CXPacket waits!

Once a parallel operation starts, the main thread running the query spins off its child tasks. I've seen it said that it always spins off MaxDOP tasks - if your MaxDOP is 8, all parallel tasks split into 8 parts. I've also seen a claim that SQL will determine on the fly how many child tasks to spin up when the plan is compiled. I can't say which is the truth, but I can say, every time I've seen a parallel query in sysprocesses, or sys.dm_exec_requests, or similar reports, they've always had MaxDOP child tasks. Regardless - this parent task now goes into a wait on CXPacket.
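
If you want to peek at this yourself, one rough way is to count the tasks attached to each session - a session with more than one task is almost certainly running a parallel plan. (A sketch only; the count can come out higher than MaxDOP, since the coordinating task is in there too.)

-- a rough way to eyeball which sessions are running parallel plans right now
SELECT t.session_id, COUNT(*) AS task_count
FROM sys.dm_os_tasks AS t
WHERE t.session_id > 50              -- skip most system sessions
GROUP BY t.session_id
HAVING COUNT(*) > 1
ORDER BY task_count DESC;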

Here's where we see something interesting: if you have 32 processors, and Max DOP set to 8, and you have 4000ms/second of waits in CXPacket, and they're more or less consistent, what does that mean? Well, it might well mean that you constantly have 4 different parallel queries running in perfect harmony. And 4000ms/second is a pretty substantial wait - if you saw this much in average waits on PAGEIOLATCH, or blocking, you'd have every right to be concerned. But you don't quite yet have reason to be concerned here. If you're using a tool like SQL Nexus, you might see that you have big bursts of CXPacket waits, instead of relatively steady waits - and that's often a sign of a problem. But if you graphed your CXPacket waits out, and they were nice and stable over time, and just about 4000ms/second, you're probably hitting the sweet spot for parallelism. So that's one reason CXPacket isn't evil. CXPacket waits prove that you're using parallel queries - but they don't tell you if you're gaining more than you lose. Up to this limit, we have reason to suspect we're probably gaining more than we lose. We're not certain, but we don't have strong evidence that CXPacket waits are a problem - so let's look for other things to optimize first.
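
If you want to put a number on your own system, a quick-and-dirty sketch is to snapshot sys.dm_os_wait_stats twice and divide by the elapsed time - then compare against that cores-divided-by-MaxDOP, times 1000 ms/second, yardstick:

-- a sketch: snapshot CXPACKET waits a minute apart and work out ms-of-wait per second of wall clock
DECLARE @w1 bigint, @t1 datetime2 = SYSDATETIME();
SELECT @w1 = wait_time_ms FROM sys.dm_os_wait_stats WHERE wait_type = N'CXPACKET';

WAITFOR DELAY '00:01:00';

DECLARE @w2 bigint, @t2 datetime2 = SYSDATETIME();
SELECT @w2 = wait_time_ms FROM sys.dm_os_wait_stats WHERE wait_type = N'CXPACKET';

SELECT (@w2 - @w1) * 1.0 / DATEDIFF(second, @t1, @t2) AS cxpacket_ms_per_second;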

Here's another way CXPacket isn't evil. Same situation: 32 processors, Max DOP of 8, about 4000 ms/second of CXPacket waits. Let's say you drop Max DOP to 4, and your CXPacket waits go up to 6000. Good thing or bad thing? Well, remember how we decided that 4000ms/second of waits wasn't bad when we had 32 processors, and a MaxDOP of 8. If we constantly had 4 parallel queries, taking up 8 cores each, we'd have 4000ms/second of CXPacket waits. Here, we could have up to 8 queries, running on 4 processors each - we could have as much as 8000ms/second of CXPacket waits, if parallel queries were constantly running. But instead, we only have 6000ms/second. We may well have better overall use of parallelism - even though our poor, put-upon CXPacket wait looks worse!

This is, obviously, a contrived example. Real life is rarely ever as clean as this example. But this does show you that CXPacket can often be, in fact, just misunderstood. Ah, but it can be evil, too!

There's another way to gain CXPacket waits. When the main worker splits off the child tasks, the main worker goes into CXPacket wait. And then, if a child task finishes before the others, it also goes into CXPacket wait. And if this happens, you're using up a lot more worker threads - remember, don't cross the streams, and don't run out of worker threads! - and you're not getting anywhere near the performance benefit from it that you should. The best that can be said of this situation is that the CXPacket waits probably aren't as bad as they look - i.e., they're not as bad as if they were lock waits, or latch waits, etc., because the parent thread always sucks up CXPacket waits equal to the duration of the parallel operation. Nevertheless, when your CXPacket waits are above your hypothetical optimum level - number of cores, divided by Max DOP, in ms/second - it's definitely time to do some parallelism reduction. If your MaxDOP is higher than recommended, drop it a bit - and if you have > 8 cores, and haven't set it yet, set it! Consider dropping it even if you're spot-on the recommendations; they're only one-size-fits-most. Consider an increase in Cost Threshold of Parallelism - don't let your cheap queries suck up lots of extra resources to save a few ms. But more importantly, look at some parallel queries and see if there's a better way to speed them up - a narrow covering index on a wide table can drop the query cost below the Cost Threshold, especially if it saves you from having to do a post-scan sort; and if you're doing a parallel scan, you should definitely consider an index to see if you can eliminate it.

So: if there's a moral to this story, it's to understand your CXPacket waits - with understanding, and proper care, you can prevent them from turning evil.

Sunday, February 22, 2015

Meditations on backup and restore...

Ohhhhmmmmmmm... Ohhhhmmmmmmm...

Okay, bad joke.

Backups are a wonderful thing. If only life let you restore to a point in time the way a Full recovery model database might allow!

Backups and restores are generally easy to deal with and think about, but there are some tricky bits, so let's think about each of them.

The Too Long, Didn't Read summary:

Simple recovery model: Nice and easy. You can do full backups, or differential. You can restore full backups or differentials. It can get complicated when you learn about "partial" backups - that will be another blog post, I hope! You can also do COPY_ONLY backups, and these maintain the differential base - vital for making sure no one is confused about which differential goes with which full, when making regularly scheduled backups.

Full: You must take log backups, or your log file will not truncate. If you ever need to truncate your log file without taking a backup, you really need to consider using Simple recovery model instead! You can take full (including COPY_ONLY), differential, or transaction log (including COPY_ONLY) backups. You must pay attention to the log chain - mostly, to a broken, or altered, log chain. You can restore a full, differential, or COPY_ONLY backup - and COPY_ONLY maintains its location in the log chain, and the differential base. You can restore from any one point in time to any other, so long as you have a full, or full+differential (or, technically, full+differential+log chain) to the first time, and a continuous (i.e., unbroken/unaltered) log chain to the second.

With Full, and an unbroken chain, you can perform miracles - you can restore the whole database to any point in time covered by the full and the log chain; you can restore filegroups, individual files, or even corrupted pages, and sometimes, do these things while the database remains online (though obviously, for file/filegroup restores, the affected filegroup will be offline). Note that this is separate from piecemeal backups and restores - you can restore a single damaged file from a full backup (optionally: plus differential), plus the correct log chain. You can also do this from a piecemeal backup, of course.

Bulk-Logged: Saves you space in the transaction log, but not in the backups, if you use "bulk" operations. It is precisely the same as Full if you perform no bulk operations; if you do perform a bulk operation, any log backup covering part of that bulk operation is all-or-nothing when restoring - no point-in-time restore within it. As a best practice, consider staying in Bulk-Logged only when you know you should be, then immediately switching back to Full and taking a fresh full, differential, or transaction log backup to cap off the bulk-logged window.

And now: on to the post!

First, remember, SQL Server has 3 recovery models: Simple, Bulk-Logged, and Full.

In Simple recovery model, the transaction log truncates each time a checkpoint is run (either manually - you can issue a CHECKPOINT as a T-SQL command - or automatically). And you can take Full backups, and Differential backups - you can't take a log backup, and SQL would have no way to use it if you could. You can also take a COPY_ONLY backup to maintain your differential base.

"Differential base?" you ask. Okay: When you take a full backup, SQL Server makes changes to the database - it clears the differential bits that mark which extents have been modified. ("Extent"? SQL uses files in 8kb pages, and groups 8 8kb pages into one 64kb extent.) At this point, you have a new differential base. If you take a new differential the instant this process finishes, your differential backup should be essentially empty - header information only, with a record that no changes had been made.

Let's pretend you take your full backups each Sunday. This sets your differential base. You might think this is silly, but - you want to maintain that differential base until next Sunday, and yes, it really is important. Why? Because in emergencies, people are pumped up on adrenaline. And this is well established: when you're hyped on adrenaline, you are better at doing things that you've drilled in and are good at doing, and worse at things where you need to improvise. So you don't want anyone to have to remember that there's a new full backup, nor guess where to find it, nor have to compare file date/timestamps, and get them right. You want them to be able to do everything one particular way, and it should be a way you've practiced. (Yes, Mr. or Ms. Manager, on company time; nothing encourages good practice like being paid for something other than the daily grind. Do this, and you'll all be heroes when disaster strikes, and believe me, it will.)

So: you can restore a full backup, and that puts your database in a consistent state as of the time the backup completed. If you restore WITH NORECOVERY, you can also restore a differential backup on top of that full backup, which will be consistent as of the point in time that backup completed. That's pretty much all you can do with Simple recovery model, if you're not using partial backups and restores.
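
For the record, that restore sequence looks something like this - the file names are invented, and REPLACE is only there because this sketch overwrites an existing database:

-- restore the full, leaving the database ready for more, then bring it online with the differential
RESTORE DATABASE MyDatabase
    FROM DISK = N'X:\Backups\MyDatabase_Full.bak'
    WITH NORECOVERY, REPLACE;
RESTORE DATABASE MyDatabase
    FROM DISK = N'X:\Backups\MyDatabase_Diff.bak'
    WITH RECOVERY;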

If, for some reason, you needed a copy of your database, you have some choices. First, you could take a new full backup. If you're doing fulls plus differentials, this is a bad idea, unless it can replace your next scheduled full backup. Again: remember, if you need to restore in an emergency, people will be hyped - don't make them freak out when they find tomorrow's differential doesn't restore onto the full backup taken at its normal schedule, because someone took another full backup.

A better option is to take a fresh differential, and use your previously scheduled full, plus the new differential, to do whatever you need to do. But SQL Server has a nifty option to create a COPY_ONLY backup - this is the equivalent of a full + differential in one single file. And, as I said, this doesn't reset the differential base. I have tried to restore a differential taken after a COPY_ONLY backup, on top of the COPY_ONLY - it worked. It should work - it's the same differential base, and a later differential backup can be added to the restore chain if needed. So you can also restore a full backup, a differential backup, and then (assuming the first differential was restored WITH NORECOVERY) you can restore another, later, differential backup. But you shouldn't - the second differential backup contained all the same information as the first, so it was a waste of time to restore both.
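
Taking that COPY_ONLY is a one-liner - path and names invented, as usual:

-- an out-of-band copy of the database that leaves the differential base alone
BACKUP DATABASE MyDatabase
    TO DISK = N'X:\Backups\MyDatabase_CopyOnly.bak'
    WITH COPY_ONLY, INIT;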

(If you're a bit like me, once you heard that you 'shouldn't' restore a second differential, you started to imagine circumstances in which it might be necessary. That's fine - one example is if you're 90% sure you need Monday's backup, but you're checking records to see if maybe you need Tuesday's; and checking records takes longer than the restore. If that's the case, go ahead! This is good to think about; puzzles are great! But always remember that puzzles are puzzles and in the real world you'll almost always want one full, one differential, and that's it.)

(What do you do if you restore a full backup, restore Monday's backup WITH NORECOVERY, and find you don't need Tuesday's - how do you get the database online? You run RESTORE DATABASE My_Database_Name WITH RECOVERY -- except, of course, you replace My_Database_Name with the name of the database you've restored.)

One final bit of interesting knowledge: If you switch from Simple recovery model to Full recovery model, SQL Server still doesn't have any basis upon which to restore a log backup. None of your previous backups include enough log information for a newly taken log backup to restore. So SQL Server will continue to truncate the log file at every checkpoint, until you establish a backup that a transaction log backup can restore on top of. If you've ever taken a full backup of the database, you can just take a differential backup (though if your full backup was in the far-back past, you might as well take a full backup instead). If you've never taken a full backup, you'll need to take a full backup - obviously! - to allow log backups.
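
So the switch-and-re-seed dance looks roughly like this (made-up names; use a full backup instead of the differential if you've never taken one, or if your last full is ancient):

-- switch to Full, then take the backup that makes log backups restorable
ALTER DATABASE MyDatabase SET RECOVERY FULL;
-- a differential is enough if a full backup already exists
BACKUP DATABASE MyDatabase
    TO DISK = N'X:\Backups\MyDatabase_Diff.bak'
    WITH DIFFERENTIAL, INIT;
-- from here on, the log stops truncating on checkpoint and log backups become meaningful
BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase_Log.trn' WITH INIT;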

I saw at least one online resource state this incorrectly - they claimed that a Full recovery database acts like Simple until the first full database backup. This is incorrect! What probably happened was that person had their Model database set to Simple, created a new database without explicitly setting it to Full recovery model, and then switched it from Simple to Full - which triggered the behavior. If you create a new database and it starts existence in Full recovery model, it will not truncate its log until you take a full backup, and then the first transaction log backup. But if you create a database in Simple, and switch to Full, the log will truncate on checkpoint until the first full backup is taken.

Now, let's talk about Full recovery. In Full recovery model (and in Bulk-Logged), you can take full backups, and differential backups, just like in Simple recovery model, and you can take COPY_ONLY backups which don't change the differential base, and don't affect log backups. You can take a COPY_ONLY full backup, and a COPY_ONLY transaction backup, but not a COPY_ONLY differential - there would be no point, a differential doesn't reset anything! You also can, and "must", take log file backups to allow your log file to truncate - that is, to re-use space inside the LDF file for new transactions. Now, I said you "must" but there are other ways to truncate a log file - and you should never use them. "Never"? Well, not "never", quite. It's just - it's one of the most ridiculous things I've seen in my time as a DBA, and as a troubleshooter. People hear wonderful things about how you have all these great recovery options with full recovery model - but they don't plan for running the log backups, or having enough space to maintain a reasonable log chain. Sometimes it's not their fault - they need to use log shipping, mirroring, Availability Groups, or some other technology that mandates Full recovery model. But in other cases, there's no excuse. If you are in Full recovery model, it's a statement that what's happening in your database is important, over time, and you need to save it over time, so you save that information in log backups. Or you use Simple recovery model, which is far easier to use, and in many cases, more efficient, than Full recovery model.

I suppose I might sound a bit over-excited on this issue, and that's not entirely unfair. It's just - the longer I've worked in IT, the more appreciation I've gained for good design; good design means making the right choices, and if you're going to break that precious log backup chain other than in the most unexpected of emergencies, you should probably be using Simple recovery model as the correct tool for your task. And I've seen far too many people use Full when Simple would have been a better choice for them.

If you stop and think about it, a SQL Server database's transaction log is an amazing thing. A huge number of things could be happening to a database: one or more files could be growing, or shrinking, or new files could be being added; it could be completely quiesced, or running wild with activity, including massive changes to both the contents and structure of the database; and all of the data in the buffer pool is completely volatile - it may have been changed, or not changed, or be in the process of being changed. Through all of that, the transaction log tracks everything you need, so that if something were to go horribly wrong at any moment, the log file has the information needed to put things right.

When you're in full recovery model, you save that recording of what has happened in the transaction log, until you back it up. (You may save the data until other things happen too - but for now, let's just think about backups.) Unless something breaks the log chain, you can restore from a good full backup, and a continuous string of transaction log backups, to any instant of time. Someone deleted a table at 5pm Friday? You could restore the database to 4:59:59, or (if you only realized this at 6:30!) restore a copy of the database to the same time, to grab just the data in the table.
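
A sketch of that 4:59:59 rescue - the logical and physical names are invented, and I'm only showing one log backup where real life would probably need several:

-- restore a copy of the database alongside the original, stopping just before the "oops"
RESTORE DATABASE MyDatabase_Copy
    FROM DISK = N'X:\Backups\MyDatabase_Full.bak'
    WITH NORECOVERY,
        MOVE N'MyDatabase'     TO N'X:\Data\MyDatabase_Copy.mdf',
        MOVE N'MyDatabase_log' TO N'X:\Data\MyDatabase_Copy.ldf';
RESTORE LOG MyDatabase_Copy
    FROM DISK = N'X:\Backups\MyDatabase_Log_1700.trn'
    WITH STOPAT = '2015-02-20 16:59:59', RECOVERY;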

I mentioned "restoring a good backup" - let me pause for a moment. When is a backup good? Well, when you've restored it, and run a DBCC CHECKDB on the database, you can now say "that backup was good when we restored it." I'm not just playing word games here - always remember that any magnetic media can fail, so you should test your backups once in a while, and the only test that's truly meaningful is a restore that gives you a healthy database. Don't be too paranoid - magnetic media is pretty darn reliable these days - but never forget that you never have complete certainty. If your data is valuable enough, you might want alternate recovery methods - log shipping, replication, mirroring, or availability groups can allow you to have (potentially) offsite, live copies of your data, because backups can fail to restore!
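
A test along these lines is the only one I trust - RESTORE VERIFYONLY is a nice sanity check, but it doesn't prove the data inside is healthy. Names and paths below are made up:

-- quick sanity check on the backup media only
RESTORE VERIFYONLY FROM DISK = N'X:\Backups\MyDatabase_Full.bak';
-- the real test: restore to a scratch name and make sure CHECKDB comes back clean
RESTORE DATABASE MyDatabase_RestoreTest
    FROM DISK = N'X:\Backups\MyDatabase_Full.bak'
    WITH RECOVERY,
        MOVE N'MyDatabase'     TO N'X:\RestoreTest\MyDatabase_RestoreTest.mdf',
        MOVE N'MyDatabase_log' TO N'X:\RestoreTest\MyDatabase_RestoreTest.ldf';
DBCC CHECKDB (N'MyDatabase_RestoreTest') WITH NO_INFOMSGS;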

A more common issue than "a good backup" is a break of the log chain. What breaks the transaction log chain? Obviously, a restore from backup that restores to a point in time prior to the last log backup breaks the log chain as of that point in time. Switching to SIMPLE recovery model breaks the transaction log chain. God-help-me, backing up the transaction log with TRUNCATE_ONLY will break the log chain. Never back up with TRUNCATE_ONLY - it was removed in SQL Server 2008, thank heaven! Changing to Simple is a better method for truncating a log file because it introduces a clear, obvious break in the chain. (Remember, as mentioned earlier, once you switch back to Full, you need a backup to re-seed your log backup chain - until you take that backup, you'll continue to truncate your log on checkpoint!)

Restoring the database to a snapshot breaks the chain - and it's also the only way I know of to stop a massive recovery/rollback operation with a guarantee of a consistent state for the database. (Of course, it's rare that anyone is fortunate enough to have taken a snapshot just prior to discovering a need to stop a massive recovery operation.)

Two things that merit special mention: a damaged transaction log backup breaks the log chain, obviously. And an out-of-band transaction log backup breaks the chain, unless it is present and accounted for when someone is trying to restore. So if someone is setting up mirroring, say, and takes a log backup to prime the mirror, and that log backup isn't put in the normal location (hopefully with the same naming schema), then you can run into a broken log chain. This is needless - SQL Server allows you to take a COPY_ONLY log backup that can be used to prime a mirror or for any other use for a transaction log backup (besides the obvious - truncating the transaction log!).
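
The COPY_ONLY log backup is as simple as this (invented path, of course) - it reads the log without truncating it, and without taking a slot in your regular log backup sequence:

-- a log backup for priming a mirror/secondary that leaves the normal log chain untouched
BACKUP LOG MyDatabase
    TO DISK = N'X:\Backups\MyDatabase_MirrorPrime.trn'
    WITH COPY_ONLY;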

Even without a convenience like a COPY_ONLY backup, it's always best to use your standard log backup routine, or to use a method that duplicates the results of those methods - same log backup location, same naming schema - where possible. In an emergency, you don't want people to find a gap in the log backup chain - and you don't want them to have difficulties scripting out the restore, if one (or worse, a few!) files have a different naming schema. Availability Groups add a new twist - you can take a log backup from any availability group replica (yes, even one that's in asynchronous commit mode, so long as it's in a 'synchronizing' state). This is one of the primary reasons I call out "same location". You wouldn't want to have 3 availability group replicas, and your log backups scattered across three different locations!

For a good riddle answer, a database that's gone into suspect mode, and has had its log rebuilt, has a broken log chain (obviously), but DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS does not break the log chain... it's fully logged! Of course, both of these situations quite naturally lead to the question "what ridiculous circumstances must we envision where it's sensible to run REPAIR_ALLOW_DATA_LOSS when we have a complete log chain we can worry about breaking? RESTORE FROM BACKUP!" If your database is corrupt, restore from backup. REPAIR_ALLOW_DATA_LOSS is always your last-gasp hope and prayer.

Okay: Good backups, and log chains. What's next? Restores, I suppose.

You can restore a full backup - you can also restore a differential backup. You can restore a COPY_ONLY backup, and it retains a differential base, so you can restore a later differential on top of a COPY_ONLY. You can restore a string of log backups that are part of a log chain, so long as your earliest log backup links to the full (or COPY_ONLY), or differential, that you restored, and you don't have a broken chain. The change of differential base is itself logged, so if you lost this week's full backup, but had a log chain stretching back to last week's, you could restore past the time the full backup completed, and then restore a differential on top of it. You can restore to a point in time. (This requires a STOPAT, STOPATMARK, or STOPBEFOREMARK - and you can, and probably should, use these options on the restores of the full and differential, too. You can't restore a full or differential to a point in time, but the STOPAT options will prevent a restore that takes you past your point in time. You can also put these options on all of your log restores, no matter how far back they go - this can be a good practice for the exact same reason.)

You can restore individual files, or entire filegroups, after taking a tail-of-the-log backup, if you have the required log chain: https://msdn.microsoft.com/en-us/library/aa337540.aspx. You can even restore just the damaged pages in a corrupted database, with a good log chain and a tail-of-the-log backup. Note: when I first heard of this option, I had assumed - incorrectly! - that you needed a file-level or filegroup-level backup. You don't! You can restore a file, or filegroup, from a full backup (or full+differential), if you have the correct log chain to make it consistent.
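
Page restore follows the pattern in that MSDN article; here's a compressed sketch with made-up page IDs and file names - real life would usually have more log backups in the middle:

-- patch just the damaged pages from the full backup...
RESTORE DATABASE MyDatabase
    PAGE = '1:57, 1:202'
    FROM DISK = N'X:\Backups\MyDatabase_Full.bak'
    WITH NORECOVERY;
-- ...bring the log chain forward over those pages...
RESTORE LOG MyDatabase FROM DISK = N'X:\Backups\MyDatabase_Log1.trn' WITH NORECOVERY;
-- ...then take a fresh (tail) log backup and restore it to finish
BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase_Tail.trn';
RESTORE LOG MyDatabase FROM DISK = N'X:\Backups\MyDatabase_Tail.trn' WITH RECOVERY;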

The key to all of this magic is the transaction log chain - it tracks all of the changes to the database in such a way that it can make all the right changes, and ensure that the database ends up in a consistent state. So you see, it can be extremely useful to make sure you have a good, unbroken log chain. The transaction log has some other bits of magic to it - I think I'll save those for another post.

Finally: Bulk-Logged recovery model. Bulk-logged is almost like Full recovery model, except if and when you have a "bulk" - technically, a minimally logged - operation. During these bulk operations, if you're in Bulk-Logged recovery, SQL will not log all of the data changes to the transaction log - it will use a special form of minimal logging that can greatly increase your throughput in the minimally logged transaction, and prevent your log file from growing (or growing as much as Full recovery would require). While you save space in your log file, you will not save space in your log file backups! What the log file tracks are the extents changed by bulk operations, and it dumps all of those into the log file backup when it runs. Not only does this make your log file backup unexpectedly large compared to the size of the LDF file, it also explains why you can't restore to a point-in-time from a log file backup of this nature. The log backup can't make the database consistent at any point in time except the end of the log backup.

This leads us to a tricky bit regarding Bulk-Logged recovery model. Let's say you take a log backup at 8pm, and another at 8:15pm. If all of your minimally logged operations ended by 8pm, and none started until after 8:15pm (technically: until after the 8:15 log backup completed...), your 8:15 log backup should be precisely the same as a Full recovery model log backup. You should be able to restore it to a point in time. I have tested this in SQL Server 2008R2. It worked. And let me tell you something: if you count on this behavior, someone will have run a short SELECT INTO, or an INSERT...SELECT, or an index rebuild, that was minimally logged. You can count on that - if you don't abide by Murphy's Law, you really need to work in something other than IT! So, as a best practice: set Bulk-Logged when you intend to use it, and then switch back to Full. If you choose to stay in Bulk-Logged, always assume you can only restore to the end of any transaction log backup - but in an emergency, you can still try a STOPAT restore.
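
The "open the window, do the work, close the window" pattern looks something like this - names are invented, and the backup at the end is the part people forget:

-- take a log backup first, so the all-or-nothing window is as small as possible
BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase_BeforeBulk.trn';
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

-- ...the big SELECT INTO / index rebuild / bulk insert happens here...

ALTER DATABASE MyDatabase SET RECOVERY FULL;
-- cap off the bulk-logged window so point-in-time restore is available again from here forward
BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase_AfterBulk.trn';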

Obviously, if you get to know me, you'll know I stress certainty and clarity a lot of the time - so, while you can try a STOPAT restore on a Bulk-Logged database, you don't know what will happen. This is why I strongly agree with the best practice of using Bulk-Logged only when you expect to need it. And then, once you're done, switch back to Full, and run a backup - usually a differential or log backup, but if you've just loaded a huge dataset, maybe you want a full backup instead! - so that you're able to restore to this point in time - the point in time at which you once again have confidence that you can use any and all magic of your log file backups again.

Sunday, February 15, 2015

Who am I? And why would I want to be mentored by Paul Randal?

Paul Randal is one of the gods of the SQL Server business. I'm not saying that to butter him up - I'm saying it to acknowledge a fact. If the word "guru" is better than "god" - hey, whatever word you prefer :-). He posted an offer: http://www.sqlskills.com/blogs/paul/want-mentored/. Explain, in a blog post, why he should mentor you.

I didn't have a blog. What to do?

Well - I'd always wanted to be one of the well-known experts in SQL Server - one of those "someday... maybe... if all goes well!" dreams. And I'd kept putting off starting the blog because I have my chronic fatigue issues - I'm always too-damn-tired!

So, this kicked me into starting my SQL blog. So if you think my blog is horrible, you get to blame... me. No one else. But you can still glare at Mr. Randal for inspiring my appearance!

What reasons would I give to be mentored?

I've learned the basics of the way SQL works, from the inside. And I want to share them. And I want to talk about them, with others, in a way that I hope will help people draw connections and see patterns and understand the tools and workings that much better. If you think my other posts are interesting and amusing, great - if you think they're awful, oh well.

Okay, but there's a lot a person can't really learn just by puttering around, and I passed that point a long time ago; I need someone who can help me formulate and ask, then answer, the right questions.

Such as? Well - query plans. "What? There's tons of information about query plans out there!" Sure - but there's not very much that I've seen that explains how to parse them, if you need to dig in to just that one particular detail about what's being found, and how it's being pulled, or how it's being manipulated afterward.

Or XEvent captures: Profiler is going away - XEvents are taking its place. How do they work? Well, events fire, and you can gather information when they do - but how do you know what information you can gather for which event firing? And what are some neat tricks you can use, so you soon find yourself glad, yes, glad that clunky old Profiler's going bye-bye? There's a lot of different bits about XEvents out there, but if there's a good intro for newbies I haven't found it - and I wouldn't mind changing that, and writing the durn thing if I have to!

I've seen a deadlocked scheduler warning on what seemed to be parallelism - but I only counted 300 parallel threads out of 545 threads in the dump - did I miss something else making SQL feel thread-starved? Or was that enough? If 300 was enough, why? 450 threads out of 545 is probably enough to put up a thread-shortage, but 300 seems not-quite-enough (no jokes about Sparta, please!). And again: help me understand, it won't just be me learning - I'm going to try to share it with the world.

Saw a fascinating case where read latch waits were causing massive contention - not locking, shared latches! We puzzled it out and tried Read Committed Snapshot - I believe that it worked, and I believe it should have worked - the only latches that might occur should be when building the initial version store - and after that, updates (new values in the version store) should occur as a transaction commits - as the lock gets released, without any risk of latch contention, since no other query "sees" the new value existing yet. But I couldn't be sure, because I didn't know the mechanism. Do you? I'd love to learn these kinds of things!

Hekaton manages to avoid all locking and latching - I see this, I know this, but I haven't the foggiest how they pull that beauty off. I'd love to learn how that happens, if it's public facing (just awfully crunchy) knowledge.

And I'd like to try to link these things to other things, and try to build up a flow, a sense, a song, a spirit... of SQL. And share that with the world.

If you were bored, and checked out my other two, yes, *two!* substantive blog posts, you might decide you'd like to help me share what I learn with the world. Or you might decide to mentor me *in spite* of my crimes against the SQL blogging community :-). Regardless - that's why I'd like to learn more, from one of the people who can definitely point me toward learning more. So I can expand my own knowledge, and share it, in my own, uh, interesting, way.