Tuesday, October 13, 2015

XML and SQL Server - with some actual Extended Events content, I promise!!!

As promised, we're going to start in on Extended Events now - kind of. We're going to do a bit of looking at Extended Events around the edges, and I'm going to give you some examples of working with XML in SQL Server - because trust me, manipulating XML in SQL is a bit of an art. Now: Extended Events are intended to replace most of what Profiler does - not everything that Profiler does, but a lot of it. And when you created a data collector in Profiler it was called a Trace. In Extended Events, it's called a "session". I like to think that this terminology made it far more excusable for me to fail to put Extended Events together for a bit!

Now, a very nice thing about extended events is that you don't just have to collect them in a trace, either in memory or in a file. You can have multiple nifty kinds of targets. You can send them to a file; the approved extension is XEL. In SQL Server 2008, there was also an XEM file, a format file - but those aren't used in SQL Server 2012 and later. You can also send extended events to a ring buffer - that's a data structure that wraps around and overwrites older entries with newer entries when it fills up. And you can use some other, special purpose targets that can tally up data for you in a variety of ways, or even collect unpaired events (for example, Batch Starting with no Batch Completed). Those other targets are valuable, but we did all of this XML so we could start by looking at files and ring buffers. To do that, let's take a look at two views: sys.dm_xe_sessions and sys.dm_xe_session_targets:
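If you want to poke at these yourself before we get to the picture, a query along these lines (the column picks are mine, not the exact query from the screenshot) does the trick:

select xs.name, xs.address, xst.target_name, xst.target_data
from sys.dm_xe_sessions xs
inner join sys.dm_xe_session_targets xst
    on xs.address = xst.event_session_address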



I know... I cut off the output from dm_xe_sessions - there's an entire Books Online page for it you can review if you want! But I wanted to show you a few obvious bits. First, you can see that the join column is the address column of sys.dm_xe_sessions and the event_session_address column of dm_xe_session_targets. For files, target_data contains file information - in the event_file row for event_session_address 0x00000003ED4D10C1, I copied the XML text from the target_data column into the query window. You can see that it shows you information about the file.

But for the ring_buffer target, the target_data column is the entire XML representation of the ring buffer. Yes, I said "XML representation". Time for a bit of pedantry: the ring buffer data isn't actually stored in XML, which is a pretty inefficient method of storing data. But when we query it, we'll see it in XML. And, if we query it, and cast the result as XML, we can throw it into an XML variable, or column, and draw information from it as needed. So, here's a query you could use to get that XML data:
declare @QueryProcessingXML xml

select @QueryProcessingXML = cast(xst.target_data as xml)
from sys.dm_xe_sessions xs
inner join sys.dm_xe_session_targets xst
    on xs.address = xst.event_session_address
where xs.name = 'system_health'
  and xst.target_name = 'ring_buffer'


We're not going to work with that - a 4meg ring buffer is a bit too much to display on a blog entry. So, let's take a look at XEL files instead! To draw information from the file, we can use fn_xe_file_target_read_file. This reads an Extended Event XEL file. I'm working from SQL Server 2012, so I'm not worrying about XEM files, and I'm able to show off some slightly more interesting data from the system_health extended event session.

Just as sys.dm_xe_session_targets has a target_data column, fn_xe_file_target_read_file has an "event_data" column, and you can cast that as XML too. You can pull in one row per event captured, with an XML representation of all data captured in the event_data column. I did this while creating this blog post, and pulled a column from one of the Query_Processing diagnostic results, so I could demonstrate how we're going to slice and dice our XML data from extended events.
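Here's a sketch of that - the file path below is a placeholder, so point it at wherever your system_health XEL files actually live:

select cast(event_data as xml) as event_xml
from sys.fn_xe_file_target_read_file('C:\SQLData\Log\system_health*.xel', null, null, null)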

I then ran a DECLARE @QueryProcessingXML XML; SELECT @QueryProcessingXML= '
...and then my blob of XML data, and the closing single quote. What I put between the quotes was this:


<event name="sp_server_diagnostics_component_result" package="sqlserver" timestamp="2015-10-04T21:11:43.193Z">
  <data name="component">
    <value>2</value>
    <text>QUERY_PROCESSING</text>
  </data>
  <data name="state">
    <value>1</value>
    <text>CLEAN</text>
  </data>
  <data name="data">
    <value>
      <queryProcessing maxWorkers="544" workersCreated="57" workersIdle="17" tasksCompletedWithinInterval="1537" pendingTasks="0" oldestPendingTaskWaitingTime="0" hasUnresolvableDeadlockOccurred="0" hasDeadlockedSchedulersOccurred="0" trackingNonYieldingScheduler="0x0">
        <topWaits>
          <nonPreemptive>
            <byCount>
              <wait waitType="HADR_FILESTREAM_IOMGR_IOCOMPLETION" waits="287139" averageWaitTime="825" maxWaitTime="89175239" />
              <wait waitType="IO_COMPLETION" waits="17158" averageWaitTime="3" maxWaitTime="1605" />
              <wait waitType="WRITE_COMPLETION" waits="16468" averageWaitTime="0" maxWaitTime="12" />
              <wait waitType="THREADPOOL" waits="5068" averageWaitTime="0" maxWaitTime="41" />
              <wait waitType="CLR_AUTO_EVENT" waits="3192" averageWaitTime="148586" maxWaitTime="89227233" />
              <wait waitType="QDS_PERSIST_TASK_MAIN_LOOP_SLEEP" waits="2454" averageWaitTime="96626" maxWaitTime="89232595" />
              <wait waitType="PAGEIOLATCH_SH" waits="946" averageWaitTime="99" maxWaitTime="5705" />
              <wait waitType="PAGEIOLATCH_EX" waits="464" averageWaitTime="6" maxWaitTime="21" />
              <wait waitType="CMEMTHREAD" waits="272" averageWaitTime="0" maxWaitTime="2" />
              <wait waitType="CLR_MANUAL_EVENT" waits="222" averageWaitTime="9" maxWaitTime="115" />
            </byCount>
            <byDuration>
              <wait waitType="CLR_AUTO_EVENT" waits="3192" averageWaitTime="148586" maxWaitTime="89227233" />
              <wait waitType="HADR_FILESTREAM_IOMGR_IOCOMPLETION" waits="287139" averageWaitTime="825" maxWaitTime="89175239" />
              <wait waitType="QDS_PERSIST_TASK_MAIN_LOOP_SLEEP" waits="2454" averageWaitTime="96626" maxWaitTime="89232595" />
              <wait waitType="LCK_M_S" waits="26" averageWaitTime="4808" maxWaitTime="11197" />
              <wait waitType="PAGEIOLATCH_SH" waits="946" averageWaitTime="99" maxWaitTime="5705" />
              <wait waitType="IO_COMPLETION" waits="17158" averageWaitTime="3" maxWaitTime="1605" />
              <wait waitType="FFT_RECOVERY" waits="30" averageWaitTime="929" maxWaitTime="2329" />
              <wait waitType="FT_IFTSHC_MUTEX" waits="9" averageWaitTime="2273" maxWaitTime="17228" />
              <wait waitType="SLEEP_DBSTARTUP" waits="153" averageWaitTime="100" maxWaitTime="112" />
              <wait waitType="PWAIT_ALL_COMPONENTS_INITIALIZED" waits="3" averageWaitTime="2316" maxWaitTime="2326" />
            </byDuration>
          </nonPreemptive>
          <preemptive>LHW:Preemptive waits deleted for space - but they normally list both count/duration as well</preemptive>
        </topWaits>
        <cpuIntensiveRequests />
        <pendingTasks />
        <blockingTasks />
      </queryProcessing>
    </value>
  </data>
</event>' -- that's my closing single quote from above!


Pictures, and a scrollbox - I've never done either of these before, and I just know this post is going to look awful. Give me a mysterious performance issue any day of the week over trying to make a nice looking blog entry.

If I just wanted to know a single fact - e.g.: do I have a lot of threadpool waits by duration? Any signs of IO issues? - this would be fine. But the odds are, I want to do some more in-depth analysis. Do I see my top waits by count or duration changing? How many workers do I have, with how many idle, over time? How often do we have blocking tasks included in this report?

For this, we will use a few methods that SQL Server has built in.

Time for a quick vocabulary lesson: "Methods" is a term from object oriented programming. It's really just a name for a function, but a function that does something. The goal of OOP is to have your objects do things (or have things done to them). Well, how do you do things to an object in a consistent way? You have a set of steps - a method, if you will - of doing the same thing to different objects. And voila! we have a name for this particular type of function. So, a method's just a function, but usually one attached to an object. In this case, we might have a method like @QueryProcessingXML.SomeMethod() - and that's the key difference between a method and a function. A method is written x.f(), whereas we normally think of a function as being f(x) - because in OOP, x, the object, is what matters.

The two methods I want to play with right now are the query method, and the nodes method. We'll run those off of the @QueryProcessingXML that I created earlier. Soon, we'll add in the value method ... but first, we're going to make sure we can find the data whose values we want. And that is why I had to write up a bit about XPath first.

Query is an amazing method - you'll see it does a lot of work for us. For right now, we're going to use it just to review our XPath information. For example:
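(The screenshot that belongs here didn't survive; what it showed was very much like this reconstruction, based on the notes that follow.)

select xcol.query('.')
from @QueryProcessingXML.nodes('/event/data[@name="component"]') as tbl(xcol)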

Things to note here:
  1. In each case, we have a something dot method-name: xcol.query(), and @QueryProcessingXML.nodes(). We'll explain the names in just a moment :-)
  2. The "nodes" method returns raw data with no table structure, so SQL Server demands that it must be aliased. We pretend (because SQL Server makes us) that it creates a table, with a named XML column. Our table name in this example is tbl, and our XML column is xcol. See? That's why it's xcol.query() above.
  3. This would point to all data nodes with a name attribute of "component" that are children of event, the root node - we know we only have one here, but you should be aware of this - for example, if /event had a thousand data children with a name of "component" we'd get a thousand XML blocks here.
  4. The output of this is a set of XML blocks, generally smaller ones. While I'm not sure, I believe that complicated queries are faster against small blocks of XML - and this is why so many people use this to slice-n-dice their XML into smaller blocks.
  5. Another advantage of slicing off blocks of XML is that there's more freedom to use those XML blocks in subsequent queries, with no fear that they might contain multiple data[@name="component"] elements further down the line.
  6. You've just been handed a toy with which to futz around with the XPath skills that you picked up in the last post - go use it!
I'm only half-kidding on the last point. I'm going to show you a few more examples of the kinds of things we can do with the query and nodes tools, but if you're like me, you need to practice, practice, practice with these things, because otherwise every time you think about digging into XML, you'll freeze up because you don't want to have to re-review all that XPath stuff.

So:

We could pull in the waits by duration using this:
select xcol.query('.') from @QueryProcessingXML.nodes('/event/data[@name="data"]/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait') as tbl(xcol)
That's an absolute path description.

Or if we thought it was a terrible drag to type in all those values, and we only wanted the first 8 values, we could use this:
select xcol.query('.') from @QueryProcessingXML.nodes('//byDuration/wait[position()<=8]') as tbl(xcol)
IMPORTANT NOTE: As you'll see if you read through the XML above, there are both nonPreemptive and preemptive waits in these results - so on a normal QUERY_PROCESSING block, you'll get 16 results. That's the downside of zipping through XML with a too-broad XPath. I've grown to love the // XPath operator, but it can quickly cause more problems than it solves, if you're not careful.

Or for a more reasonable predicate, we could choose only the waits in the byCount list that averaged higher than 0:
select xcol.query('.') from @QueryProcessingXML.nodes('//byCount/wait[@averageWaitTime>0]') as tbl(xcol)

It might be really annoying that Extended Events dump data into XML, rather than nice rowsets, but doesn't that last example show off some of the flexibility of using XML?

As I noted earlier, with these last two examples, you're taking a bit of a risk. You might think that "byCount/wait" isn't going to be re-used for any other block of data, but you can't be sure. You'd probably be better off making sure you're in a QUERY_PROCESSING component block. How could you do that?

As you'll find out a bit further down, you don't generally need to. But let's create a more complicated XPath expression, just for fun. We want precision, here, and we know that we want an event. But that event has to have a special property, so that will be event[]. What property is that? Well - it has a data child, with a name of "component". So, that would be event[data[@name="component"]]. But that's not enough - the "component" must have text of "QUERY_PROCESSING".

Well, then: that's event[data[@name="component"]/text="QUERY_PROCESSING"]

So far so good. We then want the data child of this event, that has a name attribute equal to "data". And under that, we know there's a byCount set of waits. Ready?
select xcol.query('.') from @QueryProcessingXML.nodes('event[data[@name="component"]/text="QUERY_PROCESSING"]/data[@name="data"]//byCount/wait') as tbl(xcol)

That's: Give me the event, having a data node with a name attribute="component". But that data node must have a child element, text, equal to "QUERY_PROCESSING". Once we have that event, we look at its child node data (name attribute="data" as well), and then, under that, we look for any and all "byCount" nodes with "wait" children. And if we decide that we don't want the preemptive waits, we can just change this to:

select xcol.query('.') from @QueryProcessingXML.nodes('event[data[@name="component"]/text="QUERY_PROCESSING"]/data[@name="data"]//nonPreemptive/byCount/wait') as tbl(xcol)

This is still pretty darn wordy, but it might save us a bit of trouble if we're digging through a larger XML document, where there might be a lot of similarly named node paths. Conceivably.

"Conceivably?" you ask? I'm afraid so. Believe it or not, while that was a good exercise in developing an XPath descriptor from the data we have, and increasing our understanding of predicates... we didn't need to do that. In this particular case,
@QueryProcessingXML.nodes('event/data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait')

will give you the same set of nodes. We don't need to specify the attributes in this case, because the path alone uniquely takes us to the values we need.

So, now that I led you down one path to a red herring, let's give you something you actually want. How do we pull values from this? That's done with the value() method. The value method has two arguments - the XPath(/XQuery) that isolates a single value, and then a data type. So:

select @QueryProcessingXML.value('(event/data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait/@waitType)[1]','nvarchar(50)')

... will pull out the first wait type. We need to use the brackets to pull just the first. As you can see we would need to do a lot of writing if we wanted to pull out all 20 waits here with all 4 attributes for each wait - and I've cut out the preemptive waits, so we'd normally have 40! That leads us to one of my primary annoyances with this particular capture.

As you can imagine, it's really easy to pull out all the wait attributes:

select
    n.value('(wait/@waitType)[1]','varchar(50)') [Wait Type],
    n.value('(wait/@waits)[1]','bigint') [Wait Count],
    n.value('(wait/@averageWaitTime)[1]','bigint') [Avg Wait],
    n.value('(wait/@maxWaitTime)[1]','bigint') [Max Wait]
from
(
    select xcol.query('.') as n
    from @QueryProcessingXML.nodes('//wait') as tbl(xcol)
) as tbl_outer

The problem we run into here is that, while we've pulled out the individual waits, we lose the upper attributes - like whether the wait is top ten byCount or byDuration, preemptive or nonPreemptive, or, more importantly, the timestamp of the capture. The timestamp is an attribute of the top-level event in our example. In case you're curious:



I think that's the great, great, great, great, great great grandparent. So, sometimes, XML can be a pain. Is this a serious problem? Not really. Since I analyze SQL Server performance as part of my job, I yearn for the ability to review the top ten waits for each time frame, mostly byDuration (but byCount might matter too). Well, we can get all of that, but when we use this "slice and dice" method, we get one row per node - if we run through on the event level, it's hard to grab each set of ten waits out in one fell swoop. So how do we fix this?

First, obviously, you could just select the XML bits into XML columns in a table. Do you want one column with all waits, preemptive and non? Or two, or four columns, for preemptive and non, and maybe byCount and byDuration? For raw data capture, storage, and display, this works nicely - and if you've been following along and playing with XML, it should be starting to lose any intimidation it may have had. This idea has a certain appeal.

However, in accordance with an old Internet Maxim[1], the obvious, best solution is to dump the timestamp and the XML into a temporary table, and then cursor your way through it, running 4x10 selects to pull out the separate rows, then pick your dates, and see how your waits changed... which we can do almost as easily if we've saved the waits as XML. And we'd lose a lot of the flexibility, and we'd have ten rows with the same timestamp, and.... yeah, let's go with the XML.
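For what it's worth, here's a sketch of that save-the-XML idea, using the nodes() slice-and-dice against the file target - the path and the temp table name are placeholders, and the point is that each row keeps the event's timestamp right next to its wait blocks:

select
    ev.xcol.value('(@timestamp)[1]', 'datetime2(3)') as capture_time,
    ev.xcol.query('data[@name="data"]/value/queryProcessing/topWaits/nonPreemptive/byCount') as byCountWaits,
    ev.xcol.query('data[@name="data"]/value/queryProcessing/topWaits/nonPreemptive/byDuration') as byDurationWaits
into #query_processing_waits
from
(
    select cast(event_data as xml) as event_xml
    from sys.fn_xe_file_target_read_file('C:\SQLData\Log\system_health*.xel', null, null, null)
) as f
cross apply f.event_xml.nodes('/event[data[@name="component"]/text="QUERY_PROCESSING"]') as ev(xcol)

From there, pulling a particular time frame is just a WHERE on capture_time, plus more of the same XML methods against the saved wait blocks.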

So: let's review. Over the past three blog posts, we've reviewed the very basics of XML, and then a bit about how XPath works - hopefully enough so you can see how these examples worked. But honestly? My real hope is that you can start to read through something like Jonathan Kehayias's excellent XEvent A Day series. Not because that's the be-all and end-all of Extended Events but because when I started to read through it, I found that my XML difficulties kept me from following along. I could kind-of understand what he was doing, but not being able to read the XPath information was driving me crazy.

Well, in life, it's good to be the change you want to see - so, if you want a world where there's a quick-start set of blog posts on XML, XPath, and SQL XML methods, and how they relate to Extended Events - sometimes the best option is to try to create one. So here's my effort. Don't worry (or do worry?) - I don't plan to stop blogging about Extended Events yet. But I feel that this set of posts gets over the big XML hump, and I hope they help someone out there. Or, I hope they're so terrible they convince someone else to do them better - it's good to be an inspiration, after all....

[1] That maxim states: "The best way to learn something is not to ask a question; it's to post something wrong."[2]
[2]Worked pretty quickly this time, didn't it?

Sunday, September 27, 2015

Brief intro to XPath, pursuant to Extended Events discussion

Okay, as I mentioned earlier, if you want to work with Extended Events in SQL Server, you'll be best served if you know enough XML manipulation to work with the output. And that means you'll want to be able to manipulate XML enough to find, and present, the data that matters to you. For example, if you pull the data from a ring buffer, you'll end up with a big XML document containing everything in the ring buffer; with a file, you can usually pull in smaller XML documents, one per event, but there may be a fair amount of information in each - and it will vary, depending on what you decided to capture.

So you'll need a way to identify, and obtain data from a mess of XML. XPath is the first part of being able to do this. Earlier, I mentioned XQuery - but it turns out that all examples I've seen use XPath alone - XPath has query-like properties, because it describes parts of the XML document. An XPath construction, by itself, points to data, and you need only use a few tools to extract what you need from what you've pointed to.

Terminology first. XML has "elements". These are like <AnElement AnAttribute="AttributesAreAlwaysInQuotes">Some Text</AnElement>. Everything - open tag, all attributes, all text, closing tag. XPath also talks about "nodes" - these can be an element, but they can also be an attribute, or the text (and some other things, but we don't need to worry about those). These can be part of the XPath expression. Although we won't use this much, it helps to realize that XPath uses familial terminology - a tag that encloses another is a "parent" tag, and parents, plus "grandparents", etc., are all "ancestors"; similarly, the tag enclosed by the parent is a "child". Fellow children of the same parent are siblings, and children, plus "grandchildren" and so forth, are descendants.

Okay: since XML has a root element, and all new tags must close within their parent tags, we can think of an XML document as kind of like a Unix directory. If you've never used Unix (or not knowingly), the top level directory of a Unix file system is /. You then have subdirectories (which might be disks mapped to those directory names). You could easily represent the Windows file system this way by defining / as the root, /c as the C: drive, /d as the D: drive, etc. There are a fair number of similarities. For example, XPath uses both absolute and relative paths. Here's some sample XML: <Vehicle> <Car Make="Chevy" Model = "Sprint"> <Car_Data>My first car...</Car_Data> </Car> </Vehicle>. An absolute path there would be /Vehicle/Car/Car_Data - or even /Vehicle/Car/Car_Data/text(). You can also use relative paths. If Vehicle is not the root of our XML document (maybe it's part of "Assets"), then we can think of a relative path: Vehicle/Car/Car_Data, or Car/Car_Data. For relative paths, especially, there's also another similarity to file systems: . (a single dot/period) refers to the current location/node, and .. (two dots) refers to the parent location/node. This is the basic "bread and butter" of a search through an XML document. But you quickly run into problems with this.

First: look at my XML - what if I expand this document to a list of all cars I have ever owned? Then, /Vehicle/Car won't be unique, right?

This is actually one of the wonderful things about XPath. /Vehicle/Car can be non-existent, unique, or have hundreds of values. XPath is sort of like an auto-select; "this is the path to all Root-Level Vehicle, child Car, elements." This is really nice, and really slick, when you think about it - it's like a path, and a select, all in one! But a select without a where clause is like a banana split without the banana. It's nice, but there's something critical missing! Fortunately, XPath has all kinds of where clauses.

A simple one is an array-like quantifier. /Vehicle/Car[1] means the first Car that's a child of Vehicle. There are also some functions you can use in the square brackets: last() will get you the last item, so last()-1 will get you the penultimate, and last()-2 will get you the antepenultimate. These are really useful when you need to deterministically return a singular item (rather than a list). Remember, /Vehicle/Car is all Car values that are children of the root Vehicle - so /Vehicle/Car[1] means "the first (and in our case, only) Car". Another neat example for square brackets, one that's a good reminder that XPath is path, plus select, is how you can point to up to the first three cars that are siblings (same parent) with /Vehicle/Car[position()<=3]. Obviously, that last one won't allow you to return a single item/value! Still, if Microsoft had a common event for QUERY_PROCESSING health, and it included wait information, and you wanted the top 5 waits... oops, I'm getting ahead of myself.

Also, remember how we said that attributes, and text, are nodes as well? They are - but when you want an absolute path to an attribute, you need to preface the attribute name with @. So, the path to the Car Make would be /Vehicle/Car/@Make. Pause here for a moment - while this is a path, it's a path to an attribute value, whereas /Vehicle/Car is a path to an element. This will matter - the "return value" is a bit different, right? But we'll get to that later.

For the text data in an element, you need to point to the text() operator. So we'd want /Vehicle/Car/Car_Data/text() to point to "My first car...".

When you start with a / and the root element, you're using an absolute path. Per all XPath information I've found, a relative path, like "Car/Car_Data" should be a way to find any Car_Data child of a Car node. However well this is documented, it doesn't seem to help in the Microsoft XML methods I'm going to be demonstrating later. But it's a good way to introduce another operator: //, a double slash - this is an operator that tells XPath that you don't care where something occurs, so long as it does occur, and this gives us a way to use those relative paths. So, //data will pull any data node, and //data/value will point to any value node that's a child of a data node - or, //Car/Car_data will find the values we hoped for earlier.

For additional searching fun, /event//data will find any data node that's a descendant of an event node. Or, /DataCollection//event//data will point to all data nodes that are descendants of an event node that are part of the DataCollection XML document, however deeply either the event, or the data, is nested. You can combine these with the array-quantifiers above, but be careful. /DataCollection//event//data[1] means the first data node in every set of /DataCollection//event//data elements - if you have 10 events, each with data as a descendant, you'll get 10 values. If you want just the first value in the document, you want to use parentheses to clarify this. So, (/DataCollection//event//data)[1] means the first match in the document. Similarly, /DataCollection//event//data[position()<20] would point to the first 19 data siblings in each collection of data-siblings; (/DataCollection//event//data)[position()<20] grabs the first 19 in the entire document. So if you have three sets of 10 data nodes, the first gets all of the data nodes (each group of 10 siblings fits comfortably under the 19-sibling limit), whereas the second doesn't even quite collect the first two groups (all 10 siblings in the first group, and only 9 slots remain for 9 of the 10 siblings in the second group).
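(Jumping ahead a little to SQL Server's nodes() and query() methods: if you want to see that difference for yourself, here's a tiny made-up document - two events, three data nodes each - to experiment with.)

declare @DataCollection xml = '
<DataCollection>
  <event><data>1</data><data>2</data><data>3</data></event>
  <event><data>4</data><data>5</data><data>6</data></event>
</DataCollection>'

-- one row per event: the first data node within each event (values 1 and 4)
select x.query('.') from @DataCollection.nodes('/DataCollection//event//data[1]') as t(x)

-- one row total: the first data node in the whole document (value 1)
select x.query('.') from @DataCollection.nodes('(/DataCollection//event//data)[1]') as t(x)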

Okay, stop for a moment, and think about what we've discussed. You can find all nodes of a particular absolute path, and you can find all occurrences of certain relative paths. You can trim these to the first n items - you actually have all the knowledge you need to create an XPath to any single or top-n group of items in a simple XML document whose structure you know beforehand. If XPath stopped there, you'd already be able to do some relatively useful things.

But XPath does more than that. Earlier, I said that the square bracket operator was array-like. It isn't. You can use the powerful selection criteria allowed within the square brackets for something as simple as position in a list. But you can put a great many different selection criteria inside those brackets. For example, if we wanted to be sure we looked only at Chevys in our Vehicle list, we could use the absolute path /Vehicle/Car[@Make="Chevy"].

That's kind of cool, isn't it? But let's continue to pretend we have a lot of cars here, okay? This is way cool - I promise. You can select /Vehicle/Car[Car_Data="My first car..."]. You can say "give me Cars, but only if they have a Car_Data child with a specific text value."
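(Again jumping ahead to the SQL Server methods: both of those predicates are easy to try against a variable holding our little car document.)

declare @Vehicle xml = '<Vehicle><Car Make="Chevy" Model="Sprint"><Car_Data>My first car...</Car_Data></Car></Vehicle>'

select @Vehicle.query('/Vehicle/Car[@Make="Chevy"]')                 -- attribute predicate
select @Vehicle.query('/Vehicle/Car[Car_Data="My first car..."]')    -- child element value predicate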

But it gets even cooler. Let's redo our XML a bit:

<Vehicle>
  <Car Make="Chevy" Model = "Sprint">
    <Car_Data>My first car...<Contents>My_Keys</Contents>
    </Car_Data>
  </Car>
</Vehicle>

We could use /Vehicle/Car//Contents[.="My_Keys"] to find the car that has my keys in it. Interesting note here: I have Contents tagged as it is because while HTML is mostly whitespace agnostic, XML is funny. In text, whitespace is preserved (but all newlines are converted to line feeds). The text of Contents starts the character after the > and continues until the next tag definition, or the closing tag of Contents. So I closed up Contents to avoid having whitespace matching, or line feed aliasing, issues.

Also, just to make sure it's clear, although the predicate in the brackets is often at the last item of the list, it doesn't have to be. /Vehicle/Car[@Make="Chevy"]//Contents[.="My_Keys"] is perfectly acceptable - but if I use that, I better hope I didn't leave my keys in a Ford! As I'm sure you can guess, when you do this, all predicates must be satisfied - it's an AND combination, not an OR.

Okay: now, if I've done my job well, after reading this, you have an idea how to read some simple XPath statements. You can follow an absolute path, or a relative path; you can make decisions based upon attributes or child values, or even descendant values. If I threw a blob of XML at you, and showed you how to extract information using SQL Server's XML methods, you could piece together what I'm doing. And that's my next step.

Wednesday, September 23, 2015

Very brief XML introduction - pursuant to Extended Events discussions

I haven't been blogging nearly as much as I hoped - and I hadn't hoped for much more than a single post a month, which means blogging has been pretty damn skimpy!

But I decided that the world needed a bit more information about Extended Events in SQL Server, and, since I was studying them for my own professional development, it was time to try to share them with others.

Problem: if you want to do much of anything useful with Extended Events, you need to know a bit about XML. Talk about a pain!

XML is one of those things, you know? On the face of it, it's stone-cold simple. Enclose text in tags, sometimes even nested tags. You can also have attributes within the tags. And it's extensible, because you can add new attributes, or new tags, and you won't (necessarily) break something that doesn't know to look for the new tags - just like with old web browsers, if you don't know what a tag means, ignore it!

Ah, but XML in SQL Server is a brand new beast. Sometimes you want to check a few things - are they there, or not? - and sometimes you want to extract a few values, from text, or attributes. Sometimes you want to pull out some XML, sometimes you need to poke around in different places... you can do all of this, and because you can do all of this, you can process Extended Events results a bit easier than otherwise.

Step one: we need to discuss a bit about XML so you understand enough to know how it works.

There are many, many places to learn about XML out on the web. I'm not going to try to re-invent the wheel, so I'm going to spell out the most important parts here:

  1. XML is eXtensible Markup Language - the "markup" is done by tagging, putting opening and closing tags around a section of text.
  2. There must be a root tag to the document - all other tags will open and close within this.
  3. XML is case sensitive! Never forget this, it will save you a lot of grief if you remember. But you'll probably forget, as I do, far too often.
  4. Tagging is atomic - you can't cross your tags. You can have <Car><Make>Ford<Model>Fiesta</Model></Make></Car>- your tags are all nested - but you can't have <Car><Make>Ford<Model>Fiesta</Make></Model></Car>. If Model starts in Make, it must end there too.
  5. Tags must be closed - you can create a self-closing tag by adding a slash just before the closing bracket. We could have, for example, <Model Name="Fiesta" />
  6. Tags can have text data - the text enclosed in the tags - or attributes, like the "Name=" in the previous point.
To expand upon this:
<?xml version="1.0"?>
<Memorobilia>
  <B5>
    <Weapon>
      <PPG type="resin_sculpture">
        <Quantity>4</Quantity>
      </PPG>
    </Weapon>
    <Souvenir>
      <Bear type="Teddy">
        <Desc>JS embroidered shirt</Desc>
        <Quantity>2</Quantity>
      </Bear>
      <Londo_Mollari_Figure>
        <Quantity>1</Quantity>
        <Desc>Posable 'action' figure </Desc>
      </Londo_Mollari_Figure>
    </Souvenir>
  </B5>
</Memorobilia>
Notice that we start and end with "Memorobilia", and that everything we've collected is categorized as B5 (for Babylon 5, a darn good science fiction TV series). We could have eliminated either tag - but by enclosing the B5 tags in Memorobilia tags, we have room to add other forms of memorobilia. Sorry: Memorobilia. XML is case sensitive! And if we want to add this to other collectibles that aren't exactly Memorobilia, we can enclose this entire structure within a new set of tags. That's one of the neat things about XML.

This isn't the best, most well designed XML schema, but it's enough for some basic understanding. Notice, interestingly, that there's little actual text data in here! But that's perfectly reasonable - as with the car example above, we don't necessarily need text if we have the right categorizations.

You can also see that some of the data is included, not by tagging/categorization, but by attributes - we can mention that it's a "Teddy" bear by throwing in an attribute on the Bear (in case there's a posable bear, or a statue of a bear), while noting that the Londo figure, of all the Souvenir "children", has no attributes.

I mentioned "children"; that will be the subject of my next post, when I'll discuss XPath (and maybe XQuery).

Interesting issue with log file shrinks...

So, I was looking at this fine blog entry from Lisa Gardner, one of Microsoft's PFEs. And I decided to do a bit of futzing around with it.

If you're too lazy to follow the link (or if you can't see where the link goes and won't follow a link blindly - GOOD FOR YOU!), there's an undocumented command called DBCC LOGINFO in SQL Server. It will show you how many VLFs you have, and show you a nominal status for each of them - are they free, or in use?

I hope you all know that you don't want too many VLFs - less than a hundred is good, and more than a hundred only if you absolutely, positively have no other reasonable choice. Too many VLFs can slow down recovery (which includes database startup/availability when SQL starts!), transactional replication, mirroring/Availability Groups, and probably more that I can't think of. For all that it's undocumented, it's a good command.

Well, as Lisa points out, when you're using an availability group, DBCC LOGINFO has an interesting quirk.

Normally, when you run a log backup, DBCC LOGINFO will show all of the truncated VLFs as having a status of 0 - not in use. Lisa pointed out that if the server is in an availability group, while the log might be truncated (i.e., the VLFs are available for re-use) DBCC LOGINFO might still show them as having a status of 2 (in use).

Let's step back a bit to make sure we're clear on this. The Transaction Log File of a database is like a whiteboard. Each VLF is its own discrete section of whiteboard. When you "truncate" the log, all VLFs that are freed up by the process are now erased - clean, and clear, and ready to accept new data. These VLFs are still there; the log file is still the same size. But now the log file has more empty space (in the form of VLFs) to use. (This is why I use the whiteboard metaphor - erasing a section of whiteboard doesn't make the whiteboard smaller, it makes the blank space bigger.)

DBCC LOGINFO normally tells you which VLFs are at least partially in use, and which ones are blank.

But when the database is in an availability group, it may not be honest. You may have space (re-usable VLFs) but DBCC LOGINFO might show them as in use, status of 2.

Well, I ran into an oddball problem. On an availability group, I tried to shrink a log file - and it wouldn't. (Yes, I know. It had been bloated by one-time operations.) It had been backed up - I even triggered the backup 2-3 times. And DBCC LOGINFO showed all VLFs in use. DBCC SQLPERF(LOGSPACE) (which shows all databases, log file sizes, and percentage of log files in use) showed nearly nothing, less than 1%, of the log file was in use. So, I'm comfortable that I ran into this DBCC LOGINFO display strangeness.

When this happens, the usual trick is to run the log backup once or twice to truncate just a bit more data. I did that - and I still couldn't shrink the log, and DBCC LOGINFO still showed all VLFs in use.

In this case, I figured out what was happening. My log backups were all occurring on the secondary replica of the database. As soon as I took a log backup on the primary replica, DBCC LOGINFO showed almost all VLFs clear, and I could shrink the log file back to its normal size.
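For the record, the fix looked roughly like this - database name, logical file name, backup path, and target size are all placeholders for whatever applies in your environment:

-- on the primary replica:
BACKUP LOG [MyAGDatabase] TO DISK = N'\\BackupShare\MyAGDatabase_log.trn';

-- confirm the VLFs now show status 0 (not in use)
DBCC LOGINFO ('MyAGDatabase');

-- then shrink the log back to its normal size (target in MB)
USE [MyAGDatabase];
DBCC SHRINKFILE (MyAGDatabase_log, 2048);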

What's going on here? I'm not sure, but it would make sense if DBCC SHRINKFILE uses DBCC LOGINFO to find out which VLFs are not in use. When you shrink a log file, it must shrink on VLF boundaries. If it's using DBCC LOGINFO to find inactive VLFs, it might not be finding any. And that might be causing the problem.

It looks like a backup on the primary replica might fix this. I also saw someone fix it with several log backups, and several checkpoints - the log backups were on the secondary replica, and, of course, the checkpoints were issued on the primary. Some combination of Secondary Replica log backups, and Primary replica checkpoints cleared the DBCC LOGINFO VLF status values - and allowed the file to shrink.

One warning: if you do choose to take a log backup on your primary replica, when you normally run log backups on your secondary, I very strongly recommend that you move or copy the log backup to the same location as your normal log backups. When you have to restore log file backups, you don't want to have to dig around in multiple locations - you want them all in the same place!

Friday, June 5, 2015

TempDB MDFs and spinlocks - and other time sinks

I don't want to start any great blog wars, but I saw a blog post today that made me wanna say WTF[1]. It wasn't that it was wrong or bad, but it was pushing some ideas that you shouldn't push too hard, because they can waste your time.

First tip I saw was "one TempDB MDF file per processor core, every time." Hey, I'll be the last person to scoff at the possible problems that TempDB contention can cause, but....

The problem started in SQL Server 2000. Sometime around SP3... or was it 2? 4? Yanno what? It's 15 years old! Do we care precisely when?

Anyway: The SQL product group set it so that SQL would go through TempDB MDF files in a round robin fashion, to avoid the worst aspects of contention in TempDB. The recommendation that went with it was to create extra TempDB files for that round-robin to go through - one per CPU, up to 8, and then stop, unless you still saw contention, in which case you'd add 2-4 at a time until the contention stopped.

The thing is, in 2005, a new feature was added where TempDB could cache table structure (unless you did certain things to those tables - this looks right, re: the things you can and can't do, but I can't vouch for it).[3] This means that there's already greatly reduced contention on those TempDB files. I'll grant you: when talking to non-technical bosses, pointy-haired or not, "One file per core" is better than the more complicated "40 cores, so we'll start at 8, no, make it 12, and add more as needed". But this fine blogger was suggesting this as an absolute rule. And it's not.
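For completeness, if you do decide you need more TempDB data files, it's a plain ALTER DATABASE ... ADD FILE - the logical names, path, and sizes here are placeholders, and you want every file sized (and autogrowing) identically:

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);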

The other thing that was discussed was tracking spinlock stats. Me? I like spinlocks. Just the name is cool. "Spinlock". And dropping it into the conversation seems crazy-smart: "While your CPUs are over burdened, I think the bigger reduction in throughput might be caused by spinlock contention." Won't you just sound like such a Brainiac your skin might turn green? (Well, for the classical Brainiac(tm), at least.)

It isn't a bad idea to track spinlock stats. It really isn't. But I got to see a fascinating case recently and since the subject came up (don't be unkind and mention that I brought it up!) let me tell you something I got to see.

Problem was CPU usage. CPU was 80% or more on a lot of CPU cores. And this wasn't normal on this server - they'd run before at 30, 40% CPU. Well, unexpectedly high CPU means "check statistics, and chase down problems in query plans." That was done. CPU remained high. So more digging was done. I felt oh-so-proud when I checked spinlocks and saw an amazing 1000-2000 spins, per core, per millisecond. That had to be a big, big deal, right? Right?

I'll skip to the punchline. Later on, I saw that this same server was running at 40% CPU... with the same 1000-2000 spins, per core, per millisecond. I was surprised. After all, 1500 spins per millisecond is 1.5 million spins per second, which is ... um... order of magnitude, 0.1% of the speed of a 1.5GHz processor. Ouch.

This might still have been a point of contention - it might be that if we could reduce this contention we'd see throughput increase, and without needless spins, maybe we'd see CPU drop to the healthier 25-30% I like to see.

(Um. Let me explain. When I look at a healthy SQL Server, CPU tends to be less than 30%. 35% isn't a problem, heck, 50% isn't a problem. 50% busy is 50% idle, so while some people see the CPU as half busy, and some see it as half idle, I've moved on to other performance counters rather than making up a punchline to this joke. More seriously: if I see CPU > 30%, I'm interested - but not concerned. "Interesting" things are the things you keep in your mind while troubleshooting - but there's a vast difference between "interesting" and "important.")

But even 2000 spins, per core, per millisecond, wasn't causing serious problems on the server as far as I could tell. Sure, keep an eye on spinlocks, spins per core per ms, and especially on backoffs. But keep in mind you can have a pretty darn high set of spins per core per ms, and still not seriously affect your server.
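If you do want to keep that eye on them, the numbers live in sys.dm_os_spinlock_stats. They're cumulative since instance startup, so the really useful trick is two snapshots a known interval apart and a diff; as a quick first look, though, something like this is enough:

select top 10 name, collisions, spins, spins_per_collision, backoffs
from sys.dm_os_spinlock_stats
order by spins desc;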

I'm not linking back to the site I found, because it had some darn good advice, and, as I said, I don't want to get into back and forth unkindness. But there are always caveats and there is always care to be taken.



[1] As you might guess, that means "Why... The Frilly-panties-of-Venus!". She left them on my chandelier. I suggested that might make people think I viewed her as a conquest, and she quite reasonably pointed out that she was one of my conquests.... [2]

[2] Chess. Best out of three games. She's weak on queen side openings. There's probably a joke there, but I ain't going for it, or she might get miffed. Miffed goddesses are never a good thing.

[3] Edited with strike through - in my day job, I do sometimes have to vouch for knowledge (or stress that I can't), but that doesn't belong here, especially not when I'm linking to someone more knowledgeable on a subject than I. I humbly apologize.

Wednesday, May 13, 2015

Thoughts about checksums and changes

So, I'm looking at this server, and I'm thinking "we don't have to worry about data corruption causing X_Issue, if the Page Verify option is set to Checksum in database properties." (See https://technet.microsoft.com/en-us/library/ms190249%28v=sql.105%29.aspx, recovery options.) Trust me, you want checksum set - in return for a tiny, tiny, tiny amount of CPU during a write to disk, you get verification that the write went to disk, and a subsequent read came back from the disk, without any damage occurring to your precious data.

"Precious..." you say doing your best Gollum imitation? Hey, Gollum's got nothing on a DBA guarding a database! For data integrity, a good DBA will even outdo a sysadmin. Believe it.

Checksum's the default - you need to do a bit of work to get it turned off - or, you need a database that was birthed in SQL Server 2000 or earlier. Checksum was introduced in SQL Server 2005, and it's been the default ever since.
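Checking, and fixing, is about as quick as it gets - the database name is a placeholder:

-- see where every database stands
select name, page_verify_option_desc from sys.databases;

-- turn it on where it's off (or still set to TORN_PAGE_DETECTION)
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM;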

As I was pondering this, I thought about changing a database from page verify = NONE to checksum, and then I thought, hey, what if a database is damaged? Oh my lord, what if we set checksum and suddenly we start flagging error 824! Will people think the checksum corrupted the database?

At that point, I realized I was tired. Really tired. Because that's one of the silliest thoughts that could come into my head - I know better! And if you know your checksums, you know better too - even if you don't realize it yet.

Checksums are computed at writes. If we had a thousand damaged pages already, and turned on checksum, how many 824 errors would be caused by that event alone? None. Even if SQL Server went out and computed a checksum on each and every page in the database (and it does not!!! - see the "myth" link below), it would be running a checksum on the already damaged data. Checksums aren't magical. A checksum is simply set up so that if there are small differences in the data, the correct checksum for that data will be different. So: if you compute a checksum, and send a data page to disk, and then, later, recompute the checksum when it's read back into memory, if the checksums match, you're 99+% certain that the data hasn't been damaged in transit or at rest. If they don't match, you know something terrible has happened.

But if data's already damaged, and you compute a checksum, all you are doing is making sure that if it gets damaged again, you'll notice.

If you turn on checksum verification, only the dirty pages in the buffer pool will automatically get checksums computed. Hey, here's an interesting thought: what if we cut power right after we set the checksum option? Could we still say all dirty pages in the buffer pool would get a checksum? Well, then it would be a race - but if you follow that Technet link above, you'll see that setting an option runs a checkpoint on the database - so all the dirty pages should immediately get flushed, and hence, checksummed, as long as the checkpoint completes. So, a new blog post, and the basis for a kinda-cool SQL Server trivia question. Not bad for a Wednesday night!

For a bonus trivia question, is it possible that turning on checksum page verification could reveal a bunch of damaged pages? Yes, but it requires a contrived situation to occur.

Stop and think a moment to see if you can get the answer - it's kind of cool once you figure it out.

Okay: if you turn off Checksum page verification, SQL stops checking checksums, and stops computing them thereafter. But remember, when you change the page verification option, SQL doesn't immediately add (or remove!) checksum information - that only happens when a page is read into memory, changed, and then written back to disk. For maximum horror value, let's back up this database, and restore it somewhere. (You do know that corruption in a database usually transmits perfectly via a backup, right?) Now, when someone restores this database, they might notice that checksum is turned off, and turn it back on. We probably still have pages with checksums computed on them - it'd be awfully strange if every single data page had been read and changed and written back to disk! So, if there were some IO errors leading to some incorrect checksum values (technically, the checksum is correct, it's the data that's incorrect) SQL would resume computing and comparing checksum values, and could suddenly uncover a bunch of errors it would have detected if we hadn't turned off checksum in the first place. And yes, good luck explaining that turning on checksum didn't cause the corruption.

So: the moral of this story is, never turn off checksum page verification, unless you need to construct an absurd hypothetical. In fact, don't even turn it off then - make it a thought experiment.

For more on checksum, and the myths about it, please see this page from Paul Randal's "myth a day" series.

Thursday, May 7, 2015

Interesting changes in indexes...

Wow! While digging into another issue, I found some significant changes in index maintenance on the MSDN Alter Index Page for SQL 2012.

First: usually, if you rebuild an index, you get a new set of statistics on the indexed column(s) for free. It'd be silly to read every value, sort it all, and then not use that data to build a new statistics histogram. That's not true if you rebuild a partitioned index. Quoted from the above:

In SQL Server 2012, statistics are not created by scanning all the rows in the table when a partitioned index is created or rebuilt. Instead, the query optimizer uses the default sampling algorithm to generate statistics. To obtain statistics on partitioned indexes by scanning all the rows in the table, use CREATE STATISTICS or UPDATE STATISTICS with the FULLSCAN clause.
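In other words, if you want the full-scan quality statistics back after rebuilding a partitioned index, you have to ask for them explicitly - the table and index names here are placeholders:

UPDATE STATISTICS dbo.MyPartitionedTable IX_MyPartitionedTable_Key WITH FULLSCAN;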


That's interesting, but the shocker to me was another tidbit.

When dealing with database integrity errors, I hold my breath while looking at the index ID for problems in an index. Why? Because: Index ID 0 or 1 is raw table data. Index ID 0 is the heap; index ID 1 is the clustered index. (You only ever have one - Index ID of 0 rules out an Index ID of 1, and vice versa.) Index IDs of 2 and higher? They're "just" indexes that can be rebuilt without any risk of data loss.

Ah, but there's a tricky bit now:

In earlier versions of SQL Server, you could sometimes rebuild a nonclustered index to correct inconsistencies caused by hardware failures. In SQL Server 2008 and later, you may still be able to repair such inconsistencies between the index and the clustered index by rebuilding a nonclustered index offline. However, you cannot repair nonclustered index inconsistencies by rebuilding the index online, because the online rebuild mechanism will use the existing nonclustered index as the basis for the rebuild and thus persist the inconsistency. Rebuilding the index offline, by contrast, will force a scan of the clustered index (or heap) and so remove the inconsistency.


So: if you have a damaged index of ID 2 or greater, you have to take it offline to repair it. But don't think about what you're losing - think about what you're gaining! The other option is putting your database in Single User Mode, and running DBCC CHECKDB with REPAIR_REBUILD. So here, you get to take an index offline, instead of taking your whole database offline. It's a fair trade!

Most of the rest of indexing maintenance hasn't changed. If your indexes are under 10% fragmented, great - don't worry about it. Seriously - you can't always get rid of that last bit of fragmentation. For small tables, it might be literally impossible, and for larger ones, there are still probably better places to spend your energy. 10-30% fragmentation is a good time for a defragment/reorganize. If you see more than 30% fragmentation, a rebuild is your best option - but unless you're on Enterprise edition, you can't do an online rebuild so you might want to try to hold off by using a reorganize until you can get a maintenance window.
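Here's a sketch of the kind of check those thresholds imply - run it in the database you care about, and treat the suggestion column as a starting point, not gospel:

select
    object_name(ips.object_id) as table_name,
    i.name as index_name,
    ips.avg_fragmentation_in_percent,
    case
        when ips.avg_fragmentation_in_percent > 30 then 'REBUILD (online rebuild needs Enterprise)'
        when ips.avg_fragmentation_in_percent >= 10 then 'REORGANIZE'
    end as suggestion
from sys.dm_db_index_physical_stats(db_id(), null, null, null, 'SAMPLED') as ips
inner join sys.indexes as i
    on i.object_id = ips.object_id
    and i.index_id = ips.index_id
where ips.avg_fragmentation_in_percent >= 10
order by ips.avg_fragmentation_in_percent desc;

-- and then, index by index, something like:
-- ALTER INDEX IX_SomeIndex ON dbo.SomeTable REORGANIZE;
-- ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD WITH (ONLINE = ON);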

Oh, one other thing: most of the time you want to leave Fill Factor at the default of 0. A fill factor of 80 means leaving your pages only 80% full. This sounds like a reasonable idea - why pack them full, so that we have page splits as soon as we add a few rows? Why not leave a bit of space so they can fill up over time? It's not a terrible idea, but remember that data pages that are 80% full require 25% more disk space and buffer pool memory to hold them - and thus, 25% more reads and writes to deal with them. If you're having lots of page splits that are slowing you down, sure, consider trying a fill factor on the busiest indexes. But most of the time, you're better off packing them tight, to minimize disk and memory usage, and to reduce the number of reads and writes you need to perform.