SQL Smack down is back!
It's been a while since the last edition, but I thought of a few topics from previous SQL Smack downs that I thought could use a little more explanation. In this edition we will cover issues such as locking hints and derived tables, so without further ado, here we go.
I know that it has been stated many times before, but I continue to see areas where using locking hints may be especially beneficial for performance.
The most common locking hint I use is NOLOCK. This hint tells the query engine to ignore all locks and read the data from the table, even if it is in the middle of a transaction. This hint is not good for every query, but many times it will not cause any problems. The reason this can improve performance is that whenever a read or write is done to the data a lock is placed on some level, and the next transaction, be it a read or write, cannot be completed until the first lock is released. If you are running into a lot of deadlocks, this hint may help your database tremendously. Remember, on tables that have many updates done to them there is a possibility that you may read an uncommitted transaction, so if it is imperative that you have only committed data then you will not want to use this hint.
On occasion when doing updates SQL Server may escalate a lot from a lower level to a higher level, such as a table lock. When a table occurs none of the data in the rest of the table can be accessed until the lock is released. If this is occurring a lot, and you believe that it is unnecessary for the entire table to be locked, you may want to use the ROWLOCK (row level lock) or PAGLOCK (page level lock). ROWLOCK will instruct the query optimizer to lock only the row or rows being updated, and try to prevent escalation to higher levels. PAGLOCK will try to lock only the data pages being used, which normally house a set of rows, but not an entire table.
Even if you specify a locking hint such as ROWLOCK or PAGLOCK, there is still a chance SQL Server will escalate to a higher-level lock if it cannot perform the operations it needs to with the lock you specified.
Finally, please be careful using locking hints. Specifying a lock such as a PAGLOCK or TABLOCK (table lock) can cause deadlocking problems if the database cannot resolve other requests during the duration of the initial locking transaction. NOLOCK is pretty safe to use, as long as you are sure it will not cause any problems reading uncommitted transactions.
I know I mentioned this in a prior SQL Smack down, but I am going to go into a little more depth about how to use this procedure. If you remember from the last time I cover this subject, the benefit of sp_executesql is that SQL Server can cache execution plans when you use it. Lets go over the basics of how to use the procedure.
Like any other stored procedure, sp_executesql is called using the EXEC function, and accepts standard variables as parameters. Here is the basic usage:
EXEC sp_executesql [@stmt = ] stmt
[
, [@params = ] N'@parameter name datatype [,...n]
, [@param1 = ] 'value' [,...n]
]
So what does this all mean? Well let's work our way to the end from the front.
EXEC sp_executesql obviously means to execute the stored procedure sp_executesql. This is the same as any other stored procedure. The next part, [@stmt = ] stmt, means that we are going to pass a parameter called stmt into the stored procedure. The square brackets mean that this part is optional, i.e. we don't need to tell the stored procedure the name of the variable; it will assume it to be @stmt if it is the first parameter passed in. The stmt variable is just the SQL statement we want to pass in.
The next 2 lines will need a little more explaining. The line [@params = ] N'@parameter name datatype [,...n] is basically just a list of variable we are going to pass to the function, along with their data definitions. The N in front of the string just indicates that the string in of the type NVARCHAR, as opposed to a regular VARCHAR string. The first 2 parameters, the SQL statement and the parameters list, must be NVARCHAR strings.
The line [@param1 = ] 'value' [,...n] is just the actual variable list that you defined in the previous statement. This is where you pass in the values of comparison variables.
Here is a very basic example:
EXEC sp_executesql
N'SELECT * FROM pubs.dbo.employee WHERE job_lvl = @level',
N'@level TINYINT',
@level = 35
Notice in the above example we didn't really do any dynamic SQL. This example was just to show the basic usage syntax. The next example will get a bit more in depth.
DECLARE @sql NVARCHAR(500), @TableName NVARCHAR(10), @level TINYINT
SET @sql = 'SELECT * FROM ' + @TableName + ' WHERE job_lvl = @level'
SET @TableName = 'pubs.dbo.employee'
SET @level = 35
EXEC sp_executesql
@sql,
N'@level TINYINT',
@level
Notice that this time the table name was dynamic in the SQL string. We could just as easily have made the WHERE clause dynamic, or the selection list, or any other part of the SQL statement. If, in the above example, we wanted to run this statement on not only the pubs employee table, but also on another table we created, SQL Server will automatically cache the execution plans for each different table. This is the major benefit that sp_executesql has over simply creating a SQL string and running it with the EXEC function.
Please refer to Books Online for more examples on how to use sp_executesql, as well as for a more detailed description of all the parameters used.
This topic has come up a few times recent weeks as I have found a few places in which we are doing string manipulation inefficiently, or at least in a less than optimal matter. There are also a few items I thought many of you might find useful.
In general it is best to avoid doing concatenation and other string manipulations in WHERE and ON clauses. Here are some examples I have seen lately:
SELECT *
FROM Orders o
INNER JOIN sfel.dbo.elpc e
ON (o.State + o.Code = e.State + e.Code)
If you rewrote this statement to compare the individual values, odds are that the query optimizer will do a better job of preparing an execution plan. Here is a corrected example:
SELECT *
FROM Orders o
INNER JOIN sfel.dbo.elpc e
ON (o.State = e.State
AND o.Code = e.Code)
I know many of you have asked in the past about how you can do padding on strings, such as with the PADL and PADR functions. There are a couple of ways to do this. The first is a fairly easy way to pad small strings using the LEFT and RIGHT functions. The second method uses the REPLICATE function. Suppose we needed to pad a number with 0s to the left of the number, and the final string must always be 5 characters long. You could use this method:
SELECT RIGHT('00000' + CAST(@num AS VARCHAR), 5)
The other way to accomplish this would be to use the REPLICATE function as shown below.
SELECT REPLICATE('0', 5 - LEN(CAST(@num AS VARCHAR))) + CAST(@num AS VARCHAR)
The method using REPLICATE takes a bit more typing, but is easier to maintain when you have longer strings.
There is a lot more to string manipulation than this, and I encourage you all to look at Books Online under 'String Functions' to learn more.
This is another subject that I have covered before, however I think it could use a little more in depth look. One thing I want to explain, is that derived tables are not only used in situations where there is no other choice besides a temp table, but sometimes they are used to gain performance. First, check out this example and see if you can follow what it is doing.
SELECT b.sStateName, a.LeadCount, a.StateCode
FROM (
SELECT a.chrState AS StateCode, COUNT(*) AS LeadCount
FROM Orders a
INNER JOIN Manuscript b
ON (a.chrOrdno = b.chrOrdno)
WHERE a.sdtOrder_dt BETWEEN @StartDate AND @EndDate
GROUP BY a.chrState
) AS a
INNER JOIN sfel.dbo.tStateLookup b
ON (a.StateCode = b.sStateCode)
As you can see in the inner query, or derived table, I am computing the COUNT aggregate and grouping by the chrState field. Once my derived table is created, I am joining that against the tStateLookup table based on the state codes. Now why did I write the query this way instead of like this?
SELECT c.sStateName, COUNT(*) AS LeadCount, a.chrState AS StateCode
FROM Orders a
INNER JOIN Manuscript b
ON (a.chrOrdno = b.chrOrdno)
INNER JOIN sfel.dbo.tStateLookup c
ON (a.chrState = c.sStateCode)
WHERE a.sdtOrder_dt BETWEEN @StartDate AND @EndDate
GROUP BY c.sStateName, a.chrstate
Well the answer is for performance reasons. In the second example, the query optimizer was joining all of the tables first, and then computing the aggregate. In my first example, I cut down on the amount of processing needing to be done by computing the aggregate first, then joining the last table in. Lets take a look at some data so you can see what I mean.
This is what was happening in the second example. First the join was carried out, so a result set like this was created:
Orders.State tStateLookup.sStateName Manuscript.* 01 Alabama <Manuscript Fields> 01 Alabama <Manuscript Fields> 01 Alabama <Manuscript Fields> 02 Alaska <Manuscript Fields> 02 Alaska <Manuscript Fields> 02 Alaska <Manuscript Fields> etc...
Notice all the duplication of data caused by the joins. Once this was completed the aggregate was computed, giving the final result set. In the first example the derived table was computed first, giving a result set similar to this:
Orders.State Manuscript.COUNT(*) 01 50 02 200 03 5 04 20 05 155 06 12 etc...
Notice that this will be a much smaller result set, since it will only contain a distinct list of states because of the GROUP BY clause. The final step is then to join the tStateLookup table to get the state names, which will only need to be joined to the smaller result set of distinct states.
The amount of improvement you may see depends on a number of factors. The most influential factor is the size of the data set. On very small data sets you will see minimal gains. As the data sets get larger, the gain becomes more noticeable. Also, the more tables you need to join to the result set the more likely you are to see a performance increase by using a derived table.
I was going to cover cursors in this issue, but I have run a little long already. Look for that and other tropics in the next issue.
Well that's it for this edition of SQL Smack down. As always, if you have any questions about these or any other topics, please talk to Barry.