SQL Smack down is back!
I thought of a few more tips I wanted to share with you all, so here we go.
It has really come to my attention that we use temp tables entirely too much. Often times a simple JOIN, CASE, UNION, or other T-SQL statement will remove the need for the temp tables. One of the main things we do wrong with temp tables is how we create them. I have seen many places where a temp table is created using the SELECT INTO method. THIS IS BAD! The problem with creating temp tables is that every time you create or modify one, tempdb is locked, as well as other system tables and db's. What is the solution you ask? Don't use temp tables. Every time you use a temp table, the data in your temp table is stored in tempdb, and you can imagine if a lot of temp tables, or very large temp tables are being used, this can put a real strain on tempdb. However, there may be times when you have to use a temp table. In those instances use the CREATE TABLE syntax, as you will not be inserting data at creation time, therefore not locking tempdb for as long. Inserting data at a later time using INSERT SELECT is much more efficient.
I will go into a few examples of how to replace temp tables now.
I don't know how many times I have seen this in our code:
SELECT field1, field2
INTO #temptable
FROM table1
UPDATE #temptable
SET field2 = 'X'
WHERE field1 = 'Y'
SELECT field1, field2
FROM #temptable
This could easily be fixed with a CASE statement, like this:
SELECT field1,
CASE
WHEN field1 = 'Y' THEN 'X'
ELSE field2
END AS field2
FROM table1
I can't think of another example right now, but you get the idea.
I know I have covered joins before, but there are a few things I wanted to reinforce. First off, please try to use ANSI SQL syntax for all of your joins. If you are unfamiliar with ANSI SQL syntax, below is an example of the ANSI SQL syntax for a join compared with the old MS SQL syntax.
SELECT a.field1, b.field2
FROM table1 a, table2 b
WHERE a.field3 = b.field3
ANSI SQL Syntax:
SELECT a.field1, b.field2
FROM table1 a
INNER JOIN table2 b
ON (a.field3 = b.field3)
I know many of you still use the old MS SQL syntax to do joins, but there are several disadvantages to this. First off, it is very easy to make mistakes using this method. I have seen a few examples where errors in queries using MS SQL syntax have resulted in cross joins instead of inner or outer joins. This causes too many rows to be returned, which is often corrected by using a DISTINCT or GROUP BY clause. The problem with this is that you are putting a much larger burden on the server than you would be if correct syntax had been used, which would have filtered out the correct rows without having to group or do a distinct operator after the fact.
The second, and probably biggest disadvantage to using MS SQL syntax, is that SQL Server 2000 is probably the last version of SQL Server that will support the old syntax. The word from Microsoft is that all newer versions of the server will only support ANSI SQL. If you don't want to do a lot of work later, take the time to write your queries using ANSI SQL now.
One important thing you need to remember when doing joins is indexes. While many of you understand the basic principals of indexes, I know a lot of you do not have a firm grasp on when they should be used when joins are involved. The most important thing to remember is to index the fields referenced in the ON clause of the join. So from the example above, you would index field3 in table1 and field3 in table2 since they are used in the ON clause. There are other factors to consider, such as the fields used in the SELECT and WHERE clauses, but if you remember to index the fields used in the join you should see substantial performance increases on larger tables.
If you want to create a covering index on your table (one that indexes all the fields used in a query), there are a couple of rules to remember. First, you want to try to put the most selective (unique) fields as far towards the top (or left depending on which way you look at it) of the index as possible. If you put the most selective fields first, and the least selective fields farther down in the index, the query optimizer is more likely to use your index. The reason for this is that statistics are only stored for the first field in a query, and if the statistics say that this field will not reduce I/O or processing time, the query optimizer will use a table scan instead.
Also, if the first field in the index is not in the WHERE clause, there is a good chance the query optimizer will not use the index. The exception to this is when the where clause fields are not very selective, and in these instances the query optimizer may still use the index if the fields on the left are more selective.
Another thing to know about covering indexes is that sometimes you can get better performance by creating a single column index on each field used in the query. This is usually not the best method for creating indexes, however if you have many different queries that need different indexes this may be a better option than having many different covering indexes. What the query optimizer might actually do is creates a hash table that intersects each index, and when you run your query the optimizer looks at the hash table, which acts as a covering index, rather than doing a table scan.
There are times when a covering index can significantly reduce performance however. Some of these instances are when the query you are creating the index for is not run very often, or if the size of the covering index is very large in comparison with the size of the table. These instances will increase I/O, probably to the point of reducing performance.
Probably the most important thing you can do with indexes is to test with them. See what kind of performance you get with each type of index. Turn on the "Show Execution Plan" option in the query analyzer and see exactly what indexes the query optimizer is using and is not using. Only by testing can you really see what is the best option for your particular query. And keep in mind that the execution plan can change over time as the database grows or shrinks. I know it is a pain, but maintenance on your indexes is really important.
This is something I just wanted to cover real quick. Often times derived tables can be used to avoid using temp tables, but they are a big enough topic I wanted to split them out.
What is a derived table? A derived table is basically just a result set from another query. See where the comparison to a temp table comes in? Here is a basic example of a derived table:
SELECT field1, field2
FROM (SELECT field1, field2
FROM table1) AS tablealias
This is not a good use of a derived table, but it should be fairly easy to understand the syntax. Notice the table alias, which I have conveniently named tablealias. You must always assign an alias to a derived table. Don't ask me why, it's just the rule.
Often derived tables are used for calculating aggregate values. Here is an example from Books Online using the Northwind db:
SELECT C.CategoryID, C.CategoryName, P.ProductName, P.UnitPrice, CT.Category_Count
FROM Categories C
INNER JOIN Products P
ON (C.CategoryID = P.CategoryID)
INNER JOIN (SELECT C.CategoryID, COUNT(*) AS Category_Count
FROM Categories C
INNER JOIN Products P
ON (C.CategoryID = P.CategoryID)
GROUP BY C.CategoryID, C.CategoryName) CT
ON (C.CategoryID = CT.CategoryID)
ORDER BY C.CategoryName
This example is a little in depth, but with a little work we should be able to figure it out.
Lets start from the inside and work our way out. For the inner-most select, we have the following:
SELECT C.CategoryID, COUNT(*) AS Category_Count
FROM Categories C
INNER JOIN Products P
ON (C.CategoryID = P.CategoryID)
GROUP BY C.CategoryID, C.CategoryName
This will return a list of distinct CategoryID's from the Categories table, where those CategoryID's are also in the Products table. The actuall result set when executed on PIIDBS02 is this:
CategoryID Category_Count ---------- -------------- 1 12 2 12 3 13 4 10 5 7 6 6 7 5 8 12 (8 row(s) affected)
What this result set shows us is actually what is going to be referenced as a table when we select from it. This query, or derived table is given the alias of CT, so for the time being we can think of this as a table named CT. Next we move out to outer SELECT, which should be fairly self explanatory. We are selecting the CategoryID and CategoryName from the Categories table, and the ProductName and UnitPrice from the Products table. The field named Category_Count is selected from our derived table, named CT. The Products table is joined against the Categories table on CategoryID, so we only return products that have a category, and categories that have products. Finally, we join against our derived table, aliased as CT on the field CategoryID, which only returns rows from where the CategoryID is in both the Categories table and the CT table (our derived table).
If all that got a little confusing, just try to cut out the derived table, and just leave the alias. Think of the result set from the inner SELECT statement as a physical table. If you need more help than that check out books online.
I think this has been the longest SQL Smack down to date, so it's time to cut this one off. As always, if you have any questions about these or any other topics, please talk to Barry.