Mike's profileFreeToDevBlogLists Tools Help

Blog


    September 27

    New SQL Blog - Optimise Optimize Optimise

    Love query optimisation? The SQL Server Query Processor team have started a new blog "Tips, Tricks, and Advice from the SQL Server Query Processing Team". They also link to an old blog I hadn't come across before which appears to have a ton of information! It's going to be a late night!

    On a side note, a new beta (1.0 build 145) of Windows Live Writer is available. I'm using it for this post, so if you are reading this, it worked!

    September 24

    Two helpful TFS posts...

    These two posts would have come in handy about 6 months ago! TFS V1 doesn't make a lot of things easy. Here are solutions to two of the most frustrating 'features': Determining Whether Tests Passed in Team Build and Building a Specific Version with Team Build

    September 21

    #Tables, Table Variables, CTE's - what's the difference?

    Here is a handy table that explains some of the differences between the three. The full sized version is available here.

    I'd say that in the past I've used Table Variables about 60 to 70% of the time when I've needed some form of temporary data. With Common Table Expressions (CTE) in SQL server 2005 though, I think I'll be using them a lot less. Note, the above table is for guidance only. I'd recommend that if you need to use Temporary Tables or Table Variables, you test the performance of your code using both.

    Note, recompilations hurt performance. In SQL Server 2000, recompilation was handled at a procedure level. In SQL server 2005, this happens at a statement level. Here is a great article on Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005

    September 20

    CREATE INDEX ..... [ ASC | DESC ]

    I came across a good post the other day that highlighted something I'd not seen or used in the CREATE INDEX statement. Columns can be specified to be sorted either ASC or DESC. From BOL:

    [ ASC | DESC ]

    Determines the ascending or descending sort direction for the particular index column. The default is ASC.

    This is nowhere near as exciting as the INCLUDE statement I blogged about on 9th September, but for someone out there, it may make their application that little bit better! In case the post one day disappears, here is a summary:

    "While SQL Server can traverse an index in either direction (since it is a doubly linked list,) sometimes it can be valueable to sort the keys of an index to match the sort order of some desired output.  For example:

    USE AdventureWorks; GO

    SELECT maritalStatus, hiredate
    FROM   HumanResources.Employee
    ORDER BY maritalStatus ASC, hireDate DESC

    The plan for this query is:

      |--Sort(ORDER BY:([AdventureWorks].[HumanResources].[Employee].[MaritalStatus] ASC,
                      [AdventureWorks].[HumanResources].[Employee].[HireDate] DESC))
           |--Clustered Index Scan(OBJECT:([AdventureWorks].[HumanResources].[Employee].
    [PK_Employee_EmployeeID]))

    Next, create a typical index with the default (ascending) sort order:

    CREATE INDEX employee_maritalStatus_hireDate ON HumanResources.Employee(maritalStatus,hiredate)

    The plan changes to an Index Scan (since it can use the index to cover the query,) and still a sort.

        |--Sort(ORDER BY:([AdventureWorks].[HumanResources].[Employee].[MaritalStatus] ASC,
                        [AdventureWorks].[HumanResources].[Employee].[HireDate] DESC))
           |--Index Scan(OBJECT:([AdventureWorks].[HumanResources].[Employee].
    [employee_maritalStatus_hireDate]))

    Better, but not perfect.  Change that to an index sorted in the direction that the output is desired in:

    DROP INDEX employee_maritalStatus_hireDate ON HumanResources.Employee
    GO
    CREATE INDEX employee_maritalStatus_hireDate ON HumanResources.Employee(maritalStatus ASC,hiredate DESC)

    Now the Sort is gone: 

    |--Index Scan(OBJECT:([AdventureWorks].[HumanResources].[Employee].[employee_maritalStatus_hireDate]), ORDERED FORWARD)

    Of course, to be sure this has limited applicability, and is not necessarily the best thing to do even if it improves a query or two, since this is now an index that will need to be maintained, which may end up being more costly than paying for the clustered index scan."

    September 19

    Using .WRITE on VARCHAR(MAX)? - Part 2

    I've been delving into the .WRITE operator a bit more, and in particular it's more complex execution plan when replacing data. Here's the setup code for the tests I've been running:

    SET NOCOUNT ON
    USE tempdb
    GO

    -- CONFIGURE THE NUMS TABLE
    IF OBJECT_ID('dbo.Nums') IS NOT NULL
    DROP TABLE dbo.Nums;
    GO
    CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY);
    DECLARE @max AS INT, @rc AS INT;
    SET @max = 100000;
    SET @rc = 1;

    INSERT INTO Nums VALUES(1);
    WHILE @rc * 2 <= @max
    BEGIN
    INSERT INTO dbo.Nums SELECT n + @rc FROM dbo.Nums;
    SET @rc = @rc * 2;
    END

    INSERT INTO dbo.Nums
    SELECT n + @rc FROM dbo.Nums WHERE n + @rc <= @max;
    GO

    -- CONFIGURE THE SAMPLE TABLE
    IF OBJECT_ID('dbo.Sample') IS NOT NULL
    DROP TABLE dbo.Sample;
    GO
    CREATE TABLE dbo.Sample(id INT PRIMARY KEY, data VARCHAR(MAX));
    DECLARE @String VARCHAR(MAX);
    SET @String = '';
    WHILE LEN(@String) < 50000
    BEGIN
    SET @String = @string + REPLICATE('Lorem ipsum. ', 1500);
    END

    INSERT INTO Sample (id, data)
    SELECT n, @string
    FROM dbo.Nums
    WHERE n < 10000;
    GO

    I think this represents a plausible real life situation. A table with a few thousand rows (9999), a primary key which we will use in our SARGS and a VARCHAR(MAX) field that we will update. A quick detour: I've seen a post on the net from March 06 where a person puts together some code and compares + to .WRITE. The .WRITE code is apparently 1000 times faster. Here is his sample:

    Using .WRITE

    declare @t table (v varchar(max))
    insert into @t values ('')
    declare @i int
    set @i = 0
    while @i < 5000
    begin
    update @t set v.write (replicate('A',10),null,null)
    set @i = @i + 1
    end

    Using +

    declare @t table (v varchar(max))
    insert into @t values ('')
    declare @i int
    set @i = 0
    while @i < 5000
    begin
    update @t set v = v + replicate('A',10)
    set @i = @i + 1
    end

    I have two issues with this sample,

    1. He states that the .WRITE operation takes half a second, and the + operation takes over 18minutes. On my machine (3.6Ghz, 3Gb) the .WRITE also takes half a second, but the + operation takes 3 seconds (which is nowhere near 18minutes!).
    2. Have you ever seen this scenario in the real world? Appending the same string to the same column 5000 times! I doubt it! I think it's important to try keep samples, especially performance tuning, as close to real life scenarios as possible.

    Back to the .WRITE investigation. In Part 1 I showed you the Execution Plans, lets now take a look at the SHOW_PLANS

    UPDATE Sample SET data = REPLICATE('Lorem ipsum. ', 200) WHERE id < 4999;
    |--Clustered Index Update(OBJECT:([tempdb].[dbo].[Sample].[PK__Sample__1DE57479]), SET:([tempdb].[dbo].[Sample...

    UPDATE Sample SET data.WRITE (REPLICATE('Lorem ipsum. ', 200), 0, NULL) WHERE id < 4999;
    |--Clustered Index Update(OBJECT:([tempdb].[dbo].[Sample].[PK__Sample__1DE57479]), SET:(PartialUpdate([tempdb].[dbo].[...
    |--Assert(WHERE:([Expr1007]))
    |--Top(ROWCOUNT est 0)
    |--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [tempdb].[dbo].[Sample].[data] IS NULL THEN (0) ELSE NULL END))
    |--Clustered Index Seek(OBJECT:([tempdb].[dbo].[Sample].[PK__Sample__1DE57479]), SEEK:([tempdb].[dbo].[Sample].[id] < (4999)) ORDERED FORWARD)

    The classic operation appears to do everything in one operation, the Clustered Index Update. The .WRITE operator needs five operations though. First it does a Clustered Index Seek using the SARG we provided. Then it computes a scalar that is stored in Expr1007 and used in the later Assert operation. After the scalar is computed, it runs a TOP, then the Assert and finally a PartialUpdate is run on the Clustered Index. This all seems a bit much considering that the classic operation can do everything in one step.

    I've sent out a few email for comment on this. If you have any feedback, feel free to leave a comment below.

    September 18

    Using .WRITE on VARCHAR(MAX)? - Part 1

    Are you using .WRITE when updating the contents of a VARCHAR(MAX) or NVARCHAR(MAX) column and are you replacing the whole contents (using @Offset = 0 and @Length = NULL)?

    UPDATE  MyTable
    SET     col1.WRITE ('Lorem ipsum’, 0, NULL)

    or simply replace the value:

    UPDATE  MyTable
    SET     col1 = 'Lorem ipsum’

    Depending on your app, you may want to try using a classic update rather than .WRITE. Although .WRITE can provide significant performance enhancements, if you are not appending data or changing a subset of the MAX column, then I've found a classic update to be faster. Marginally! Note the query plan for a .WRITE when updating a single column via a PRIMARY KEY is more complex (I'm still getting to the bottom of this)

    September 16

    Microsoft SQL Server 2005 Query DeskSheet

    I've been doing some reading lately and made some notes (to myself) on various querying aspects of SQL 2005. As a reference, I've created a 'DeskSheet' so I have something to quicky refer back to... Here's the first version of it in pdf format.

    I'll be posting a Microsoft SQL Server 2005 Programming DeskSheet in a later post...

    September 13

    Security Bits

    Taking a small detour away from TFS & SQL, go here for a free pdf (4.8Mb) of the The Developer Highway Code. There are also some great resources for security training here.

    [Category updated to 'Security'.]

    The rise of TOP, and fall of ROWCOUNT

    In SQL Server 2005, TOP has gained a few new capabilities.

    SELECT TOP (scalar expression | variable | subquery) [PERCENT] *
    FROM    aTable

    TOP can also be used in DML statements. Previously you may have used ROWCOUNT to limit the affected data, but apparently from Katmai (next version of SQL Server) ROWCOUNT will have no affect on DML statements! So, if you are currently using ROWCOUNT to throttle DML statements, I recommend you move over to the TOP operator. It's cleaner and has better support going forward....

    The old way
    SET ROWCOUNT 1000;
    WHILE 1 = 1
    BEGIN
        DELETE FROM aTable
        WHERE KeyId < 5000000;
        IF @@rowcount < 1000 BREAK;
    END
    SET ROWCOUNT 0;

    The new way
    WHILE 1 = 1
    BEGIN
        DELETE TOP(1000) FROM aTable
        WHERE KeyId < 5000000;
        IF @@rowcount < 1000 BREAK;
    END

    After running a few tests, I believe that the performance of the TOP operator might by slightly slower than ROWCOUNT. I'll do some further analysis and post it up here at a later date...FTD

    Toggle Actual Execution Plan [CTRL+M]

    According to SQL Server 2005 Books Online (July 2006 Refresh), there is no keyboard shortcut to display the actual execution plan for a query. The documentation states:  Include actual execution plan in the query output -> No equivalent.

    This appears to be a strange omission, especially given that CTRL+L can be used to display the estimated execution plan. The good news is that this frustrating omission appears to be an error in the documentation, rather than the product. Simply key in CTRL+M to toggle the display of the actual execution plan... Happy tuning... FTD

    September 10

    Re-using Key values (finding the gaps)

    Suppose you do physical deletes of data and are in a position to re-use the 'keys' for a table. You can use the technique below to find gaps in the key for your table (assuming you are using a single numerical key).

    -- LET'S USE TEMPDB
    USE tempdb;
    GO

    -- DROP THE TABLE IF IT ALREADY EXISTS
    IF OBJECT_ID('dbo.SampleOrders') IS NOT NULL
    DROP TABLE dbo.SampleOrders;
    GO

    -- CREATE THE TABLE AND POPULATE IT
    CREATE TABLE dbo.SampleOrders
    (
    OrderId INT NOT NULL PRIMARY KEY,
    OrderDate DATETIME NOT NULL
    );
    INSERT INTO dbo.SampleOrders(OrderId, OrderDate) VALUES(2, '15 Aug 2006');
    INSERT INTO dbo.SampleOrders(OrderId, OrderDate) VALUES(3, '15 Aug 2006');
    INSERT INTO dbo.SampleOrders(OrderId, OrderDate) VALUES(5, '16 Aug 2006');
    INSERT INTO dbo.SampleOrders(OrderId, OrderDate) VALUES(6, '19 Aug 2006');
    INSERT INTO dbo.SampleOrders(OrderId, OrderDate) VALUES(8, '19 Aug 2006');

    -- NOW WE CAN SELECT THE MINIMUM MISSING KEY (OrderId), IN THIS CASE 1 WILL BE RETURNED
    SELECT CASE WHEN NOT EXISTS(SELECT *
    FROM dbo.SampleOrders
    WHERE OrderId = 1) THEN 1
    ELSE (SELECT MIN(A.OrderId + 1)
    FROM dbo.SampleOrders AS A
    WHERE NOT EXISTS
    (SELECT * FROM dbo.SampleOrders AS B
    WHERE B.OrderId = A.OrderId + 1))
    END;

    -- NOW LET'S INSERT 1 TO SEE IF OUR INNER QUERY WILL RETURN THE CORRECT VALUE (4)
    INSERT INTO dbo.SampleOrders(OrderId, OrderDate) VALUES(1, '11 Aug 2006');

    -- NOW THIS SAME QUERY SHOULD RETURN 4
    SELECT CASE WHEN NOT EXISTS(SELECT *
    FROM dbo.SampleOrders
    WHERE OrderId = 1) THEN 1
    ELSE (SELECT MIN(A.OrderId + 1)
    FROM dbo.SampleOrders AS A
    WHERE NOT EXISTS (SELECT *
    FROM dbo.SampleOrders AS B
    WHERE B.OrderId = A.OrderId + 1))
    END;

    -- CLEANUP THE SAMPLEORDERS
    IF OBJECT_ID('dbo.SampleOrders') IS NOT NULL
    DROP TABLE dbo.SampleOrders;
    GO

    It's a fairly easy way to find gaps working from the lowest gap up. Thanks to Itzik Ben-Gan for pointing it out in his book: Inside Microsoft SQL Server 2005: T-SQL Querying.

    Note, although the use of * is generally advised against, it is safe to use it in EXISTS or NOT EXISTS queries as the optimizer ignores the SELECT arguments.

    September 09

    SQL Server 2005: INCLUDE argument with Nonclustered Indexes.

    There is a new INCLUDE argument available in SQL Server 2005 when creating nonclustered indexes. According to the MSDN documentation, INCLUDE is used to: Specify the nonkey columns to be added to the leaf level of the nonclustered index. In a nutshell, this new clause allows us to better target columns we need in a key and satisfy more covering index scenarios. The result of using this argument should be less IO (which is good!). I think it's important to point out the Design Recommendation for this new argument

    "Redesign nonclustered indexes with a large index key size so that only columns used for searching and lookups are key columns. Make all other columns that cover the query included nonkey columns. In this way, you will have all columns needed to cover the query, but the index key itself is small and efficient."

    TFS Admin Tool 1.1 Release Candidate Available

    I blogged about version 1.0 of this tool some time back. It was pretty buggy, but made life a whole lot easier for administering users. Now a Release Candidate for 1.1 is out. The RC appears to be a little faster, though how they came up with a UI this bad is hard to understand. I expect a UI refresh in the future will go the way of tree based navigation. If its not in the very near future, perhaps a better tool will steal some of its thunder. On another note, the tool is incorrectly named. It should be Team Foundation Server User Administration Tool.

    A few updates in the Team Foundation Server world

    The TFS Power Toys have been refreshed. The must useful for day to day users will be the Annotate toy which allows you to see who the last person was that made a change to a section of a file, and the TreeDiff toy which allows you to compare working / local folders.

    After installing the toys, for Annotate, simply right click a file in Source Control and Annotate will be listed in the context menu. The TreeDiff tool is also accessed via a context menu.

    A bit more on the TABLESAMPLE clause...

    Itzik Ben-Gan, the author of the two SQL books I mentioned in a previous post, also has an article on the TABLE SAMPLE clause. This is a still a good read, even if you've read the article over on Fotia.

    In my view this new clause can be used for two purposes.

    1. The new starter: will use it to get to know the contents of a database.
    2. The perfect tester: will use it to prove that tests can run as planned against random data.

    Let me know if you can think of any other scenarios.

    September 06

    Some New Books - bring on the learning...

    I bought two new books today: Inside Microsoft SQL Server 2005: T-SQL Querying and Inside SQL Server 2005: T-SQL Programming, both by Itzik Ben-Gan. I spent a morning in one of his Black Hat SQL sessions a few years ago and have been following his works ever since.

    I've started with T-SQL Querying as it's described by the author as Part 1 and the other book Part 2. 11 pages in and I've found out something small, yet helpful.

    Have you ever been confused with regards to whether to place a logical expression in the ON filter or the WHERE filter, e.g.

    SELECT C.customerid
    FROM dbo.Customers AS C
    JOIN dbo.Orders AS O
    ON C.customerid = O.customerid
    WHERE C.city = 'Madrid'

    vs -->

    SELECT C.customerid
    FROM dbo.Customers AS C
    JOIN dbo.Orders AS O
    ON C.customerid = O.customerid
    AND C.city = 'Madrid

    In this case it makes no difference. The difference appears though when you use an OUTER JOIN. As explained in the book, the ON is processed first and then the WHERE. So the first statement is suitable if we only want data for Madrid, however if we want OUTER rows maintained, then the second query is suitable.

    Fotia Ltd - The TABLESAMPLE Clause

    It's been some time since the last post, but at last Stefan has posted an update to Fotia. This covers the TableSample clause and is the 18th installment in his 'Discover SQL Server 2005' series.

    September 02

    Building your solutions using MSBuild - Part 1

    Many people seem to be afraid of the angled brackets that accompany MSBuild and tend to avoid it. This is a pity as MSBuild can be as simple or complex as you want to make it. The beauty of the tool is that it's kinda like Lego; you build it bit by bit, enhance it, break it, build it better.

    I thought in Part 1 of this series I'd keep it as simple as possible and show how to compile a solution using MSBuild. We'll need a starting point, so I've attached a sample application that will be used throughout. Note, this sample application uses the Microsoft Solutions Build Framework (v2.0.060717.001) from GotDotNet. You can download this now and take a look, or you can follow this series and see how we take a basic solution compiling to something a little more useful. The SBF provides a bunch of MSBuild tasks, so we don't need to write them ourselves!

    1. Download and unzip the attached file to your PC
    2. Open a .NET 2.0 command prompt to the path of the Build.Proj file
    3. Run MSBuild.exe Build.proj

    You should see a successful build of your solution (assuming it compiles ok).

    That's all there is to it. Add the build folder to each of your solutions, then a few lines of MSBuild configuration and you can compile your solution. In fact the only part you should need to change in to compile your solution is the orange part below

    <?xml version="1.0" encoding="utf-8"?>
    <Project DefaultTargets="Full" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <!--IMPORT THE MS SBF TARGETS-->
    <Import Project="support\Microsoft.Sdc.Common.targets"/>
    <!-- BUILD THE MAIN SOLUTION.-->
    <Target Name="Full">
    <Tools.DevEnv VisualStudio="8.0" Path="$(MSBuildProjectDirectory)\..\SampleConsoleApp\SampleConsoleApp.sln" Config="Release" Clean="true"/>
    </Target>
    </Project>

    Great! But we need more. What about SourceSafe / TFS integration. What about notifications. What about deployment. What about labeling. What about FXCop integration....

    All this is to come, but first get yourself accustomed to the simple sample in part 1.

    RC1 of .NET 3 and Windows SDK Available

    Hidden behind the Windows Vista RC1 release, we at last have a new release of the development bits. All the information you need is here. Best of all, the number of samples have been increased dramatically over previous releases.

    Quick Links:
    Microsoft .NET Framework 3.0 - Release Candidate
    VS2005 Extensions for Windows® Workflow Foundation Release Candidate 5
    Windows SDK (note, you can get a full ISO download here 1.1Gb)

    September 01

    Team Foundation Server – Administer users the ‘easy’ way

    Install the TFS Admin tool from CodePlex:

    Note: This tool (v1.0) is very slow and very buggy. Don't do too much at once.