Wednesday, October 24, 2007

How to update statistics for large databases

Statistics are something that the SQL Optimizer depends on to generate a plan. The more outdated your statistics are, greater are the chances that your query optimizer will land up with a sub-optimal plan.

So what do you do when you have a database which is highly transactional in nature and is quite large in size. So, you might land up in a scenario where the database is online 24X7 and there is no downtime window where you can update your statistics with a FULLSCAN.

Before we get into what my opinion is about updating statistics of a database is, lets make sure we understand what I mean by statistics being "up-to-date". Statistics being up-to-date is the sampling rate used while updating statistics was 100% or they were updated with a FULLSCAN. Statistics updated using AUTO UPDATE STATISTICS property of a database will not update statistics with a FULLSCAN contrary to popular belief. The sole reason for this being is that suppose a table has 1 GB of data and a query on the database initiated a AUTO UPDATE of a STAT on this database, then if the sampling was 100%, then your query duration would take an eternity.

So, if you have a large database:
1. Ensure that the AUTO UPDATE STATS is turned on for the database. This would provide for some relief to the database
2. Identify which tables need to have their statistics updated with a fullscan. Only those would be required which have the maximum number of queries running on them
3. Next identify a period when the database usage is low. This is the time when you can use to update the stats of the other tables which did not qualify the list in #2.
4. Use a script to update statistics of the tables indentified step # 2 probably based on rowmodctr values (ROWMODCTR: Counts the total number of inserted, deleted, or updated rows since the last time statistics were updated for the table. In SQL Server 2005, this is not going to be always helpful [Ref: http://msdn2.microsoft.com/en-us/library/ms190283.aspx]. But in SQL 2000, this could be used as a deciding factor)

Sample Script
============

Let's say I had identified 4 tables in STEP 2:

CREATE TABLE UPD_STATS_TBL (tblname varchar(10), dt_updated datetime, rowmodctr bigint)

SELECT * FROM UPD_STATS_TBL
tblname dt_updated rowmodctr
TBL1 2007-10-24 21:13:46.123 20000
TBL2 2007-10-23 21:13:46.123 400000
TBL3 2007-10-22 21:13:46.123 508000
TBL4 2007-10-24 20:13:46.123 87000

**************************************************
CREATE PROC UPD_STATS_DB
AS
DECLARE @tbl varchar(10)

UPDATE upd_stats_tbl SET rowmodctr = (SELECT MAX(rowmodctr) from sys.sysindexes where id=OBJECT_ID(tblname))

SELECT TOP 1 @tbl = tblname
FROM UPD_STATS_TBL
WHERE rowmodctr >
ORDER BY dt_updated

DECLARE @stmt varchar (100)

SET @stmt = 'UPDATE STATISTICS'+SPACE(1)+@tbl+SPACE(1)+'WITH FULLSCAN'

EXEC (@stmt)

UPDATE upd_stats_tbl SET dt_updated = (SELECT GETDATE())
WHERE tblname = @tbl

**************************************************

NOTE: You could refine this down to the table statistic if you wanted to.

No comments: