Monday, September 3, 2007

Database Integrity Checks

Database Integrity Checks Database integrity checks are recommended to ensure that the database consistency is intact and if there is a problem with consistency, then it is reported to the appropriate team(s) so that necessary action can be taken to rectify it. This can be done with the help of Database Maintenance Plans. In the event that a CHECKDB on a database fails, then it needs to be reported by Email(using xp_sendmail or SQL Agent Mail or Database Mail for SQL Server 2005) or events fired in the Operating System Event Logs (if these are monitored regularly) with the help of Operators which can be configured for SQL Server Agent. The xp_sendmail feature is not available for SQL 2005 64-bit versions.

The frequency of these checks largely depends on the following factors:
1. Importance of the database
2. How often data changes in the database (If a database integrity check fails for a database where data is not modified, then it would be advisable to restore the last known good backup rather than trying to repair the consistency database)
3. The size of the database
4. In the event of consistency checks failing, it needs to be determined which is the most feasible option:
a. Restore the last known good backups in accordance with the recovery model for that database to allow for a minimum amount of data loss
b. Or try and repair the database and falling back on Option (a) only if this fails In case a repair option is suggested in the CHECKDB output, it is important to note that a REPAIR_ALLOW_DATA_LOSS be never performed on the database without understanding its full consequences and consulting Microsoft PSS. In the event, that this route needs to be taken, it is always recommended to fall back on the last known good backups if possible. The REPAIR_FAST and REPAIR_REBUILD repair options can be performed without having any data loss. Please note that these are time consuming operations and in the event of database inconsistency it is not possible for us to predict how long these tasks would run for. Also, the time taken for CHECKDB on a database cannot be predicted. An educated guess can be made to how long it would take by referring to the last durations of the CHECKDB operations on that particular database. For the above mentioned repair options, please refer the following article: http://msdn2.microsoft.com/en-us/library/aa258278(SQL.80).aspx

REPAIR_ALLOW_DATA_LOSS
Performs all repairs done by REPAIR_REBUILD and includes allocation and deallocation of rows and pages for correcting allocation errors, structural row or page errors, and deletion of corrupted text objects. These repairs can result in some data loss. The repair may be done under a user transaction to allow the user to roll back the changes made. If repairs are rolled back, the database will still contain errors and should be restored from a backup. If a repair for an error has been skipped due to the provided repair level, any repairs that depend on the repair are also skipped. After repairs are completed, back up the database.

REPAIR_FAST
Performs minor, nontime-consuming repair actions such as repairing extra keys in nonclustered indexes. These repairs can be done quickly and without risk of data loss.

REPAIR_REBUILD
Performs all repairs done by REPAIR_FAST and includes time-consuming repairs such as rebuilding indexes. These repairs can be done without risk of data loss. In case of SQL Server 2005, you have the option of checking the suspect_pages in the MSDB database to find out the affected pages.

Please refer the following articles for more detailed information:
Suspect_pages table (SQL 2005)
http://msdn2.microsoft.com/en-us/library/ms174425.aspx
Understanding and managing the suspect_pages table
http://msdn2.microsoft.com/en-us/library/ms191301.aspx
Designing a Backup and Restore Strategy
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/adminsql/ad_bkprst_63eh.asp

In the event, that only a few pages have a problem, then for SQL Server 2005 a page level restore can be performed. Online page restore operation is a feature available in SQL Server 2005 Enterprise Edition but all other Editions of SQL Server 2005 support offline page level restores. The page level restores are done using the NORECOVERY option for that database. Then a backup of the current transaction log is taken and applied to the database with the RECOVERY option. This feature is applicable to databases in FULL or BULK-LOGGED recovery models.

For performing page level restores, please refer the following article: http://msdn2.microsoft.com/en-us/library/ms175168.aspx

It is highly important that a disaster recovery plan is in place to ensure the following:
· A plan to acquire hardware in the event of hardware failure
· A communication plan.
· A list of people to be contacted in the event of a disaster.
· Instructions for contacting the people involved in the response to the disaster.
· Information on who owns the administration of the plan.
· A checklist of required tasks for each recovery scenario. To help you review how disaster recovery progressed, initial each task as it is completed, and indicate the time of completion on the checklist.

In conclusion, if the database is of a considerably large size, then an integrity check needs to be scheduled during a window when the load on the server is at a minimum. The definition of minimum here refers to a load which is lesser than the normal workload on the server. If the database sizes are quite small, then daily integrity checks on the database would be the order of the day.

It is recommended that DBCC CHECKDB be run during hours when the load is light on the server. If DBCC CHECKDB is run during heavy peak usage time, expect a performance hit on the transaction throughput as well as DBCC CHECKDB completion time.

Recommendations for Good DBCC Performance
· Run CHECKDB when the system usage is low.
· Be sure that you are not performing other disk I/O operations, such as disk backups.
· Place tempdb on a separate disk system or a fast disk subsystem.
· Allow enough room for tempdb to expand on the drive. Use DBCC with ESTIMATE ONLY to estimate how much space will be needed for tempdb.
· Avoid running CPU-intensive queries or batch jobs.
· Reduce active transactions while a DBCC command is running.
· Use the NO_INFOMSGS option to reduce processing and tempdb usage significantly.

No comments: