Sunday, September 23, 2007

Stuck Stalled IO Messages

Sometimes, we see the following messages in the SQL Server 2000 (Service Pack 4 and above) and SQL Server 2005 ERRORLOGs:

2004-11-11 00:21:25.26 spid1 SQL Server has encountered 192 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [E:\db_data2.ndf] in database [user_db] (7). The OS file handle is 0x00000000000074D4. The offset of the latest long IO is:

A given I/O request that becomes stalled or stuck is only reported one time. If the message reports 10 I/Os as stalled, these 10 reports will not occur again. If the next message reports 15 I/Os stalled, it indicates that 15 new I/O requests have become stalled.

Most of these messages if occur frequently in a SQL Server, then it could be due to one of the following:

1. There is an underlying hardware issue where in your firmware and disk drivers need to be upgraded. This is mostly done by the Internal Storage Team or preferrably by the Hardware Vendor
2. Or the disks are overwhlemed with the IO Requests being posted on the isks. In this case, you would want to go ahead and make sure that if possible some of the data files and log files would need to be moved to a different drive
3. Or it could be due to bad plans SQL Server ends up performing extra number reads than required

Also, monitoring the disks using the following counters under performance monitor under the would be helpful (under the performance objects - Logical Disk and Physical Disk):

1. Average Disk Queue Length
2. Average Disk Secs\Transfer
3. Average Disk Secs\ Write
4. Average Disk Secs\ Reads

The last three counters should never show a prolonged peak period which shows values of 30 milli-seconds and above. Please refer my blog article on how to setup a perfmon trace:

Helpful Articles:
SQL IO Basics
SQL Stuck Stalled IO KB Article
826433: PRB: Additional SQL Server Diagnostics Added to Detect Unreported I/O Problems
Stuck Stalled IO Messages:


Dinesh Babu Munugala said...

I tought we never use IOStress and IOSIM to check the performance of the disk... any takers?...

Amit Banerjee said...

Thanks for the heads up :) You are right. SQL IO SIM should be used only for data corruption issues for finding out the status of the disk in case the logs are not able to help us in identifying a probably root cause.