Hello, my name is Bob Golding and I would like to share with you a new event that you may see in the system event log. Event ID 153 is an error associated with the storage subsystem. This event was new in Windows 8 and Windows Server 2012 and was added to Windows 7 and Windows Server 2008 R2 starting with hotfix KB2819485.
An event 153 is similar to an event 129. An event 129 is logged when the storport driver times out a request to the disk; I described event 129 messages in a previous article. The difference is which driver times out the request: a 129 is logged when storport itself times out a request, whereas a 153 is logged when the storport miniport driver times out a request. The miniport driver may also be referred to as an adapter driver or HBA driver; it is typically written by the hardware vendor.
Because the miniport driver has better knowledge of the request execution environment, some miniport drivers time the request themselves instead of letting storport handle request timing. This is because the miniport driver can abort the individual request and return an error rather than storport resetting the drive after a timeout. Resetting the drive is disruptive to the I/O subsystem and may not be necessary if only one request has timed out. The error returned from the miniport driver is bubbled up to the class driver, which can log an event 153 and retry the request.
Below is an example event 153:
This error means that a request failed and was retried by the class driver. In the past no message would be logged in this situation because storport did not time out the request. The lack of messages resulted in confusion when troubleshooting disk errors because timeouts would occur but there would be no evidence of the error.
The details section of the event log record shows which error caused the retry and whether the request was a read or a write. Below is the details output:
In the example above, the byte at offset 0x29 is the SCSI status, the byte at offset 0x2A is the SRB status that caused the retry, and the byte at offset 0x2B is the SCSI command that is being retried. The offsets are hexadecimal, matching the left-hand column of the details output. In this case the SCSI status was 00 (SCSISTAT_GOOD), the SRB status was 09 (SRB_STATUS_TIMEOUT), and the command was 28 (SCSIOP_READ).
The most common SCSI commands are:
SCSIOP_READ - 0x28
SCSIOP_WRITE - 0x2A
The most common SRB statuses are below:
SRB_STATUS_TIMEOUT - 0x09
SRB_STATUS_BUS_RESET - 0x0E
SRB_STATUS_COMMAND_TIMEOUT - 0x0B
A complete list of SCSI operations and statuses can be found in scsi.h in the WDK. A list of SRB statuses can be found in srb.h.
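The lookup described above can be sketched in a few lines of Python. This is only an illustration, not part of any Microsoft tool: the function name decode_event_153 and the table layout are hypothetical, and the tables contain only the codes quoted in this article and its comments. The mask applied to the SRB status strips the SRB_STATUS_QUEUE_FROZEN (0x40) and SRB_STATUS_AUTOSENSE_VALID (0x80) flag bits defined in srb.h; whether the logged byte ever carries those flags is an assumption.

```python
# Codes quoted in this article and its comments. For the complete lists,
# see scsi.h and srb.h in the WDK.
SCSI_STATUS = {
    0x00: "SCSISTAT_GOOD",
    0x02: "SCSISTAT_CHECK_CONDITION",
    0x22: "SCSISTAT_COMMAND_TERMINATED",
}
SRB_STATUS = {
    0x04: "SRB_STATUS_ERROR",
    0x08: "SRB_STATUS_NO_DEVICE",
    0x09: "SRB_STATUS_TIMEOUT",
    0x0B: "SRB_STATUS_COMMAND_TIMEOUT",
    0x0E: "SRB_STATUS_BUS_RESET",
}
SCSI_OP = {
    0x28: "SCSIOP_READ",
    0x2A: "SCSIOP_WRITE",
}

# srb.h defines two flag bits in the upper part of the SRB status byte:
# SRB_STATUS_QUEUE_FROZEN (0x40) and SRB_STATUS_AUTOSENSE_VALID (0x80).
SRB_STATUS_MASK = 0x3F


def _name(table, code):
    """Return the symbolic name for a code, or a placeholder if unknown."""
    return table.get(code, f"unknown (0x{code:02X})")


def decode_event_153(scsi_status, srb_status, scsi_op):
    """Hypothetical helper: name the three status bytes of an event 153."""
    return {
        "scsi_status": _name(SCSI_STATUS, scsi_status),
        "srb_status": _name(SRB_STATUS, srb_status & SRB_STATUS_MASK),
        "scsi_op": _name(SCSI_OP, scsi_op),
    }


# The example from this article: 00 09 28
print(decode_event_153(0x00, 0x09, 0x28))
```

Codes that are not in the tables are reported as unknown rather than guessed, so the tables can be extended from scsi.h and srb.h as needed.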
The timeout errors (SRB_STATUS_TIMEOUT and SRB_STATUS_COMMAND_TIMEOUT) indicate a request timed out in the adapter. In other words a request was sent to the drive and there was no response within the timeout period. The bus reset error (SRB_STATUS_BUS_RESET) indicates that the device was reset and that the request is being retried due to the reset since all outstanding requests are aborted when a drive receives a reset.
A system administrator who encounters event 153 errors should investigate the health of the computer’s disk subsystem. Although an occasional timeout may be part of the normal operation of a system, the frequent need to retry requests indicates a performance issue with the storage that should be corrected.
I don't have the WDK, so I don't have access to the full scsi.h and srb.h. Could you please post more codes online to make it easier to find more information about this error? I'm getting a 00 04 28, so SCSI is good but SRB is ????, all that while doing a read.
[Hi Mauricio. The WDK is a free download, you can get it from http://msdn.microsoft.com/en-us/library/windows/hardware/hh852362.aspx. Regarding the error you are seeing, 04 is SRB_STATUS_ERROR.]
Downloading the whole WDK just to find the codes does seem a little overkill.
I'd be especially thankful if you could decode the following "02 08 28".
[Thank you for your feedback. To decode your message:
02 - SCSISTAT_CHECK_CONDITION
08 - SRB_STATUS_NO_DEVICE
28 - SCSIOP_READ
It appears your system attempted to read from a device which was not present. Most likely the device was removed.]
This is great stuff, man. Would this also be reported with the Microsoft iSCSIPORT driver? Can it help troubleshoot dropped frames for an Ethernet/iSCSI SAN?
[Most iSCSI implementations now use msiscsi rather than iscsiport. To answer your question, it depends on whether msiscsi returns an error that will bubble up to the class driver to be retried. I do not recall any errors that are handled in that way. The only one I can think of that may bubble up is SRB_STATUS_BUS_RESET (0x0E). Almost all of the retryable errors occur between msiscsi and storport. Msiscsi does not time requests, so you should not see a timeout.]
We are seeing a ton of these errors on one of our newly deployed Server 2012 boxes. The error string is "The IO operation at logical block address 22f66d73 for Disk 0 was retried." and the bytes mentioned in the article above are: 02 04 28
From what I've been able to dig up this means:
Offset 0x29: 0x02 SCSISTAT_CHECK_CONDITION
“When the target returns a Check Condition in response to a command it is indicating that it has entered a contingent allegiance condition. This means that an error occurred when it attempted to execute a SCSI command. The initiator usually then issues a SCSI Request Sense command in order to obtain a Key Code Qualifier (KCQ) from the target.”
Offset 0x2A: 0x04 SRB_STATUS_ERROR
“Occurs if the HBA returns a nonspecific bus error”
Offset 0x2B: 0x28 SCSIOP_READ
“Occurs during a disk read operation”
So it looks like the OS is having trouble performing a read operation on that block. Is that correct? It's always the same logical block address. Does this indicate a bad drive? I'm not sure where to go from here. Thank you for this article; it's the first bit of information I've found helpful at all in troubleshooting this issue.
[This is most likely a bad drive.]
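When many event 153 messages are logged, it can help to check whether the retries cluster on one disk and one logical block address, as in the report above. The following Python sketch counts retries per disk and LBA from the message text. It assumes the message follows the wording quoted in these comments and that the logical block address is printed in hexadecimal; the function name count_retries is hypothetical, and how the message strings are exported from the event log is left out.

```python
import re
from collections import Counter

# Matches the event 153 message text as quoted in the comments, with or
# without the optional "(PDO name: ...)" part.
MESSAGE_RE = re.compile(
    r"The IO operation at logical block address (?:0x)?([0-9A-Fa-f]+) "
    r"for Disk (\d+)(?: \(PDO name: [^)]*\))? was retried"
)


def count_retries(messages):
    """Hypothetical helper: count retries per (disk number, LBA)."""
    counts = Counter()
    for message in messages:
        match = MESSAGE_RE.search(message)
        if match:
            lba = int(match.group(1), 16)  # assumption: the LBA is hexadecimal
            disk = int(match.group(2))
            counts[(disk, lba)] += 1
    return counts


sample = [
    "The IO operation at logical block address 22f66d73 for Disk 0 was retried.",
    "The IO operation at logical block address 22f66d73 for Disk 0 was retried.",
    "The IO operation at logical block address c28 for Disk 0 "
    "(PDO name: \\Device\\00000034) was retried.",
]

for (disk, lba), n in count_retries(sample).most_common():
    print(f"Disk {disk}, LBA 0x{lba:x}: {n} retries")
```

A single address that accounts for most of the retries points at a specific region of one disk, while retries spread across many addresses or disks are more consistent with a controller, cabling, or fabric problem.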
Hi, following up on Mauricio's question, can you please tell me where and how to look/check next when getting SRB_STATUS_ERROR = 04? Can you list the all possible causes? Thanks.
[If the requests to the disk are failing, most likely there is an issue with the controller, cable, disk, etc.]
Hi, nice information! After a driver update, my Surface Pro 2 256 logs a lot of these events in the event log.
Chkdsk and scan run without errors.
In your opinion, do I need to worry about this?
Log Name: System
Date: 07/04/2014 20:29:21
Event ID: 153
The IO operation at logical block address c28 for Disk 0 (PDO name: \Device\00000034) was retried.
<Event xmlns="schemas.microsoft.com/.../event">
<Provider Name="disk" />
<TimeCreated SystemTime="2014-04-07T23:29:21.458055100Z" />
[Using the Binary part of that log, the error decodes as:
Unfortunately there is no indication of what the error was, just that an error occurred. This may or may not indicate a hardware failure.]
FYI: We use Altaro Hyper-V Backup to back up our Hyper-V machines. The Hyper-V host (where the backup software runs) logs this event every time the backup completes, with the code 02 08 28, so the status is SRB_STATUS_NO_DEVICE on a SCSIOP_READ.
The disks with the numbers in the logged event really are removed (in my case Disk 1 and Disk 2). When I go to the storage manager, I have only Disk 0.
Altaro mounts the VHDs of the virtual machines to do the backup and unmounts them when the backup is completed. So the logged event is correct and can be ignored in this case (when you back up your virtual machines with Altaro or any other software that backs up Hyper-V machines the same way).
I was really surprised and feared that the disks are going to fail soon :)
Have a nice day everybody
[You may also be getting event 157 messages if the disks are being surprise removed. For more information see http://blogs.msdn.com/b/ntdebugging/archive/2013/12/27/event-id-157-quot-disk-has-been-surprise-removed-quot.aspx.]
I also assume this could be related to a DVD?
Can you please help me decode this error message:
0000: 0F 01 04 00 04 00 2C 00 ......,.
0008: 00 00 00 00 99 00 04 80 ......
0010: 00 00 00 00 00 00 00 00 ........
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 22 04 2A .".*
[22 is SCSISTAT_COMMAND_TERMINATED
04 is SRB_STATUS_ERROR
2A is SCSIOP_WRITE
Usually this happens because a frame is dropped. This is an error at the hardware level (controller, disk, cabling, SAN fabric, etc).]
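For anyone decoding a dump like the one above by hand, the three bytes of interest are the last three in the row labeled 0028, that is, the bytes at offsets 0x29, 0x2A, and 0x2B. The Python sketch below pulls them out of the pasted text. It assumes the layout shown in these comments (a hexadecimal offset, a colon, up to eight bytes, then an ASCII column); the function names are hypothetical, and an ASCII column that happens to look like a two-digit hex byte on a short row would be misread.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def parse_hex_dump(text, bytes_per_line=8):
    """Hypothetical helper: map each byte offset in the dump to its value."""
    data = {}
    for line in text.splitlines():
        offset_text, sep, rest = line.partition(":")
        if not sep:
            continue  # not a dump row
        try:
            offset = int(offset_text.strip(), 16)
        except ValueError:
            continue  # the text before the colon is not a hex offset
        for i, token in enumerate(rest.split()[:bytes_per_line]):
            # Stop at the first token that is not a two-digit hex byte,
            # which is taken to be the start of the ASCII column.
            if len(token) != 2 or not set(token) <= HEX_DIGITS:
                break
            data[offset + i] = int(token, 16)
    return data


def status_bytes(text):
    """Return (SCSI status, SRB status, SCSI command) from the dump."""
    data = parse_hex_dump(text)
    return data[0x29], data[0x2A], data[0x2B]


dump = """\
0000: 0F 01 04 00 04 00 2C 00 ......,.
0008: 00 00 00 00 99 00 04 80 ......
0010: 00 00 00 00 00 00 00 00 ........
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 22 04 2A .".*
"""

scsi_status, srb_status, scsi_op = status_bytes(dump)
print(f"SCSI status 0x{scsi_status:02X}, "
      f"SRB status 0x{srb_status:02X}, "
      f"command 0x{scsi_op:02X}")
```

Keying the bytes by offset rather than by position in the text means a dump with a missing or truncated row still yields the right bytes, or fails with a KeyError instead of silently returning the wrong ones.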
Is there any way of determining exactly *which* disk was having this error? Many thanks.
[The text of the error indicates which disk had the error. The example shown in this article occurred on Disk 0. You can correlate the "Disk #" string to a specific disk using Disk Management.]
I have the following. 0028: 00 00 04 28 ?
[04 is SRB_STATUS_ERROR.]
Great article Bob. Thanks for the heads up Stephan, I'm getting 02 08 28 for 3 missing discs on my Hyper-V host logged every hour which relates to the DPM backup of my 3 VMs.
[Hi Paul. Unfortunately we are not able to provide 1:1 support through this blog. The issue reported does not seem to match a known issue. You can obtain 1:1 support through http://support.microsoft.com/.]