CSS SQL Server Engineers

This is the official team Web Log for Microsoft Customer Service and Support (CSS) SQL Support. Posts are provided by the CSS SQL Escalation Services

How It Works: SQLIOSim - Checksums


SQLIOSim, like its predecessor SQLIOStress, is designed to read pages it has written and validate the data.   SQLIOSim does this using a checksum algorithm.

  • When SQLIOSim starts up it creates a set of buffers and uses the Crypto APIs to fill them with random data.  It then calculates the checksum for each of the buffers.  These become statically stored.
  • When a page is DIRTIED by SQLIOSim a random seed is calculated.   The seed value MOD (%) the number of available buffers selects one of the pre-generated buffers, and a memcpy of that buffer is used to dirty the write buffer.
  • The page header is correctly updated (these fields are not part of the checksum) and the page is written to disk.
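The write path described above can be sketched roughly as follows. This is a minimal illustration, not SQLIOSim's code: `ToyChecksum`, `PatternPool`, `MakePool` and `DirtyPage` are hypothetical names, the checksum is a stand-in because the post does not say which algorithm is used, `std::mt19937` replaces the Crypto APIs to keep the sketch portable, and the header is assumed to be exactly the four 32-bit fields shown later in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <utility>
#include <vector>

constexpr std::size_t kPageSize = 8192;  // 0x2000, the buffer length reported in the dump
constexpr std::size_t kHeaderSize = 16;  // assumption: header is exactly four 32-bit fields
constexpr std::size_t kBodySize = kPageSize - kHeaderSize;

// Hypothetical stand-in: the post does not say which checksum SQLIOSim uses.
std::uint32_t ToyChecksum(const std::uint8_t* data, std::size_t len) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum = ((sum << 1) | (sum >> 31)) ^ data[i];  // rotate left one bit, XOR in the byte
    }
    return sum;
}

struct PatternPool {
    std::vector<std::vector<std::uint8_t>> buffers;  // random page bodies
    std::vector<std::uint32_t> checksums;            // checksum of each body, computed once
};

// Build the pool at startup: random bodies plus their precomputed checksums.
PatternPool MakePool(std::size_t count, std::uint32_t rng_seed) {
    std::mt19937 rng(rng_seed);  // assumption: portable substitute for the Crypto APIs
    PatternPool pool;
    for (std::size_t i = 0; i < count; ++i) {
        std::vector<std::uint8_t> body(kBodySize);
        for (std::uint8_t& b : body) b = static_cast<std::uint8_t>(rng() & 0xFF);
        pool.checksums.push_back(ToyChecksum(body.data(), body.size()));
        pool.buffers.push_back(std::move(body));
    }
    return pool;
}

// Dirty a page: seed % pool size picks the body; the header is filled in separately
// and is not covered by the checksum.
void DirtyPage(std::uint8_t* page, std::uint32_t page_id, std::uint32_t file_id,
               std::uint32_t seed, const PatternPool& pool) {
    assert(!pool.buffers.empty());
    const std::size_t index = seed % pool.buffers.size();
    const std::uint32_t header[4] = {page_id, file_id, seed, pool.checksums[index]};
    std::memcpy(page, header, kHeaderSize);
    std::memcpy(page + kHeaderSize, pool.buffers[index].data(), kBodySize);
}
```

Because each checksum is computed once at startup, dirtying a page costs only a table lookup and a memcpy, which is presumably why the pool is kept static.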

Auditing reads are issued at various intervals to validate the checksum.  There are two angles to checksum validation.

  • The checksum calculated for the page should match that stored on the page.  (Page physically damaged)
  • The checksum and seed stored on the page should match that of the last write to the page.  (Page physically damaged or incorrect version of page returned, i.e. a stale read)
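The two checks can be expressed as a small decision function. This is only a sketch of the logic described above; `PageStamp`, `ReadVerdict` and `ClassifyRead` are hypothetical names, and how SQLIOSim tracks the last write in memory is not covered in this post.

```cpp
#include <cassert>
#include <cstdint>

// Seed and checksum as stored in a page header (hypothetical helper type).
struct PageStamp {
    std::uint32_t seed;
    std::uint32_t checksum;
};

enum class ReadVerdict {
    Ok,                     // both checks pass
    ChecksumMismatch,       // check 1 failed: the page contents do not match the stored checksum
    MismatchWithLastWrite   // check 2 failed: self-consistent page, but not the last one written
};

ReadVerdict ClassifyRead(PageStamp on_page, std::uint32_t calculated_checksum,
                         PageStamp last_write) {
    // Check 1: the checksum calculated over the data must match the one stored on the page.
    if (calculated_checksum != on_page.checksum) {
        return ReadVerdict::ChecksumMismatch;
    }
    // Check 2: seed and checksum must match what was last written to this page.
    if (on_page.seed != last_write.seed || on_page.checksum != last_write.checksum) {
        return ReadVerdict::MismatchWithLastWrite;
    }
    return ReadVerdict::Ok;
}

// Usage with the values from the error dump in this post: the received page is
// internally consistent, but it is not the version that was last written.
inline void ExampleFromDump() {
    const PageStamp on_page{0x1A66, 0x31D71152};
    const PageStamp last_write{0x391F, 0x28DBAE9B};
    assert(ClassifyRead(on_page, 0x31D71152, last_write) ==
           ReadVerdict::MismatchWithLastWrite);
}
```

A page that fails only the second check is internally valid, which is what makes a stale read hard to detect without remembering what was last written.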

The SQLIOSim error log contains an error message sequence such as the following.

<ENTRY TYPE='ERROR' TIME='17:21:48' DATE='03/04/08' TID='5080' User='CPU Idle User' File='e:\yukon\sosbranch\sql\ntdbms\storeng\util\sqliosim\buffer.cpp' Line='791' Func='CBUF::ValidateBuffer' HRESULT='0x80070467' SYSTEXT='While accessing the hard disk, a disk operation failed even after retries.'>

<EXTENDED_DESCRIPTION>Buffer validation failed on F:\sqliosim.mdx Page: 87366, offset 0x8</EXTENDED_DESCRIPTION>

</ENTRY>

When a significant error is detected, a text file is created showing extended details.  In this case the error information shows that SQLIOSim encountered a checksum failure and attempted 15 retries, sleeping between attempts.   After 16 total reads (the original read plus 15 retries) the problem could not be resolved, so the extended text file dump was generated.
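The retry behavior can be pictured as a simple loop. This is a sketch under assumptions: `ReadWithRetries` and its parameters are hypothetical, and the post does not say how long SQLIOSim sleeps between attempts.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Issues one initial read plus up to max_retries further reads, sleeping before
// each retry. Returns the number of reads issued when validation passed, or 0 if
// every attempt failed (the point at which a dump file would be written).
int ReadWithRetries(const std::function<bool()>& read_and_validate, int max_retries,
                    std::chrono::milliseconds delay) {
    assert(max_retries >= 0);
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        if (attempt > 0) {
            std::this_thread::sleep_for(delay);  // pause before re-reading
        }
        if (read_and_validate()) {
            return attempt + 1;
        }
    }
    return 0;
}
```

With `max_retries` set to 15, a page that never validates is read 16 times in total, matching the counts in the error text.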

<ENTRY TYPE='ERROR' TIME='17:21:48' DATE='03/04/08' TID='5080' User='CPU Idle User' File='e:\yukon\sosbranch\sql\ntdbms\storeng\util\sqliosim\page.cpp' Line='1043' Func='ErrorDumpHandler' HRESULT='0x00000000' SYSTEXT=''>

<EXTENDED_DESCRIPTION>Dump file successfully written: H:\SQLIOSimX86\SqlSimErrorDump00006.txt</EXTENDED_DESCRIPTION>

</ENTRY>

The text file contains several sections which I have outlined below.  To assist in interpreting the output it helps to understand the page header definition.

      DWORD m_dwPage;
      DWORD m_dwFile;
      DWORD m_dwPageSeed;
      DWORD m_dwCheckSum;

I used the following page to illustrate the information here.  

87366 = 0x00015546  or byte swapped 46 55 01 00 
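To read the header out of a raw dump by hand, decode each group of four bytes as a little-endian 32-bit value. The sketch below does that for the four fields listed above; `PageHeader`, `ReadLe32` and `ParseHeader` are hypothetical names, and it assumes the fields sit at the very start of the page in the order shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors m_dwPage, m_dwFile, m_dwPageSeed and m_dwCheckSum from the header definition.
struct PageHeader {
    std::uint32_t page_id;
    std::uint32_t file_id;
    std::uint32_t seed;
    std::uint32_t checksum;
};

// Decode four bytes stored least-significant byte first ("byte swapped" in the dump).
inline std::uint32_t ReadLe32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

PageHeader ParseHeader(const std::uint8_t* bytes, std::size_t len) {
    assert(len >= 16);  // four 32-bit fields
    return PageHeader{ReadLe32(bytes), ReadLe32(bytes + 4),
                      ReadLe32(bytes + 8), ReadLe32(bytes + 12)};
}
```

Applied to the first 16 bytes of the received buffer in this post (`46 55 01 00 00 00 00 00 66 1A 00 00 52 11 D7 31`), this gives page 0x15546, file 0, seed 0x1A66 and checksum 0x31D71152, which agrees with the Received values in the dump header.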

This is an example of a stale read (stable media returned a previous version of the page): the page and file IDs match, but the seed, the checksum and the rest of the page data all differ.   Other conditions may be issues such as a single damaged bit on the page, swapped 512-byte sectors or even the wrong page (offset) returned.

 

Header: Shows the basic information about the dump.

Data mismatch between the expected disk data and the read buffer:
File: F:\sqliosim.mdx
Offset: 0x2AA8C000
Expected FileId: 0x0
Received FileId: 0x0
Expected PageId: 0x15546
Received PageId: 0x15546
Expected CheckSum: 0x28DBAE9B
Received CheckSum: 0x31D71152 (does not match expected)
Calculated CheckSum: 0x31D71152
Expected Buffer Length: 0x2000
Received Buffer Length: 0x2000
Synchronous read was not successful after 15 attempts

Data buffer received: The raw dump of the data as read from stable media.

0x000000  46 55 01 00 00 00 00 00 66 1A 00 00 52 11 D7 31 C4 D7 68 54 52 44 98 F1 32 0D 81 F7 49 81 90 D3    FU......f...R..1..hTRD..2...I...
0x000020  21 14 B9 B7 F5 9E AB 77 11 FC 7C 99 47 4B 11 D5 B2 68 3A 86 50 3E 68 CE 95 61 9E BB 7B C1 24 08    !......w..|.GK...h:.P>h..a..{.$.
0x000040  78 54 68 48 73 92 9A 4F BB 79 83 CE B1 FE 68 D4 67 C0 5B 0A 3C 61 AB 04 D1 39 EF CE F5 D9 AB 74    xThHs..O.y....h.g.[.<a...9.....t

Data buffer expected: The in-memory version of the expected page data.

0x000000  46 55 01 00 00 00 00 00 1F 39 00 00 9B AE DB 28 FF 5B AD F3 21 2E A6 FD 4B 1D 34 AB 7D 04 38 33    FU.......9.....(.[..!...K.4.}.83
0x000020  48 4C 95 23 83 44 56 58 E6 51 DE 07 64 C4 14 78 8E F7 F7 6D 46 18 6F 39 E9 08 69 F1 7F 58 68 A2    HL.#.DVX.Q..d..x...mF.o9..i..Xh.
0x000040  2C B5 E3 34 84 92 61 30 72 3C 9A 85 4E 50 89 2C 48 AE BC 4A 2C 68 C1 5A E3 4D E1 19 DB F7 DB 56    ,..4..a0r<..NP.,H..J,h.Z.M.....V

Data buffer difference: Shows the bitwise differences between the expected and received buffers.

0x000000                          79 23       C9 BF 0C 19 3B 8C C5 A7 73 6A 3E 0C 79 10 B5 5C 34 85 A8 E0
0x000020  69 58 2C 94 76 DA FD 2F F7 AD A2 9E 23 8F 05 AD 3C 9F CD EB 16 26 07 F7 7C 69 F7 4A 04 99 4C AA
0x000040  54 E1 8B 7C F7    FB 7F C9 45 19 4B FF AE E1 F8 2F 6E E7 40 10 09 6A 5E 32 74 0E D7 2E 2E 70 22
0x000060  51 75 A6 E0 AD 71 D3 8F FD 73 70 D2 53 4C 1C AF 0E C9 62 9B 92 01 FA 9B D7 10 D3 FD 2C 87 94 BB
0x000080  4D 28 88 1E 07 56 46 9A BB B6 99 AB DB 34 1D EF 5B 22 AD B7 F8 5B 6A 1C F8 08 A9 8A E0 50 A5 B7
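The values in the difference section are consistent with a byte-wise XOR of the expected and received buffers, with identical bytes left blank. For example, the seed bytes `1F 39` (expected) and `66 1A` (received) XOR to `79 23`. A minimal sketch of that computation follows; `XorDiff` is a hypothetical name, not a SQLIOSim function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte-wise XOR of two equally sized buffers. A zero byte in the result means the
// two buffers agree at that position.
std::vector<std::uint8_t> XorDiff(const std::vector<std::uint8_t>& expected,
                                  const std::vector<std::uint8_t>& received) {
    assert(expected.size() == received.size());
    std::vector<std::uint8_t> diff(expected.size());
    for (std::size_t i = 0; i < expected.size(); ++i) {
        diff[i] = static_cast<std::uint8_t>(expected[i] ^ received[i]);
    }
    return diff;
}
```

A diff with only one or two non-zero bits points toward bit-level damage, whereas a diff that is non-zero almost everywhere after the page and file IDs, as here, fits a different version of the page.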

File IO calls history dump: A dump of the current API call ring buffer.   Because this is a ring buffer, information about the specific page may already have been overwritten.   Multiply the page number by 8192 to get the page's byte offset, then use the Offset and Bytes values to locate the entries whose range contains the offset of the damaged page.

Function                   Handle              Offset       Bytes   Ret bytes       Start         End Rslt       Error    TID
GetOverlappedResult         0x334          0x7ba3e000       32768       32768  1527013921  1527013921    1           0   4956
ReadFile                    0x334          0xe9f90000        8192           0  1527013921  1527013921    1           0   4956
GetOverlappedResult         0x334          0x1374e000       73728       73728  1527013921  1527013921    1           0   1560
GetOverlappedResult         0x334          0xe9f8a000        8192        8192  1527013921  1527013921    1           0   5092
GetOverlappedResult         0x334          0xe85ce000       98304       98304  1527013921  1527013921    1           0   6028
ReadFile                    0x334          0xe9f8a000        8192           0  1527013921  1527013921    1           0   4956
GetOverlappedResult         0x334          0xe9f7a000        8192        8192  1527013921  1527013921    1           0   4740
ReadFileScatter             0x334          0x7ba3e000       32768       32768  1527013921  1527013921    1           0   4740
GetOverlappedResult         0x334          0xdd99a000      212992      212992  1527013921  1527013921    1           0   4736
GetOverlappedResult         0x334          0xb9e50000       32768       32768  1527013921  1527013921    1           0   4736
ReadFileScatter             0x334          0xb9e50000       32768       32768  1527013921  1527013921    1           0   4140
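Searching the history by hand is tedious, so the offset arithmetic can be scripted. The sketch below filters history rows whose byte range overlaps a given page; `IoCall` and `CallsTouchingPage` are hypothetical names, and parsing the text table into `IoCall` records is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of the history dump, reduced to the columns needed here.
struct IoCall {
    std::string function;
    std::uint64_t offset;
    std::uint64_t bytes;
};

constexpr std::uint64_t kPageBytes = 8192;

// Returns the calls whose [offset, offset + bytes) range overlaps the page's
// [page_id * 8192, page_id * 8192 + 8192) range.
std::vector<IoCall> CallsTouchingPage(const std::vector<IoCall>& history,
                                      std::uint64_t page_id) {
    assert(page_id <= UINT64_MAX / kPageBytes - 1);  // guard against overflow
    const std::uint64_t page_start = page_id * kPageBytes;
    const std::uint64_t page_end = page_start + kPageBytes;
    std::vector<IoCall> hits;
    for (const IoCall& call : history) {
        if (call.offset < page_end && page_start < call.offset + call.bytes) {
            hits.push_back(call);
        }
    }
    return hits;
}
```

For page 0x15546 the byte offset is 0x15546 * 8192 = 0x2AA8C000, the Offset reported in the dump header. None of the rows in the excerpt above cover that offset, which illustrates the ring buffer caveat: the relevant calls may already have been overwritten.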

These types of failures are typically configuration or hardware related.  When you encounter such issues be sure to contact your operations and hardware vendors.

 

Bob Dorr
SQL Server Senior Escalation Engineer

  • Kevin gives a run-down of the dozens of SQL Server topics (and other topics) he's been trying to stay on top of.

  • This is an extension to my previous post about SQLIOSim data integrity testing. http://blogs.msdn.com/psssql/archive/2008/03/05/how-it-works-sqliosim-checksums.aspx

  • I've got the following error:

    Data mismatch between the expected disk data and the read buffer:

    File: G:\sqliosim.mdx

    Offset: 0xFBB6C000

    Expected FileId: 0x1

    Received FileId: 0x0 (does not match expected)

    Expected PageId: 0x7DDB6

    Received PageId: 0xC6E94000 (does not match expected)

    Received CheckSum: 0xEA72F2CC

    Calculated CheckSum: 0xEA72F2CC

    Received Buffer Length: 0x2000

    The received and calculated checksums are equal, but the FileId and PageId are wrong. Could anyone explain what is wrong, or point me to the right place to ask this question?

  • Hi,

    I'm trying to run SQLIOSIM on a NAS cluster's NTFS shared folder. Here's the command line:

       sqliosim.com -dir \\c16\vol1cifs -size 512 -d 600

    But it repeatedly fails to create files during the setup due to the following error:

       Error: 0x80070057 Error Text: Description: Unable to get volume name for mount point \\c16\vol1cifs\

    vol1cifs is an NTFS shared folder that can be opened and edited without any problem on the same client where SQLIOSIM is installed. According to the SQLIOSIM description, it does support UNC paths.

    What am I doing wrong?

    Thanks for help.

    Sam
