Problem:

Management view sys.dm_exec_requests has a column total_elapsed_time to track the total time elapsed in milliseconds since the request arrived. Following T-SQL query converted this value from milli-seconds to micro-seconds.

-- Simplified version of a complex T-SQL batch
SELECT  session_id, command,
r.total_elapsed_time * 1000 AS total_elapsed_time_in_ms
-- Convert ms to us.
FROM sys.dm_exec_requests AS r

Sometimes the above query throws Arithmetic overflow error while converting large integer value in total_elapsed_time to micro-seconds. This issue happens only when the sessions ran long enough on SQL server to hit this issue. This scenario makes it little harder to predictably reproduce in test environments.

Following T-SQL exception was captured in error log of application that ran this query on a recurring basis.  

Msg 8115, Level 16, State 2, Line 2

Arithmetic overflow error converting expression to data type int.

 

Troubleshooting:

Extended Events in SQL Server 2008 is a powerful, light weight monitoring and trouble-shooting feature that can help developer/dba to narrow down the root cause of issues like the one above. You can read more about Extended events at http://technet.microsoft.com/en-us/library/bb630282.aspx

Here are the steps I did to capture the exact T-SQL batch that caused this issue. My goal was to have a light weight event gathering mechanism to capture T-SQL batch that caused Error 8115.

Event session definition below has

·        Action -  ACTION (sqlserver.tsql_stack, sqlserver.sql_text) to capture sql_text along with event data

·        Predicate - “WHERE (error = 8115 )” to capture events that are generated only when error 8115 is hit.  

·        File Target - Captured events are logged to asynchronous file target for every 10 seconds to keep this event capture session as light weight as possible.

-- Drop event session if it was defined earlier
IF EXISTS(SELECT * from sys.server_event_sessions where name='ArithmeticError')

      DROP EVENT SESSION ArithmeticError ON SERVER

GO

-- Create event session to capture Error:8115 along with T-SQL Statement that
-- caused this error
-- Note: This script assumes that C:\temp is available and SQL Service account
-- has privileges to read, write to this folder
CREATE EVENT SESSION ArithmeticError ON SERVER
ADD EVENT sqlos.exception_ring_buffer_recorded (
ACTION (sqlserver.tsql_stack, sqlserver.sql_text)
                  WHERE (error = 8115 ))
ADD TARGET package0.asynchronous_file_target( -- file target
SET filename='c:\temp\ArithmeticErrors.xet',
metadatafile=
'c:\temp\ArithmeticErrors.xem')
WITH (MAX_MEMORY = 4096KB,
EVENT_RETENTION_MODE
= ALLOW_MULTIPLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY
= 10 SECONDS,
MAX_EVENT_SIZE
= 0KB,
MEMORY_PARTITION_MODE
= NONE,
TRACK_CAUSALITY
= OFF,
STARTUP_STATE
= OFF
)

GO
-- Start event session
ALTER EVENT SESSION ArithmeticError ON SERVER
state=start

GO

After creating Extended Event session, I ran a long running T-SQL batch whose elapsed time would eventually result in large integer value. I re-started the application that caused this arithmetic overflow error. I let the long running query and the application run for a while. I issued the following T-SQL statement to gather the list of captured events in extended event file target.

-- Query Gathered events from file target
SELECT CAST(event_data as XML) eventdata
      FROM sys.fn_xe_file_target_read_file('c:\temp\*.xet',
       'c:\temp\*.xem',
       NULL,
       NULL)
GO

Using the output of above query, I was able to identify the T-SQL batch that caused this arithmetic overflow issue. Captured event data looks like the following.

<event name="exception_ring_buffer_recorded" package="sqlos" id="21" version="1" timestamp="2009-09-09T20:10:02.503Z">

  <data name="error">

    <value>8115</value>

    <text />

  </data>

  <data name="severity">

    <value>16</value>

    <text />

  </data>

  <data name="state">

    <value>2</value>

    <text />

  </data>

 

 

  <action name="sql_text" package="sqlserver">

    <value>

SELECT  session_id, command,

r.total_elapsed_time * 1000 AS total_elapsed_time_in_ms -- Convert ms to us.

FROM sys.dm_exec_requests AS r </value>

    <text />

  </action>

</event>

sql_text captured along with event data was useful to isolate the exact query that caused this issue.

After identifying the rootcause of this problem, we no longer need the event seesion to be active, You can drop the event session using following DDL

 -- Drop event session if it was defined earlier
IF EXISTS(SELECT * from sys.server_event_sessions where name='ArithmeticError')

      DROP EVENT SESSION ArithmeticError ON SERVER

GO

References:

1)   SQL 2008 books online

2)   http://technet.microsoft.com/en-us/library/bb630282.aspx

 

 

 This posting is provided "AS IS" with no warranties, and confers no rights. 
Use of included script samples are subject to the terms specified athttp://www.microsoft.com/info/cpyright.htm