Recently I was troubleshooting a performance issue on an internal tracking workload and needed to collect some very low level events over a period of 3-4 hours. During analysis of the data I found that a common pattern I was using was to find a batch with a duration that was longer than average and follow all the events it produced.
This pattern got me thinking that I was discarding a substantial amount of event data that had been collected, and that it would be great to be able to reduce the collection overhead on the server if I could still get all activity from some batches. In the past I’ve used a sampling technique based on the counter predicate to build a baseline of overall activity (see Mikes post here). This isn’t exactly what I want though as there would certainly be events from a particular batch that wouldn’t pass the predicate. What I need is a way to identify streams of work and select say one in ten of them to watch, and sql server provides just such a mechanism: session_id. Session_id is a server assigned integer that is bound to a connection at login and lasts until logout. So by combining the session_id predicate source and the divides_by_uint64 predicate comparator we can limit collection, and still get all the events in batches for investigation. CREATE EVENT SESSION session_10_percent ON SERVER ADD EVENT sqlserver.sql_statement_starting( WHERE (package0.divides_by_uint64(sqlserver.session_id,10))), ADD EVENT sqlos.wait_info ( WHERE (package0.divides_by_uint64(sqlserver.session_id,10))), ADD EVENT sqlos.wait_info_external ( WHERE (package0.divides_by_uint64(sqlserver.session_id,10))), ADD EVENT sqlserver.sql_statement_completed( WHERE (package0.divides_by_uint64(sqlserver.session_id,10))) ADD TARGET ring_buffer WITH (MAX_DISPATCH_LATENCY=30 SECONDS,TRACK_CAUSALITY=ON) GO There we go; event collection is reduced while still providing enough information to find the root of the problem.
This pattern got me thinking that I was discarding a substantial amount of event data that had been collected, and that it would be great to be able to reduce the collection overhead on the server if I could still get all activity from some batches. In the past I’ve used a sampling technique based on the counter predicate to build a baseline of overall activity (see Mikes post here). This isn’t exactly what I want though as there would certainly be events from a particular batch that wouldn’t pass the predicate. What I need is a way to identify streams of work and select say one in ten of them to watch, and sql server provides just such a mechanism: session_id. Session_id is a server assigned integer that is bound to a connection at login and lasts until logout. So by combining the session_id predicate source and the divides_by_uint64 predicate comparator we can limit collection, and still get all the events in batches for investigation.
This pattern got me thinking that I was discarding a substantial amount of event data that had been collected, and that it would be great to be able to reduce the collection overhead on the server if I could still get all activity from some batches.
In the past I’ve used a sampling technique based on the counter predicate to build a baseline of overall activity (see Mikes post here). This isn’t exactly what I want though as there would certainly be events from a particular batch that wouldn’t pass the predicate. What I need is a way to identify streams of work and select say one in ten of them to watch, and sql server provides just such a mechanism: session_id. Session_id is a server assigned integer that is bound to a connection at login and lasts until logout. So by combining the session_id predicate source and the divides_by_uint64 predicate comparator we can limit collection, and still get all the events in batches for investigation.
CREATE EVENT SESSION session_10_percent ON SERVER
ADD EVENT sqlserver.sql_statement_starting(
WHERE (package0.divides_by_uint64(sqlserver.session_id,10))),
ADD EVENT sqlos.wait_info (
ADD EVENT sqlos.wait_info_external (
ADD EVENT sqlserver.sql_statement_completed(
WHERE (package0.divides_by_uint64(sqlserver.session_id,10)))
ADD TARGET ring_buffer
WITH (MAX_DISPATCH_LATENCY=30 SECONDS,TRACK_CAUSALITY=ON)
GO
By the way the performance issue turned out to be an IO issue, and the session definition above was more than enough to show long waits on PAGEIOLATCH*.