HoppeRx - the cure for your ailing device

A community site dedicated to diagnosing device problems found by Hopper

Some threads are stuck in the Sql SpinLock: why?


Contributed by Javier Flores Assad (AKA "MTTF Dude")

A little bit about SqlCE

 

SQL CE is a very powerful component that is responsible for essentially all database-related operations: every time you mount, unmount, open or close a database or volume, you enter SqlCE code to accomplish the operation. Even if your app is not directly calling any database-related code, you could still be triggering SqlCE operations and requests by calling POOM, accessing contacts, or touching any other data store in the system.

 

Almost all of the code inside SqlCE requires requests and operations to be serialized. To accomplish this in a multi-process, multi-threaded system, SqlCE uses a spinlock synchronization mechanism built around a shared heap object.

 

The general SpinLock concept

 

A spinlock is one more way to enforce serialization, and it offers impressive performance if it is implemented correctly and used in the correct scenario. It does not use critical sections, mutexes or semaphores to accomplish the serialization; instead, a control object is declared on a shared heap. When a thread needs to enter the serialized code, it checks the state of that object: if the object is "locked", the thread spins around it until it is no longer locked, and as soon as it is unlocked the thread locks it and enters the protected area. The thread "unlocks" the object when it exits the protected area.

(More conceptual info: http://en.wikipedia.org/wiki/Spinlock)

 

#include <windows.h>   /* Sleep, TRUE */
#include <stdlib.h>    /* rand */

void EnterSpinlock(void)
{
    while (TRUE)
    {
        /* Try to take the lock; SharedHeapObjectLock() returns TRUE
           only when this thread successfully acquires it. */
        if (SharedHeapObjectLock())
            break;

        /* Back off for a short, randomized interval (5 to 14 ms) so
           that waiters do not all retry in lockstep. */
        Sleep(5 + (rand() % 10));
    }
}
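
The listing above leaves SharedHeapObjectLock() and the matching unlock undefined. Here is a minimal sketch of what those primitives could look like, assuming the control object is a plain LONG flag; g_pLockFlag is a hypothetical name, and SqlCE's real layout is not documented here:

static LONG g_localFlag;                    /* stand-in for the control object; in
                                               SqlCE it lives on a heap shared
                                               across processes */
static LONG volatile *g_pLockFlag = &g_localFlag;

BOOL SharedHeapObjectLock(void)
{
    /* Atomically flip the flag from 0 (free) to 1 (locked); the call
       returns the previous value, so 0 means this thread won the lock. */
    return InterlockedCompareExchange(g_pLockFlag, 1, 0) == 0;
}

void ExitSpinlock(void)
{
    /* Release with an atomic store so waiting threads observe it. */
    InterlockedExchange(g_pLockFlag, 0);
}

With these in place, the EnterSpinlock() loop above compiles as-is.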

 

As we can see, the concept is very simple, but the implementation is not. The trickiest part is determining whether your serialization scenario is adequate for a spinlock, because if it is not, the spinlock will become problematic.

 

What is “the” problem with the spinlock?

 

The spinlock implementation is entirely unrelated to the synchronization mechanisms the kernel provides as primitives (such as critical sections). The kernel is unaware of any spinlocks and does not know that this portion of code is actually protecting a serialized area. Since the kernel is unaware of it, it cannot apply any of the contingency measures (such as priority inversion handling, where the kernel temporarily boosts the owner's priority) that it applies to the standard synchronization objects. (http://www.microsoft.com/technet/prodtechnol/wce/plan/realtime.mspx)

 

Assume that you have 4 threads running in the system: T1, T2, T3 and T4. T1 and T2 are normal-priority threads, T3 is a below-normal-priority thread and T4 is an above-normal-priority thread. Now assume that T3 currently owns a spinlock and that both T1 and T4 are trying to acquire it (so they are in a while loop, making attempts and sleeping for a little bit between them). Finally, assume that T2 is just another thread in the system that is not doing anything related to the spinlock; it is simply spinning in an infinite while loop without any sleep time.

What will happen is that T2 will never allow T3 to run (because T2 has higher priority than T3). T1 and T4 will keep trying to acquire the spinlock, but since the spinlock is owned by T3, they will spend the rest of their time trying and never getting the lock. This is what we call a starvation condition: T2 is starving T3, and as a side effect T1 and T4 are "stuck" trying to get the spinlock. A sketch that reproduces this scenario follows below.
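
To make the scenario concrete, here is a self-contained sketch that reproduces it, assuming the EnterSpinlock/ExitSpinlock routines above and a single-core device; the thread bodies are illustrative only, not SqlCE code:

#include <windows.h>

static DWORD WINAPI LockHolder(LPVOID p)    /* T3: below normal priority */
{
    (void)p;
    EnterSpinlock();            /* grabs the lock first (see main) */
    Sleep(50);                  /* simulate work done under the lock */
    ExitSpinlock();
    return 0;
}

static DWORD WINAPI LockWaiter(LPVOID p)    /* T1 (normal) and T4 (above normal) */
{
    (void)p;
    EnterSpinlock();            /* spins here for as long as T3 cannot run */
    ExitSpinlock();
    return 0;
}

static DWORD WINAPI Spinner(LPVOID p)       /* T2: the actual culprit */
{
    (void)p;
    for (;;)                    /* burns CPU with no Sleep(), starving T3 */
        ;
}

int main(void)
{
    HANDLE t3 = CreateThread(NULL, 0, LockHolder, NULL, 0, NULL);
    SetThreadPriority(t3, THREAD_PRIORITY_BELOW_NORMAL);
    Sleep(10);                  /* let T3 acquire the lock before the others start */

    HANDLE t2 = CreateThread(NULL, 0, Spinner,    NULL, 0, NULL);
    HANDLE t1 = CreateThread(NULL, 0, LockWaiter, NULL, 0, NULL);
    HANDLE t4 = CreateThread(NULL, 0, LockWaiter, NULL, 0, NULL);
    SetThreadPriority(t4, THREAD_PRIORITY_ABOVE_NORMAL);

    /* On a single core, T2 now starves T3 and the lock is never
       released, so T1 and T4 appear "stuck" inside EnterSpinlock(). */
    WaitForSingleObject(t1, INFINITE);
    return 0;
}

Breaking into the debugger while this runs shows exactly the picture described in the next paragraph.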

 

If you break into the debugger at this point you will see T1 and T4 in the spinlock code, and if you search a little you will find T3 sitting there, unable to run while holding the lock. But the root cause of this condition is actually something else: it is T2 that is generating it by spinning like crazy. So you need to check which thread is consuming most of the CPU time, and it is not T1 or T4 (although they will also be consuming some CPU time).

 

In a perfect world in which the spinlock supported priority inversion handling, this would never happen: the kernel would invert the priorities, granting T3 T4's priority and preventing the starvation. However, since that is not the case (the spinlock gets no priority inversion handling from the kernel), we need to ensure that no thread causes starvation in the system. For contrast, a kernel-backed alternative is sketched below.
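
Here is a sketch of the same serialized area protected by a kernel critical section instead. The function names are hypothetical, and this is not what SqlCE actually does; it only illustrates the contingency measure the kernel applies to its own primitives:

#include <windows.h>

static CRITICAL_SECTION g_cs;   /* kernel-managed, unlike the shared-heap flag */

void InitSerialization(void)
{
    InitializeCriticalSection(&g_cs);   /* call once at startup */
}

void EnterSerializedArea(void)
{
    /* If an above-normal thread (T4) blocks here while a below-normal
       thread (T3) owns g_cs, the CE kernel temporarily boosts T3's
       priority so it can finish and release the lock; a runaway
       normal-priority spinner like T2 can no longer starve it forever. */
    EnterCriticalSection(&g_cs);
}

void ExitSerializedArea(void)
{
    LeaveCriticalSection(&g_cs);
}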

 

What can I do if I see some threads around the spinlock?

 

1. If you see a callstack inside the spinlock code that is about to call RaiseException:

    a. Most likely your system is out of memory and shell32 has started to force application termination to free resources. At this point a dev-health log, or even a "mi" log (if you are in a PB session), can help you identify who is consuming your memory.

 

2. If you see a callstack inside the spinlock (but not raising exceptions):

    a. Most likely your system has a spinner, in which case hangRx can detect the culprit.

    b. If you are connected in a PB session, you can use "gi thrd" and then multiple "gi delta" commands; that will tell you which threads are consuming the CPU cycles.

 

Find out which thread is consuming most of the CPU cycles. To do this you can use "gi delta" from the PB command prompt or from jshell. The gi delta command gives you the increments of time consumed by each thread between this invocation and the last time you performed a gi delta.

 

Example:

  1. Un-break the debugger (hit F5, let it continue).
  2. Do a gi delta and ignore the data.
  3. Wait for 5 seconds.
  4. Do a second gi delta command.
  5. Check which thread consumed most of the time.
  6. Break into the debugger and check that thread's callstack to see what it is doing.

 

Windows CE>gi delta

{ignore and wait around 5 seconds}

Windows CE>gi delta

PROC: Name         hProcess: CurAKY  :dwVMBase:CurZone
THRD: State        :hCurThrd:hCurProc: CurAKY :Cp :Bp :Kernel Time  User Time
 P00: NK.EXE       07fb4002  00000001 c2000000 00000000
 P01: filesys.exe  e7f3f69e  00000002 04000000 00000020
 P02: shell.exe    27d6952e  00000004 06000000 00000000
 T    Runing       27d694fe  27d6952e 00000005 130 130 00:00:00.000 00:00:00.015
 P03: device.exe   27d20f02  00000008 08000000 00000000
 P04: myApp.exe    275ccc56  00000010 0a000000 00000000
 T    Runabl       46906e32  c783ea9e 00000031 249 251 00:00:00.000 00:00:05.273

In this capture, the thread listed under myApp.exe accumulated over five seconds of user time between the two gi delta commands (versus 15 ms for the shell.exe thread), which makes it the prime suspect: break into the debugger and inspect its callstack.

 

