Gestures and Tools for Kinect

You have certainly not missed (as a regular reader of this blog) that the Kinect for Windows SDK is out!


For now, however, no gesture recognition services are available. So throughout this paper we will create our own library that automatically detects simple movements such as swipes, as well as more complex movements such as drawing a circle with your hand.


Detecting such gestures makes it possible to control PowerPoint the Jedi way! (similar to the Kinect Keyboard Simulator demo).

If you are not familiar with the Kinect for Windows SDK, you should read a previous post that addressed the topic: http://blogs.msdn.com/b/eternalcoding/archive/2011/06/13/unleash-the-power-of-kinect-for-windows-sdk.aspx

How to detect gestures?

There is an infinite number of solutions for detecting a gesture. In this article I will explore two of them:

  • Algorithmic search
  • Template based search

Note that these two techniques have many variants and refinements.

You can find the code used in this article here: http://kinecttoolbox.codeplex.com


GestureDetector class

To standardize the use of our gesture system, we will define an abstract class GestureDetector from which all gesture classes inherit:

[Class diagram: GestureDetector]

This class provides the Add method used to record the different positions of the skeleton’s joints.

It also provides the abstract method LookForGesture, implemented by its children.

It stores a list of Entry objects in the Entries property, whose role is to save the position and timing of each recorded point.

Drawing stored positions

The Entry class also stores a WPF ellipse that will be used to draw the stored position:

[Class diagram: Entry]

Via the TraceTo method of the GestureDetector class, we will indicate which canvas will be used to draw the stored positions.
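Before diving into the implementation, here is a minimal sketch of what the base class and the Entry class might look like, based only on the members described above (the exact signatures, the default window size and the event type are assumptions, not the definitive toolbox code):

    public class Entry
    {
        public Vector3 Position { get; set; }        // recorded joint position
        public DateTime Time { get; set; }           // timestamp of the record
        public Ellipse DisplayEllipse { get; set; }  // WPF ellipse used for visual feedback
    }

    public abstract class GestureDetector
    {
        Canvas displayCanvas;   // canvas used to draw the stored positions
        Color displayColor;     // color of the drawn ellipses

        public List<Entry> Entries { get; private set; }
        public int WindowSize { get; set; }                       // maximum number of stored positions
        public event Action<SupportedGesture> OnGestureDetected;  // assumed event type

        protected GestureDetector(int windowSize = 20)            // assumed default
        {
            Entries = new List<Entry>();
            WindowSize = windowSize;
        }

        public void TraceTo(Canvas canvas, Color color)
        {
            displayCanvas = canvas;
            displayColor = color;
        }

        public virtual void Add(Vector position, SkeletonEngine engine) { /* shown below */ }

        protected abstract void LookForGesture();

        protected void RaiseGestureDetected(SupportedGesture gesture)
        {
            // MinimalPeriodBetweenGestures filtering omitted in this sketch
            if (OnGestureDetected != null)
                OnGestureDetected(gesture);
        }
    }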

In the end, all the work is done in the Add method:

  1. public virtual void Add(Vector position, SkeletonEngine engine)
  2. {
  3.     Entry newEntry = new Entry {Position = position.ToVector3(), Time = DateTime.Now};
  4.     Entries.Add(newEntry);
  5.  
  6.     if (displayCanvas != null)
  7.     {
  8.         newEntry.DisplayEllipse = new Ellipse
  9.         {
  10.             Width = 4,
  11.             Height = 4,
  12.             HorizontalAlignment = HorizontalAlignment.Left,
  13.             VerticalAlignment = VerticalAlignment.Top,
  14.             StrokeThickness = 2.0,
  15.             Stroke = new SolidColorBrush(displayColor),
  16.             StrokeLineJoin = PenLineJoin.Round
  17.         };
  18.  
  19.  
  20.         float x, y;
  21.  
  22.         engine.SkeletonToDepthImage(position, out x, out y);
  23.  
  24.         x = (float)(x * displayCanvas.ActualWidth);
  25.         y = (float)(y * displayCanvas.ActualHeight);
  26.  
  27.         Canvas.SetLeft(newEntry.DisplayEllipse, x - newEntry.DisplayEllipse.Width / 2);
  28.         Canvas.SetTop(newEntry.DisplayEllipse, y - newEntry.DisplayEllipse.Height / 2);
  29.  
  30.         displayCanvas.Children.Add(newEntry.DisplayEllipse);
  31.     }
  32.  
  33.     if (Entries.Count > WindowSize)
  34.     {
  35.         Entry entryToRemove = Entries[0];
  36.         
  37.         if (displayCanvas != null)
  38.         {
  39.             displayCanvas.Children.Remove(entryToRemove.DisplayEllipse);
  40.         }
  41.  
  42.         Entries.Remove(entryToRemove);
  43.     }
  44.  
  45.     LookForGesture();
  46. }

Note the use of the SkeletonToDepthImage method, which converts a 3D coordinate to a 2D coordinate between 0 and 1 on each axis.

So in addition to saving the positions of the joints, the GestureDetector class can draw them to give visual feedback, which greatly simplifies the development and debugging phases:

[Screenshot: recorded positions drawn in red over the Kinect video stream]

As we can see above, the positions being analyzed are shown in red over the Kinect image. To activate this service, the developer just needs to put a canvas over the image that shows the Kinect camera stream and pass this canvas to the GestureDetector.TraceTo method:

  1. <Viewbox Margin="5" Grid.RowSpan="5">
  2.     <Grid Width="640" Height="480" ClipToBounds="True">
  3.         <Image x:Name="kinectDisplay"></Image>
  4.         <Canvas x:Name="kinectCanvas"></Canvas>
  5.         <Canvas x:Name="gesturesCanvas"></Canvas>
  6.         <Rectangle Stroke="Black" StrokeThickness="1"/>
  7.     </Grid>
  8. </Viewbox>

The Viewbox is used to keep the image and the canvases at the same size. The kinectCanvas canvas is used to display the green skeleton (using a class available in the sample: SkeletonDisplayManager).
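For instance, once a detector is created, attaching it to the drawing canvas is a single call (a sketch; I am assuming TraceTo takes the target canvas and the display color, matching the displayCanvas and displayColor fields used in Add):

    // Draw the positions analyzed by the detectors in red over the video stream
    swipeGestureRecognizer.TraceTo(gesturesCanvas, Colors.Red);
    circleGestureRecognizer.TraceTo(gesturesCanvas, Colors.Red);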

Event-based approach

The GestureDetector class provides one last service to its children: the RaiseGestureDetected method, which reports the detection of a new gesture via the OnGestureDetected event. The SupportedGesture argument of this event can take the following values:

  • SwipeToLeft
  • SwipeToRight
  • Circle

Obviously the solution is extensible and I encourage you to add new gestures to the system.

The RaiseGestureDetected method also guarantees (through the MinimalPeriodBetweenGestures property) that a certain amount of time elapses between two gestures, in order to filter out badly executed gestures.
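On the application side, consuming a detector is then just a matter of subscribing to this event; a minimal sketch, assuming the event hands over the SupportedGesture value directly (the slide-navigation comments are placeholders for your own code):

    swipeGestureRecognizer.OnGestureDetected += gesture =>
    {
        switch (gesture)
        {
            case SupportedGesture.SwipeToRight:
                // e.g. send a "next slide" command to PowerPoint
                break;
            case SupportedGesture.SwipeToLeft:
                // e.g. send a "previous slide" command
                break;
        }
    };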

Now that our foundations are laid, we can develop our algorithms.

Algorithmic search

The algorithmic search walks through the list of positions and checks that a set of predefined constraints remains valid.

The SwipeGestureDetector class is responsible for this search:

[Class diagram: SwipeGestureDetector]

For the SwipeToRight gesture, we will use the following constraints:

  • Each new position should be to the right of the previous one
  • Each position must not deviate in height from the first by more than a given distance (20 cm)
  • The time between the first and last position must be between 250ms and 1500ms
  • The gesture must be at least 40 cm long

The SwipeToLeft gesture is based on the same constraints except for the direction of the movement of course.

To effectively manage these two gestures, we use a generic algorithm that checks the four constraints mentioned above:

  1. bool ScanPositions(Func<Vector3, Vector3, bool> heightFunction, Func<Vector3, Vector3, bool> directionFunction, Func<Vector3, Vector3, bool> lengthFunction, int minTime, int maxTime)
  2. {
  3.     int start = 0;
  4.  
  5.     for (int index = 1; index < Entries.Count - 1; index++)
  6.     {
  7.         if (!heightFunction(Entries[0].Position, Entries[index].Position) || !directionFunction(Entries[index].Position, Entries[index + 1].Position))
  8.         {
  9.             start = index;
  10.         }
  11.  
  12.         if (lengthFunction(Entries[index].Position, Entries[start].Position))
  13.         {
  14.             double totalMilliseconds = (Entries[index].Time - Entries[start].Time).TotalMilliseconds;
  15.             if (totalMilliseconds >= minTime && totalMilliseconds <= maxTime)
  16.             {
  17.                 return true;
  18.             }
  19.         }
  20.     }
  21.  
  22.     return false;
  23. }

To use this method, we must provide three functions and a time range to check.

So to manage the two gestures, simply call the following code:

  1. protected override void LookForGesture()
  2. {
  3.     // Swipe to right
  4.     if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight, // Height
  5.         (p1, p2) => p2.X - p1.X > -0.01f, // Progression to right
  6.         (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
  7.         SwipeMininalDuration, SwipeMaximalDuration)) // Duration
  8.     {
  9.         RaiseGestureDetected(SupportedGesture.SwipeToRight);
  10.         return;
  11.     }
  12.  
  13.     // Swipe to left
  14.     if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight,  // Height
  15.         (p1, p2) => p2.X - p1.X < 0.01f, // Progression to left
  16.         (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
  17.         SwipeMininalDuration, SwipeMaximalDuration))// Duration
  18.     {
  19.         RaiseGestureDetected(SupportedGesture.SwipeToLeft);
  20.         return;
  21.     }
  22. }

With this class, it is really simple to add new gestures that can be described with constraints.
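For example, a hypothetical upward swipe could be expressed with the same ScanPositions helper by swapping the axes in the lambdas (SwipeUp is not part of the toolbox; this is only a sketch of how a new constraint-based gesture could be written):

    // Hypothetical swipe up: stay on a vertical line, move upward, cover a minimal length
    if (ScanPositions((p1, p2) => Math.Abs(p2.X - p1.X) < SwipeMaximalHeight, // Width
        (p1, p2) => p2.Y - p1.Y > -0.01f,                       // Progression upward
        (p1, p2) => Math.Abs(p2.Y - p1.Y) > SwipeMinimalLength,  // Length
        SwipeMininalDuration, SwipeMaximalDuration))             // Duration
    {
        RaiseGestureDetected(SupportedGesture.SwipeUp); // would require adding SwipeUp to the enum
        return;
    }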

Skeleton stability

To ensure that our detection works properly, we must check that the skeleton is actually static, so as not to generate false gestures (e.g. a movement of the whole body being detected as a swipe).

To do that, we will use the BarycenterHelper class:

  1. public class BarycenterHelper
  2. {
  3.     readonly Dictionary<int, List<Vector3>> positions = new Dictionary<int, List<Vector3>>();
  4.     readonly int windowSize;
  5.  
  6.     public float Threshold { get; set; }
  7.  
  8.     public BarycenterHelper(int windowSize = 20, float threshold = 0.05f)
  9.     {
  10.         this.windowSize = windowSize;
  11.         Threshold = threshold;
  12.     }
  13.  
  14.     public bool IsStable(int trackingID)
  15.     {
  16.         List<Vector3> currentPositions = positions[trackingID];
  17.         if (currentPositions.Count != windowSize)
  18.             return false;
  19.  
  20.         Vector3 current = currentPositions[currentPositions.Count - 1];
  21.  
  22.         for (int index = 0; index < currentPositions.Count - 2; index++)
  23.         {
  24.             Debug.WriteLine((currentPositions[index] - current).Length());
  25.  
  26.             if ((currentPositions[index] - current).Length() > Threshold)
  27.                 return false;
  28.         }
  29.  
  30.         return true;
  31.     }
  32.  
  33.     public void Add(Vector3 position, int trackingID)
  34.     {
  35.         if (!positions.ContainsKey(trackingID))
  36.             positions.Add(trackingID, new List<Vector3>());
  37.  
  38.         positions[trackingID].Add(position);
  39.  
  40.         if (positions[trackingID].Count > windowSize)
  41.             positions[trackingID].RemoveAt(0);
  42.     }
  43. }

By feeding this class the successive positions of the skeleton, we can ask it (via the IsStable method) whether the skeleton is moving or static.

Thus, we can use this information to send the positions of the joints to the detection systems only when the skeleton is not in motion:

  1. void ProcessFrame(ReplaySkeletonFrame frame)
  2. {
  3.     Dictionary<int, string> stabilities = new Dictionary<int, string>();
  4.     foreach (var skeleton in frame.Skeletons)
  5.     {
  6.         if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
  7.             continue;
  8.  
  9.         barycenterHelper.Add(skeleton.Position.ToVector3(), skeleton.TrackingID);
  10.  
  11.         stabilities.Add(skeleton.TrackingID, barycenterHelper.IsStable(skeleton.TrackingID) ? "Stable" : "Unstable");
  12.         if (!barycenterHelper.IsStable(skeleton.TrackingID))
  13.             continue;
  14.  
  15.         foreach (Joint joint in skeleton.Joints)
  16.         {
  17.             if (joint.Position.W < 0.8f || joint.TrackingState != JointTrackingState.Tracked)
  18.                 continue;
  19.  
  20.             if (joint.ID == JointID.HandRight)
  21.             {
  22.                 swipeGestureRecognizer.Add(joint.Position, kinectRuntime.SkeletonEngine);
  23.                 circleGestureRecognizer.Add(joint.Position, kinectRuntime.SkeletonEngine);
  24.             }
  25.         }
  26.  
  27.         postureRecognizer.TrackPostures(skeleton);
  28.     }
  29.  
  30.     skeletonDisplayManager.Draw(frame);
  31.  
  32.     stabilitiesList.ItemsSource = stabilities;
  33.  
  34.     currentPosture.Text = "Current posture: " + postureRecognizer.CurrentPosture.ToString();
  35. }


Replay & Recording tools

Another important point when developing with the Kinect for Windows SDK is the ability to test efficiently. It may sound silly, but to test you must get up, stand in front of the sensor and make the appropriate gesture. And unless you have an assistant, this can quickly become painful.

So we will add a record and replay service for the data sent by Kinect.

Recording

The recording part is pretty simple: we only have to take a SkeletonFrame and walk through each skeleton to serialize its contents:

  1. public void Record(SkeletonFrame frame)
  2. {
  3.     if (writer == null)
  4.         throw new Exception("You must call Start before calling Record");
  5.  
  6.     TimeSpan timeSpan = DateTime.Now.Subtract(referenceTime);
  7.     referenceTime = DateTime.Now;
  8.     writer.Write((long)timeSpan.TotalMilliseconds);
  9.     writer.Write(frame.FloorClipPlane);
  10.     writer.Write((int)frame.Quality);
  11.     writer.Write(frame.NormalToGravity);
  12.  
  13.     writer.Write(frame.Skeletons.Length);
  14.  
  15.     foreach (SkeletonData skeleton in frame.Skeletons)
  16.     {
  17.         writer.Write((int)skeleton.TrackingState);
  18.         writer.Write(skeleton.Position);
  19.         writer.Write(skeleton.TrackingID);
  20.         writer.Write(skeleton.EnrollmentIndex);
  21.         writer.Write(skeleton.UserIndex);
  22.         writer.Write((int)skeleton.Quality);
  23.  
  24.         writer.Write(skeleton.Joints.Count);
  25.         foreach (Joint joint in skeleton.Joints)
  26.         {
  27.             writer.Write((int)joint.ID);
  28.             writer.Write((int)joint.TrackingState);
  29.             writer.Write(joint.Position);
  30.         }
  31.     }
  32. }
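Wiring the recorder into the application could look like the following sketch (the Start/Stop methods and the stream parameter are assumptions based on the exception message above; adapt them to the actual recorder class in the toolbox):

    // Hypothetical wiring inside the main window class
    Stream recordStream;

    void StartRecording()
    {
        recordStream = File.Create("gestures.replay");
        skeletonRecorder.Start(recordStream); // assumed signature: Start(Stream)
    }

    void kinectRuntime_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
    {
        if (recordStream != null)
            skeletonRecorder.Record(e.SkeletonFrame);

        ProcessFrame(e.SkeletonFrame); // normal processing continues thanks to the implicit cast
    }

    void StopRecording()
    {
        skeletonRecorder.Stop(); // assumed
        recordStream.Close();
        recordStream = null;
    }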


Replay

The main problem with the replay mechanism is data reconstruction. Indeed, the Kinect classes are sealed and do not expose public constructors. To work around this problem, we will replicate the Kinect class hierarchy, adding implicit cast operators from the Kinect classes:

[Class diagram: replay classes mirroring the Kinect hierarchy]

The SkeletonReplay class is responsible for the replay through its Start method:

  1. public void Start()
  2. {
  3.     context = SynchronizationContext.Current;
  4.  
  5.     CancellationToken token = cancellationTokenSource.Token;
  6.  
  7.     Task.Factory.StartNew(() =>
  8.     {
  9.         foreach (ReplaySkeletonFrame frame in frames)
  10.         {
  11.             Thread.Sleep(TimeSpan.FromMilliseconds(frame.TimeStamp));
  12.  
  13.             if (token.IsCancellationRequested)
  14.                 return;
  15.                                       
  16.             ReplaySkeletonFrame closure = frame;
  17.             context.Send(state =>
  18.                             {
  19.                                 if (SkeletonFrameReady != null)
  20.                                     SkeletonFrameReady(this, new ReplaySkeletonFrameReadyEventArgs {SkeletonFrame = closure});
  21.                             }, null);
  22.         }
  23.     }, token);
  24. }

Finally, we can record and replay gestures to debug our application.


You can download a replay sample here: http://www.catuhe.com/msdn/davca.replay.zip
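Using the replay is symmetrical; here is a sketch of the intended usage (I am assuming the SkeletonReplay constructor takes the stream of a previously recorded file):

    // Hypothetical wiring: replay a recorded session instead of using the sensor
    Stream replayStream = File.OpenRead("gestures.replay");
    SkeletonReplay replay = new SkeletonReplay(replayStream); // assumed constructor

    replay.SkeletonFrameReady += (sender, args) =>
    {
        // Frames are raised back on the UI thread thanks to the SynchronizationContext
        ProcessFrame(args.SkeletonFrame);
    };

    replay.Start();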

Know when to start

The last remaining problem to resolve is deciding when to begin the analysis of gestures. We already know that we must do so when the body is stable, but that is not enough.

Even if I stay static, I use my hands a lot when I speak, and I can inadvertently trigger a gesture.

To protect ourselves from this, we can complete the detection by adding a posture condition.

That’s why we are going to use the PostureDetector class:

  1. public class PostureDetector
  2. {
  3.     const float Epsilon = 0.1f;
  4.     const float MaxRange = 0.25f;
  5.     const int AccumulatorTarget = 10;
  6.  
  7.     Posture previousPosture = Posture.None;
  8.     public event Action<Posture> PostureDetected;
  9.     int accumulator;
  10.     Posture accumulatedPosture = Posture.None;
  11.  
  12.     public Posture CurrentPosture
  13.     {
  14.         get { return previousPosture; }
  15.     }
  16.  
  17.     public void TrackPostures(ReplaySkeletonData skeleton)
  18.     {
  19.         if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
  20.             return;
  21.  
  22.         Vector3? headPosition = null;
  23.         Vector3? leftHandPosition = null;
  24.         Vector3? rightHandPosition = null;
  25.  
  26.         foreach (Joint joint in skeleton.Joints)
  27.         {
  28.             if (joint.Position.W < 0.8f || joint.TrackingState != JointTrackingState.Tracked)
  29.                 continue;
  30.  
  31.             switch (joint.ID)
  32.             {
  33.                 case JointID.Head:
  34.                     headPosition = joint.Position.ToVector3();
  35.                     break;
  36.                 case JointID.HandLeft:
  37.                     leftHandPosition = joint.Position.ToVector3();
  38.                     break;
  39.                 case JointID.HandRight:
  40.                     rightHandPosition = joint.Position.ToVector3();
  41.                     break;
  42.             }
  43.         }
  44.  
  45.         // HandsJoined
  46.         if (CheckHandsJoined(rightHandPosition, leftHandPosition))
  47.             return;
  48.  
  49.         // LeftHandOverHead
  50.         if (CheckHandOverHead(headPosition, leftHandPosition))
  51.         {
  52.             RaisePostureDetected(Posture.LeftHandOverHead);
  53.             return;
  54.         }
  55.  
  56.         // RightHandOverHead
  57.         if (CheckHandOverHead(headPosition, rightHandPosition))
  58.         {
  59.             RaisePostureDetected(Posture.RightHandOverHead);
  60.             return;
  61.         }
  62.  
  63.         // LeftHello
  64.         if (CheckHello(headPosition, leftHandPosition))
  65.         {
  66.             RaisePostureDetected(Posture.LeftHello);
  67.             return;
  68.         }
  69.  
  70.         // RightHello
  71.         if (CheckHello(headPosition, rightHandPosition))
  72.         {
  73.             RaisePostureDetected(Posture.RightHello);
  74.             return;
  75.         }
  76.  
  77.         previousPosture = Posture.None;
  78.         accumulator = 0;
  79.     }
  80.  
  81.     bool CheckHandOverHead(Vector3? headPosition, Vector3? handPosition)
  82.     {
  83.         if (!handPosition.HasValue || !headPosition.HasValue)
  84.             return false;
  85.  
  86.         if (handPosition.Value.Y < headPosition.Value.Y)
  87.             return false;
  88.  
  89.         if (Math.Abs(handPosition.Value.X - headPosition.Value.X) > MaxRange)
  90.             return false;
  91.  
  92.         if (Math.Abs(handPosition.Value.Z - headPosition.Value.Z) > MaxRange)
  93.             return false;
  94.  
  95.         return true;
  96.     }
  97.  
  98.  
  99.     bool CheckHello(Vector3? headPosition, Vector3? handPosition)
  100.     {
  101.         if (!handPosition.HasValue || !headPosition.HasValue)
  102.             return false;
  103.  
  104.         if (Math.Abs(handPosition.Value.X - headPosition.Value.X) < MaxRange)
  105.             return false;
  106.  
  107.         if (Math.Abs(handPosition.Value.Y - headPosition.Value.Y) > MaxRange)
  108.             return false;
  109.  
  110.         if (Math.Abs(handPosition.Value.Z - headPosition.Value.Z) > MaxRange)
  111.             return false;
  112.  
  113.         return true;
  114.     }
  115.  
  116.     bool CheckHandsJoined(Vector3? leftHandPosition, Vector3? rightHandPosition)
  117.     {
  118.         if (!leftHandPosition.HasValue || !rightHandPosition.HasValue)
  119.             return false;
  120.  
  121.         float distance = (leftHandPosition.Value - rightHandPosition.Value).Length();
  122.  
  123.         if (distance > Epsilon)
  124.             return false;
  125.  
  126.         RaisePostureDetected(Posture.HandsJoined);
  127.         return true;
  128.     }
  129.  
  130.     void RaisePostureDetected(Posture posture)
  131.     {
  132.         if (accumulator < AccumulatorTarget)
  133.         {
  134.             if (accumulatedPosture != posture)
  135.             {
  136.                 accumulator = 0;
  137.                 accumulatedPosture = posture;
  138.             }
  139.             accumulator++;
  140.             return;
  141.         }
  142.  
  143.         if (previousPosture == posture)
  144.             return;
  145.  
  146.         previousPosture = posture;
  147.         if (PostureDetected != null)
  148.             PostureDetected(posture);
  149.  
  150.         accumulator = 0;
  151.     }
  152. }

The PostureDetector class is based on a comparative analysis of the positions of each joint. For example, to trigger the Hello posture, we have to know whether the hand is at the same height as the head and at least 25 cm to the side of it.

In addition, the system uses an accumulator to ensure that the posture is held for a given number of frames.

Once again this class is highly extensible.
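A typical use is to subscribe to the PostureDetected event and only arm the gesture detection once a given posture has been seen; a sketch (the gating flag and the chosen posture are mine, not part of the toolbox):

    bool gesturesEnabled; // field on the main window

    postureRecognizer.PostureDetected += posture =>
    {
        // Example policy: joining the hands toggles gesture detection on and off
        if (posture == Posture.HandsJoined)
            gesturesEnabled = !gesturesEnabled;
    };

    // Then, in ProcessFrame, feed the detectors only when armed:
    // if (gesturesEnabled) swipeGestureRecognizer.Add(joint.Position, kinectRuntime.SkeletonEngine);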

Template-based search

The main drawback of the algorithmic search is that not all gestures are easily describable with constraints. We will therefore consider another, more general approach.

We will assume that a gesture can be recorded, and the system will subsequently determine whether the current gesture matches one that is already known.

Our goal, then, is to efficiently compare two gestures.

Compare the comparable

Before we start to write a comparison algorithm, we have to standardize our data.

Indeed, a gesture is a sequence of points (for this article we will simply compare 2D gestures such as circles). However, the coordinates of these points depend on the distance to the sensor, so we have to bring them back to a common reference. To do this, we will do the following:

  1. Generate a new gesture with a defined number of points
  2. Rotate the gesture so the first point is at 0 degrees
  3. Rescale the gesture to a 1x1 reference frame
  4. Center the gesture to origin

[Diagram: normalization of the gesture points]

With these changes, we will be able to compare point arrays of the same size, centered on a common reference frame, with a common orientation.

To pack a point array using this technique, we use the following code:

  1. public static List<Vector2> Pack(List<Vector2> positions, int samplesCount)
  2. {
  3.     List<Vector2> locals = ProjectListToDefinedCount(positions, samplesCount);
  4.  
  5.     float angle = GetAngleBetween(locals.Center(), positions[0]);
  6.     locals = locals.Rotate(-angle);
  7.  
  8.     locals.ScaleToReferenceWorld();
  9.     locals.CenterToOrigin();
  10.  
  11.     return locals;
  12. }

Methods and extension methods are available in the GoldenSectionExtensions static class:

  1. public static class GoldenSectionExtensions
  2. {
  3.     // Get length of path
  4.     public static float Length(this List<Vector2> points)
  5.     {
  6.         float length = 0;
  7.  
  8.         for (int i = 1; i < points.Count; i++)
  9.         {
  10.             length += (points[i - 1] - points[i]).Length();
  11.         }
  12.  
  13.         return length;
  14.     }
  15.  
  16.     // Get center of path
  17.     public static Vector2 Center(this List<Vector2> points)
  18.     {
  19.         Vector2 result = points.Aggregate(Vector2.Zero, (current, point) => current + point);
  20.  
  21.         result /= points.Count;
  22.  
  23.         return result;
  24.     }
  25.  
  26.     // Rotate path by given angle
  27.     public static List<Vector2> Rotate(this List<Vector2> positions, float angle)
  28.     {
  29.         List<Vector2> result = new List<Vector2>(positions.Count);
  30.         Vector2 c = positions.Center();
  31.  
  32.         float cos = (float)Math.Cos(angle);
  33.         float sin = (float)Math.Sin(angle);
  34.  
  35.         foreach (Vector2 p in positions)
  36.         {
  37.             float dx = p.X - c.X;
  38.             float dy = p.Y - c.Y;
  39.  
  40.             Vector2 rotatePoint = Vector2.Zero;
  41.             rotatePoint.X = dx * cos - dy * sin + c.X;
  42.             rotatePoint.Y = dx * sin + dy * cos + c.Y;
  43.  
  44.             result.Add(rotatePoint);
  45.         }
  46.         return result;
  47.     }
  48.  
  49.     // Average distance betweens paths
  50.     public static float DistanceTo(this List<Vector2> path1, List<Vector2> path2)
  51.     {
  52.         return path1.Select((t, i) => (t - path2[i]).Length()).Average();
  53.     }
  54.  
  55.     // Compute bounding rectangle
  56.     public static Rectangle BoundingRectangle(this List<Vector2> points)
  57.     {
  58.         float minX = points.Min(p => p.X);
  59.         float maxX = points.Max(p => p.X);
  60.         float minY = points.Min(p => p.Y);
  61.         float maxY = points.Max(p => p.Y);
  62.  
  63.         return new Rectangle(minX, minY, maxX - minX, maxY - minY);
  64.     }
  65.  
  66.     // Check bounding rectangle size
  67.     public static bool IsLargeEnough(this List<Vector2> positions, float minSize)
  68.     {
  69.         Rectangle boundingRectangle = positions.BoundingRectangle();
  70.  
  71.         return boundingRectangle.Width > minSize && boundingRectangle.Height > minSize;
  72.     }
  73.  
  74.     // Scale path to 1x1
  75.     public static void ScaleToReferenceWorld(this List<Vector2> positions)
  76.     {
  77.         Rectangle boundingRectangle = positions.BoundingRectangle();
  78.         for (int i = 0; i < positions.Count; i++)
  79.         {
  80.             Vector2 position = positions[i];
  81.  
  82.             position.X *= (1.0f / boundingRectangle.Width);
  83.             position.Y *= (1.0f / boundingRectangle.Height);
  84.  
  85.             positions[i] = position;
  86.         }
  87.     }
  88.  
  89.     // Translate path to origin (0, 0)
  90.     public static void CenterToOrigin(this List<Vector2> positions)
  91.     {
  92.         Vector2 center = positions.Center();
  93.         for (int i = 0; i < positions.Count; i++)
  94.         {
  95.             positions[i] -= center;
  96.         }
  97.     }
  98. }


Golden Section

The comparison between our data could be done via a simple average distance between corresponding points. However, this solution is not accurate enough.

So I used a much more powerful algorithm called golden section search. It comes from a paper available here: http://www.math.uic.edu/~jan/mcs471/Lec9/gss.pdf

A JavaScript implementation is available here: http://depts.washington.edu/aimgroup/proj/dollar/

The implementation in C#:

  1. public static float Search(List<Vector2> current, List<Vector2> target, float a, float b, float epsilon)
  2. {
  3.     float x1 = ReductionFactor * a + (1 - ReductionFactor) * b;
  4.     List<Vector2> rotatedList = current.Rotate(x1);
  5.     float fx1 = rotatedList.DistanceTo(target);
  6.  
  7.     float x2 = (1 - ReductionFactor) * a + ReductionFactor * b;
  8.     rotatedList = current.Rotate(x2);
  9.     float fx2 = rotatedList.DistanceTo(target);
  10.  
  11.     do
  12.     {
  13.         if (fx1 < fx2)
  14.         {
  15.             b = x2;
  16.             x2 = x1;
  17.             fx2 = fx1;
  18.             x1 = ReductionFactor * a + (1 - ReductionFactor) * b;
  19.             rotatedList = current.Rotate(x1);
  20.             fx1 = rotatedList.DistanceTo(target);
  21.         }
  22.         else
  23.         {
  24.             a = x1;
  25.             x1 = x2;
  26.             fx1 = fx2;
  27.             x2 = (1 - ReductionFactor) * a + ReductionFactor * b;
  28.             rotatedList = current.Rotate(x2);
  29.             fx2 = rotatedList.DistanceTo(target);
  30.         }
  31.     }
  32.     while (Math.Abs(b - a) > epsilon);
  33.  
  34.     float min = Math.Min(fx1, fx2);
  35.  
  36.     return 1.0f - 2.0f * min / Diagonal;
  37. }

With this algorithm, we can simply compare a template with the current gesture and get a score (between 0 and 1).
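Putting the pieces together, scoring a drawn path against a template comes down to packing both paths and running the search over a small rotation window; a sketch (the sample count, angle window and acceptance threshold below are example values, not the toolbox defaults):

    // Pack both paths to the same number of samples, then score the match
    List<Vector2> packedTemplate = GoldenSection.Pack(templatePoints, 64);
    List<Vector2> packedCurrent = GoldenSection.Pack(currentPoints, 64);

    float score = GoldenSection.Search(packedCurrent, packedTemplate,
        -MathHelper.PiOver4, MathHelper.PiOver4, 0.1f);

    bool isMatch = score > 0.80f; // example acceptance threshold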

Learning Machine

To improve our success rate, we need several templates. For this, we will work with the LearningMachine class, whose role is precisely to store our templates (i.e. to learn new models) and to compare them with the current gesture:

  1. public class LearningMachine
  2. {
  3.     readonly List<RecordedPath> paths;
  4.  
  5.     public LearningMachine(Stream kbStream)
  6.     {
  7.         if (kbStream == null || kbStream.Length == 0)
  8.         {
  9.             paths = new List<RecordedPath>();
  10.             return;
  11.         }
  12.  
  13.         BinaryFormatter formatter = new BinaryFormatter();
  14.  
  15.         paths = (List<RecordedPath>)formatter.Deserialize(kbStream);
  16.     }
  17.  
  18.     public List<RecordedPath> Paths
  19.     {
  20.         get { return paths; }
  21.     }
  22.  
  23.     public bool Match(List<Vector2> entries, float threshold, float minimalScore, float minSize)
  24.     {
  25.         return Paths.Any(path => path.Match(entries, threshold, minimalScore, minSize));
  26.     }
  27.  
  28.     public void Persist(Stream kbStream)
  29.     {
  30.         BinaryFormatter formatter = new BinaryFormatter();
  31.  
  32.         formatter.Serialize(kbStream, Paths);
  33.     }
  34.  
  35.     public void AddPath(RecordedPath path)
  36.     {
  37.         path.CloseAndPrepare();
  38.         Paths.Add(path);
  39.     }
  40. }

Each RecordedPath implements the Match method, which in turn calls the golden section search:

  1. public bool Match(List<Vector2> positions, float threshold, float minimalScore, float minSize)
  2. {
  3.     if (positions.Count < samplesCount)
  4.         return false;
  5.  
  6.     if (!positions.IsLargeEnough(minSize))
  7.         return false;
  8.  
  9.     List<Vector2> locals = GoldenSection.Pack(positions, samplesCount);
  10.  
  11.     float score = GoldenSection.Search(locals, points, -MathHelper.PiOver4, MathHelper.PiOver4, threshold);
  12.  
  13.     return score > minimalScore;
  14. }


Thus, thanks to our detection algorithm and our learning machine, we end up with a system that is both reliable and not very dependent on the quality of the data supplied by Kinect.

You will find here an example of a knowledge base used to recognize a circle: http://www.catuhe.com/msdn/circleKB.zip.
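Loading such a knowledge base and testing a candidate path against it then only takes a few lines; a sketch (the file name, the helper returning the hand path and the threshold values are assumptions for illustration):

    // Load the circle templates and match a candidate 2D path against them
    LearningMachine learningMachine;
    using (Stream kbStream = File.OpenRead("circleKB.save")) // assumed file name from the archive
    {
        learningMachine = new LearningMachine(kbStream);
    }

    List<Vector2> candidate = GetRecentHandPath(); // hypothetical helper: recent 2D hand positions

    // threshold, minimal score and minimal size are example values
    bool isCircle = learningMachine.Match(candidate, 0.1f, 0.80f, 0.1f);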

Conclusion

So we now have at our disposal a set of tools for working with Kinect, as well as two systems for detecting a large number of gestures.

It's now your turn to use these services in your Kinect applications!
