Unleash the power of Kinect for Windows SDK! - Eternal Coding - HTML5 / Windows / Kinect / 3D development - Site Home - MSDN Blogs

Unleash the power of Kinect for Windows SDK!


 


This post is a translation of http://blogs.msdn.com/b/eternalcoding/archive/2011/06/14/fr-prenez-le-contr-244-le-avec-kinect-pour-windows-sdk.aspx


Introduction

Microsoft Research has released the first beta of Kinect for Windows SDK. You can find resources and download the SDK at:

http://research.microsoft.com/kinectsdk

This SDK also installs drivers for the Kinect sensor. However, be aware that the SDK will only install on Windows 7 (x86 and x64).

The first point worth noting is that the SDK is actually available in two versions: one for C++ developers and one for managed developers.

So there is no jealousy. As I prefer the managed environment, my samples will be presented in C# (a matter of taste only).

Regarding licensing, this version, released by Microsoft Research, is free for private use (basically, anything that is not commercial).

Architecture

Basically, the Kinect sensor sends a set of three streams.

The image stream can be displayed like that of any other camera (for example, to do augmented reality). The Kinect video sensor can return the stream at two resolutions: 640x480 (at 30 frames per second) or 1280x1024 (but at 15 frames per second).

The depth stream is the determining factor in our case. It adds to each pixel a depth measured by the sensor. So in addition to the 2D position of each pixel (and its color) we now have depth. This greatly simplifies the writing of shape-detection algorithms.

A third stream is sent from the sensor: it is the audio stream from the four microphones (more on this subject at the end of the article).

Therefore, the key point here is Kinect's ability to give us three-dimensional data. Using the NUI library (which comes with the SDK and stands for Natural User Interface), you will be able to detect the presence of humans in front of the sensor. Kinect can "see" up to 4 people and accurately track two of them.

When Kinect precisely follows a person, it can provide a skeleton made up of key points detected on the user:

[Figure: skeleton joints overlaid on the Vitruvian Man]

As shown in this diagram of the Vitruvian Man, there are 20 key points (which we call joints) that are detected and tracked by the NUI library.

For best results, it is necessary to stand between 4 and 11 feet from the sensor. Beyond these limits, the sensor's accuracy decreases quickly. It is also not (yet) possible to track a user sitting in front of a computer.

Getting started

To use the Kinect for Windows SDK in your .NET application, you only have to reference Microsoft.Research.Kinect.dll.

You then have two new namespaces, one for accessing the video streams and skeletons and one for audio:

    using Microsoft.Research.Kinect.Nui;
    using Microsoft.Research.Kinect.Audio;

To initialize the NUI library, you must instantiate an object of the Runtime class and configure the streams you want to receive:

    kinectRuntime = new Runtime();
    kinectRuntime.Initialize(RuntimeOptions.UseDepthAndPlayerIndex | RuntimeOptions.UseSkeletalTracking | RuntimeOptions.UseColor);

In our example, we initialize the library with support for the depth stream, the video stream and skeleton tracking.

Video buffer

To use the video stream, you must first define the expected format. To do so, we ask the library to deliver the data at a given resolution and in a given pixel format:

    kinectRuntime.VideoStream.Open(ImageStreamType.Video, 2, ImageResolution.Resolution640x480, ImageType.Color);

Here, we ask for a resolution of 640x480 with an RGB pixel format. It is also possible to request a resolution of 1280x1024 (with lower performance), and pixels can also be delivered in YUV format. As mentioned earlier, resolution has an impact on performance. Pixel format does not, so choose whichever suits your application best.

Subsequently, to be informed of the availability of each image, you must subscribe to an event of the library:

    kinectRuntime.VideoFrameReady += kinectRuntime_VideoFrameReady;

In the handler of this event, we can simply produce a BitmapSource to display it in a WPF application:

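    using System.Windows.Media;
    using System.Windows.Media.Imaging;
    using Microsoft.Research.Kinect.Nui;

    public class ColorStreamManager
    {
        // Latest color frame, ready to be displayed in a WPF application.
        public BitmapSource ColorBitmap { get; private set; }

        // Call this from the VideoFrameReady handler.
        public void Update(ImageFrameReadyEventArgs e)
        {
            PlanarImage image = e.ImageFrame.Image;

            // The last argument is the stride: the number of bytes in one row.
            ColorBitmap = BitmapSource.Create(image.Width, image.Height, 96, 96, PixelFormats.Bgr32, null, image.Bits, image.Width * image.BytesPerPixel);
        }
    }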
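
If you want to inspect individual pixels of the raw buffer instead of handing it straight to WPF, the same stride arithmetic applies. The following is a minimal sketch, assuming a tightly packed 32-bit BGR layout (4 bytes per pixel, blue first) as implied by PixelFormats.Bgr32 above. The Bgr32Buffer helper and its method names are hypothetical and are not part of the SDK:

```csharp
using System;

// Hypothetical helper (not part of the Kinect SDK): addresses pixels in a
// tightly packed 32-bit BGR buffer such as the one given to BitmapSource.Create.
public static class Bgr32Buffer
{
    public const int BytesPerPixel = 4;

    // Number of bytes in one row of the image.
    public static int Stride(int width)
    {
        return width * BytesPerPixel;
    }

    // Byte offset of the pixel at column x, row y.
    public static int Offset(int x, int y, int width)
    {
        return y * Stride(width) + x * BytesPerPixel;
    }

    // Reads the blue, green and red components of one pixel.
    public static void ReadPixel(byte[] bits, int x, int y, int width,
                                 out byte blue, out byte green, out byte red)
    {
        int i = Offset(x, y, width);
        blue = bits[i];
        green = bits[i + 1];
        red = bits[i + 2];
        // bits[i + 3] is the unused fourth byte of a Bgr32 pixel.
    }
}
```

For a 640x480 frame, Stride(640) is 2560 bytes, so a whole frame occupies 2560 * 480 = 1,228,800 bytes.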


Depth buffer

Besides the video stream, Kinect can send a stream coming from the infrared sensor that gives depth data.

The initialization is similar to that of the video stream:

    kinectRuntime.DepthStream.Open(ImageStreamType.Depth, 2, ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);
    kinectRuntime.DepthFrameReady += kinectRuntime_DepthFrameReady;

The depth data are stored as arrays of 16-bit integers. The depth stream can be retrieved at 320x240 or 80x60.

  • The 13 high-order bits of each pixel represent the distance from the depth sensor to the closest object, in millimeters.
  • The 3 low-order bits of each pixel represent the index of the tracked user who is visible at the pixel's x and y coordinates.
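
As a worked example of this bit layout, here is a minimal sketch that splits one 16-bit depth pixel into its two fields. It assumes the two bytes of each pixel are stored low byte first (little-endian); the DepthPixel class and its method names are hypothetical, for illustration only:

```csharp
using System;

// Hypothetical helper (not part of the Kinect SDK) illustrating the
// "13 bits of depth + 3 bits of player index" layout described above.
public static class DepthPixel
{
    // Splits a depth pixel given as two bytes (low byte first).
    public static void Unpack(byte low, byte high, out int depthMm, out int playerIndex)
    {
        int raw = (high << 8) | low;   // rebuild the 16-bit value
        playerIndex = raw & 0x07;      // 3 low-order bits
        depthMm = raw >> 3;            // 13 high-order bits, in millimeters
    }

    // Inverse operation, handy for building test data.
    public static ushort Pack(int depthMm, int playerIndex)
    {
        return (ushort)((depthMm << 3) | (playerIndex & 0x07));
    }
}
```

For instance, a point 1500 mm away that belongs to player 2 is encoded as (1500 << 3) | 2 = 12002, that is 0x2EE2, stored as the bytes 0xE2 then 0x2E. Unpacking those two bytes gives back a depth of 1500 and a player index of 2.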

So if you want to view the depth stream while coloring the areas occupied by humans, it is possible to use this code:

    void ConvertDepthFrame(ImageFrameReadyEventArgs e)
    {
        depthFrame32 = new byte[e.ImageFrame.Image.Width * e.ImageFrame.Image.Height * 4];

        byte[] depthFrame16 = e.ImageFrame.Image.Bits;

        for (int i16 = 0, i32 = 0; i16 < depthFrame16.Length && i32 < depthFrame32.Length; i16 += 2, i32 += 4)
        {
            // Get the current user (player index)
            int user = depthFrame16[i16] & 0x07;

            // Depth (in mm)
            int realDepth = (depthFrame16[i16 + 1] << 5) | (depthFrame16[i16] >> 3);

            // Depth -> intensity
            byte intensity = (byte)(255 - (255 * realDepth / 0x0fff));

            depthFrame32[i32] = 0;
            depthFrame32[i32 + 1] = 0;
            depthFrame32[i32 + 2] = 0;
            depthFrame32[i32 + 3] = 255;

            switch (user)
            {
                case 0: // no one
                    depthFrame32[i32] = (byte)(intensity / 2);
                    depthFrame32[i32 + 1] = (byte)(intensity / 2);
                    depthFrame32[i32 + 2] = (byte)(intensity / 2);
                    break;
                case 1:
                    depthFrame32[i32] = intensity;
                    break;
                case 2:
                    depthFrame32[i32 + 1] = intensity;
                    break;
                case 3:
                    depthFrame32[i32 + 2] = intensity;
                    break;
                case 4:
                    depthFrame32[i32] = intensity;
                    depthFrame32[i32 + 1] = intensity;
                    break;
                case 5:
                    depthFrame32[i32] = intensity;
                    depthFrame32[i32 + 2] = intensity;
                    break;
                case 6:
                    depthFrame32[i32 + 1] = intensity;
                    depthFrame32[i32 + 2] = intensity;
                    break;
                case 7:
                    depthFrame32[i32] = intensity;
                    depthFrame32[i32 + 1] = intensity;
                    depthFrame32[i32 + 2] = intensity;
                    break;
            }
        }
    }

This stream can be extremely useful for detecting shapes. For instance, it is possible to monitor and detect hand or finger movements to produce new ways of interacting with the PC.

Skeleton tracking

One of the big strengths of the Kinect for Windows SDK is its ability to find the skeleton (a set of joints) of a human standing in front of the sensor. And unlike the hacks that have sprung up on the Internet (like OpenNI), the Kinect for Windows SDK incorporates a very fast recognition system that requires no training to use. This is the result of extensive machine learning: Microsoft Research fed the recognition system a large number of examples to train it.

So once you step in front of the sensor (at the right distance, of course), the NUI library will discover your skeleton and raise an event with useful data about it.

To enable the skeleton tracking system, you must activate the depth stream and handle the appropriate event:

    kinectRuntime.SkeletonFrameReady += kinectRuntime_SkeletonFrameReady;

In the handler for this event, we can loop through all the skeletons found by the system:

    void kinectRuntime_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
    {
        SkeletonFrame skeletonFrame = e.SkeletonFrame;

        foreach (SkeletonData data in skeletonFrame.Skeletons)
        {
            if (data.TrackingState == SkeletonTrackingState.Tracked)
            {

                foreach (Joint joint in data.Joints)
                {
                    switch (joint.ID)
                    {
                        case JointID.HandLeft:
                            if (joint.Position.W > 0.6f) // Quality check
                                leftHandGestureRecognizer.Add(joint.Position.ToVector3());
                            break;
                        case JointID.HandRight:
                            if (joint.Position.W > 0.6f) // Quality check
                                rightHandGestureRecognizer.Add(joint.Position.ToVector3());
                            break;
                    }
                }
                return;
            }
        }
    }

Several points are noteworthy here:

  • The NUI library cannot track more than 2 skeletons. The test TrackingState == SkeletonTrackingState.Tracked tells you whether a skeleton is 'tracked' or not. The untracked skeletons only give their position.
  • Each joint has a Position property that is defined by a Vector4: (x, y, z, w). The first three attributes define the position in camera space. The last attribute (w) gives the quality level (between 0 and 1) of the position. This allows you to filter and keep only the data that the library is almost certain of.
  • Each skeleton has a TrackingID property which remains the same on every frame. This allows us to uniquely identify the skeletons from one call to the next.
  • Each joint is identified by an enum value which defines its reference position (hands, head, etc.).
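
The quality check on w can be factored out into a small filter. Here is a minimal sketch of that idea using stand-in types, so that it runs without the sensor: JointSample is a hypothetical substitute for the SDK's joint position, and the 0.6 threshold is simply the one used in the handler above, not a recommended value:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a joint and its (x, y, z, w) position.
public struct JointSample
{
    public string Name;
    public float X, Y, Z;
    public float W; // quality level, between 0 and 1

    public JointSample(string name, float x, float y, float z, float w)
    {
        Name = name; X = x; Y = y; Z = z; W = w;
    }
}

public static class JointFilter
{
    // Keeps only the joints whose quality level is strictly above the threshold.
    public static List<JointSample> KeepReliable(IEnumerable<JointSample> joints, float minQuality)
    {
        return joints.Where(j => j.W > minQuality).ToList();
    }
}
```

Calling JointFilter.KeepReliable(joints, 0.6f) on a frame's joints returns only those you may want to feed to a gesture recognizer; raising the threshold trades responsiveness for fewer spurious positions.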

It is also possible to poll for the current skeletons with the SkeletonEngine.GetNextFrame() method.

Finally, the NUI library provides an algorithm for filtering and smoothing incoming data from the sensor. By default, the skeleton data are sent without smoothing or filtering. However, the Kinect depth sensor does not have sufficient resolution to ensure consistent accuracy over time, so the joints seem to vibrate around their positions. To correct this problem, you can use this code:

    kinectRuntime.SkeletonEngine.TransformSmooth = true;
    var parameters = new TransformSmoothParameters
    {
        Smoothing = 1.0f,
        Correction = 0.1f,
        Prediction = 0.1f,
        JitterRadius = 0.05f,
        MaxDeviationRadius = 0.05f
    };
    kinectRuntime.SkeletonEngine.SmoothParameters = parameters;

As we can see, it is possible to smooth and correct data. Depending on what you need, you should manipulate these parameters to provide the best experience possible.
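
To build an intuition for what a smoothing parameter trades off, here is a minimal sketch of a simple exponential filter applied to a single coordinate. This is not the SDK's own filter, whose internals are not described in this article; the ExponentialSmoother class and its smoothing factor are hypothetical and only illustrate the general idea that more smoothing means less jitter but more lag:

```csharp
using System;

// Hypothetical illustration (not the SDK's algorithm): exponential smoothing
// of one coordinate of a joint position.
public sealed class ExponentialSmoother
{
    private readonly float smoothing; // 0 = raw data, close to 1 = heavy smoothing
    private bool hasValue;
    private float current;

    public ExponentialSmoother(float smoothing)
    {
        if (smoothing < 0f || smoothing >= 1f)
            throw new ArgumentOutOfRangeException("smoothing");
        this.smoothing = smoothing;
    }

    // Feeds one raw sample and returns the smoothed value.
    public float Update(float sample)
    {
        if (!hasValue)
        {
            // The first sample is passed through unchanged.
            current = sample;
            hasValue = true;
        }
        else
        {
            // Blend the previous output with the new sample.
            current = smoothing * current + (1f - smoothing) * sample;
        }
        return current;
    }
}
```

With a factor of 0.5, feeding the samples 0, 1, 1 yields 0, 0.5, 0.75: the output needs several frames to catch up with a sudden move, which is the latency you pay for a steadier signal.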

It is now your imagination's turn to propose future applications. For example, you can look for gestures to control applications (the famous PowerPoint Jedi control) or even have fun with augmented reality...

Kinect and sound

Kinect comes with a group of four microphones (microphone array) that capture sound at very high quality. Indeed, directly on the sensor, a signal processor (DSP) is used to remove background noise and cancel echo effects.

Moreover, thanks to its group of microphones, Kinect can provide the direction of the recorded sound source (beamforming). Then it becomes possible to know who is speaking in a meeting for example.

The Kinect for Windows SDK can also act as an audio source for the Microsoft.Speech API, which makes voice recognition with Kinect possible.

So to use all of these services, simply instantiate an object of class KinectAudioSource:

    var source = new KinectAudioSource {SystemMode = SystemMode.OptibeamArrayOnly};

At instantiation, you can select microphone array mode or single microphone mode, with or without acoustic echo cancellation (AEC).

To start capturing, we'll just ask our KinectAudioSource to start providing us the audio stream:

    byte[] buffer = new byte[16000];
    var audioStream = source.Start();
    audioStream.Read(buffer, 0, buffer.Length);

The audio is in 16-kHz, 16-bit mono pulse code modulation (PCM).
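
This format makes buffer sizes easy to reason about: 16,000 samples per second at 2 bytes per sample is 32,000 bytes per second, so a 16,000-byte buffer like the one above holds half a second of audio. The sketch below captures that arithmetic and converts raw bytes to samples. It assumes the samples are stored low byte first (little-endian), and the Pcm16kMono class is a hypothetical helper, not part of the SDK:

```csharp
using System;

// Hypothetical helper (not part of the Kinect SDK) for 16 kHz, 16-bit mono PCM.
public static class Pcm16kMono
{
    public const int SampleRate = 16000;   // samples per second
    public const int BytesPerSample = 2;   // 16-bit mono

    // Buffer size needed to hold the given duration of audio.
    public static int BytesForDuration(TimeSpan duration)
    {
        return (int)(duration.TotalSeconds * SampleRate) * BytesPerSample;
    }

    // Duration of audio contained in byteCount bytes.
    public static TimeSpan DurationOf(int byteCount)
    {
        return TimeSpan.FromSeconds((double)byteCount / (SampleRate * BytesPerSample));
    }

    // Converts raw bytes to signed 16-bit samples (assumes little-endian order).
    public static short[] ToSamples(byte[] buffer, int byteCount)
    {
        var samples = new short[byteCount / BytesPerSample];
        for (int i = 0; i < samples.Length; i++)
        {
            samples[i] = unchecked((short)(buffer[2 * i] | (buffer[2 * i + 1] << 8)));
        }
        return samples;
    }
}
```

When reading from the stream, pass the number of bytes actually returned by Read as byteCount, since a read may return fewer bytes than the buffer can hold.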

To use the beamforming services, use this code:

    source.MicArrayMode = MicArrayMode.MicArrayAdaptiveBeam;

There are several ways to detect the sound beam. Here, we let the system itself select the appropriate beam. It would also be possible to make the selection manually, to take only the central beam, or to use the average of all beams.

Then, the system can raise an event when the beam moves:

    source.BeamChanged += source_BeamChanged;

    static void source_BeamChanged(object sender, BeamChangedEventArgs e)
    {
        Console.WriteLine("Angle : {0} radians", e.Angle);
    }

The returned angle is in radians and is relative to the center of your Kinect. If you are facing it, an angle of 0 indicates that the sound comes from directly in front of the sensor, an angle < 0 indicates that the sound comes from the left, and an angle > 0 indicates that the sound comes from the right.
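
Since radians are not very readable in a log or a UI, a small conversion helper can be useful. Here is a minimal sketch following the sign convention just described; the BeamAngle class, its method names and the tolerance parameter are hypothetical, not part of the SDK:

```csharp
using System;

// Hypothetical helper (not part of the Kinect SDK) for interpreting a beam angle.
public static class BeamAngle
{
    // Converts an angle from radians to degrees.
    public static double ToDegrees(double radians)
    {
        return radians * 180.0 / Math.PI;
    }

    // Describes where the sound comes from: negative angles are on the left,
    // positive angles on the right, and anything within the tolerance is "center".
    public static string Describe(double radians, double toleranceRadians)
    {
        if (Math.Abs(radians) <= toleranceRadians)
            return "center";
        return radians < 0 ? "left" : "right";
    }
}
```

In the BeamChanged handler you could then print BeamAngle.Describe(e.Angle, 0.05) instead of the raw value; the 0.05 radian tolerance (about 3 degrees) is an arbitrary choice for the example.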

At any time, it is also possible to request the current value of the angle by calling source.SoundSourcePosition:

    if (source.SoundSourcePositionConfidence > 0.9)
        Console.Write("Position (radians): {0}", source.SoundSourcePosition);

As with the positions of the skeleton joints, we get a confidence level that lets us decide whether we want to use the value.

Regarding filters (echo cancellation and noise reduction), the KinectAudioSource class provides adjustable parameters to achieve the expected results.

Conclusion

So we have seen that the Kinect for Windows SDK provides many tools to play with. We are about to discover a lot of new kinds of interaction between man and machine. Of course, it lacks some high-level tools (like a library of gestures), but it's a safe bet that the next few weeks will bring many projects from the community and from Microsoft (for example, the Coding4Fun Kinect Toolkit on CodePlex).

To go further

  • We've recently released a cross platform open source machine vision library.  It is actually quite a bit simpler than much of this code as well.

    camera = Kinect()

    camera.getDepth() #This will get your depth array

    camera.getDepthImage() # Gets the image.

    Many other libraries also integrated as well, give it a look:

    http://www.simplecv.org

  • Hi, I had a problem. It mentions leftHandGestureRecognizer.Add(joint.Position.ToVector3()); what exactly is the data type of leftHandGestureRecognizer? Am I missing an assembly reference?

  • Hi Shanuapril,

    you can download the code used by this article here:

    download.microsoft.com/.../Kinexna.zip

    But I recommend you to read this article also:

    blogs.msdn.com/.../gestures-and-tools-for-kinect.aspx

  • Thank you! It helped!

  • hey may i know how to determine a specific value for the X and Y axis of the left and right hand? for example: when both hands are down, the flame will appear? are u able to provide the source code?

  • If you talk about the source code for this article, just look at the previous comments.

    You may want to have a look at this one too:

    blogs.msdn.com/.../gestures-and-tools-for-kinect.aspx

  • hey really thanks for the reply. here is another question. i need to make the flame appear on hand when my hand are at a certain position. Are u able to help?

  • or do u have any source code for it? like when my left and right hand are below the hip, then the flame will appear on hand and wherever my hands moves, the flame moves with the hand

  • Hello yes I have code for that:)

    Just have a look at my light saber simulator on channel9:

    channel9.msdn.com/.../Of-course-our-first-Kinect-for-Windows-SDK-Project-has-to-involve-a-Light-Saber

  • i have seen the light saber....but now i need is to make the flame appear on hand. Are u able to help?

  • actually i did seen the light saber one. But how come when i run it....it has error? are u able to tell me what is wrong or i did not install anthing?

  • you must install xna game studio 4

  • yes i have install it. To make the flame appear on hand one is it roughly the same as your light saber one?

  • yes i install....ok then now the flame to appear on hand is it roughly the same as your light saber?

  • yes install. To make the flame appear on hand is it roughly the same as your light saber?
