Image and video processing applications have always been interesting and impressive. You can easily “wow” your audience with the magical stuff you can do with your machine.

The issue we always face with this kind of application is that it requires a specific set of skills in math and a good understanding of image processing. It is easy to receive the image stream from the camera on the computer, but it's really hard to make the machine understand these images. Another issue is the hardware: you need a good PC with high specs to handle all this heavy processing.

image


Kinect is a motion sensing device created by Microsoft to be used with the Xbox (and recently with Windows PCs). It's a new way to interact with your system: you can control it using hand and body gestures.

If you've never heard of Kinect, then you have to watch the following two-minute video:

 

Now that we agree on how awesome the Kinect sensor is, let's start by understanding the hardware.

Kinect Hardware

image

First of all, let me clarify that the Kinect sensor isn't just a camera! (Some of you would be like “duh”, but I've met people asking me, “Why would we use Kinect if we have an HD web camera?”)
The Kinect device comes with an RGB camera which receives a stream of colored images (so yes, this part is just a camera). Next to this camera, we have two IR sensors, an emitter and a receiver. These two sensors are used to get the depth of objects relative to the Kinect sensor, and they are what makes tracking the skeletons of bodies possible (as we'll see in part 2 of this post). We also have an array of three microphones. I can hear you thinking: “Why would we need three mics instead of one?” Kinect uses these three microphones to get the direction of the sound source. So let's say you're building a robot that receives its input from the Kinect. You can make this robot listen, decide which direction the sound comes from, and then rotate in that direction. Kinect also uses the three microphones for noise cancellation (three microphones are always better than one). Finally, you can see the motorized tilt, which can be controlled to tilt the Kinect sensor by calling some simple methods.
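To give you a taste of how simple those tilt methods are, here is a minimal sketch (assuming `myKinect` is an already-started `KinectSensor` instance; note that the SDK warns against tilting too often, as the motor isn't built for continuous movement):

```csharp
// Sketch: tilting the sensor with the motorized tilt.
// "myKinect" is assumed to be a started KinectSensor.
int desiredAngle = 10; // degrees above horizontal

// Clamp the angle to the range the hardware supports.
desiredAngle = Math.Max(myKinect.MinElevationAngle,
                        Math.Min(myKinect.MaxElevationAngle, desiredAngle));

myKinect.ElevationAngle = desiredAngle; // moves the motor
```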

 

Kinect for Xbox vs. Kinect For Windows

To start developing your application, you will need to get a Kinect device. You have two options: either get the Kinect for Xbox (which comes with the Xbox console), or order the “Kinect for Windows” device separately.
Note: if you're going to get the “Kinect for Xbox”, you will also need to order the USB cable to connect it to your machine.

These are some differences between the two devices:

1- Default vs. Near modes

image
In Kinect for Windows, Microsoft added a new “Near” mode which enables the sensor to see objects from 40 cm to 3 m away. Kinect for Xbox, on the other hand, recognizes objects only from 80 cm to 4 m.

2- Standing vs. Seated modes

image
The new “Seated” mode makes Kinect for Windows friendlier to the PC environment: Kinect can understand gestures from the upper body, with no need to stand up. Kinect for Xbox supports only the “Standing” mode.

3- Making commercial applications

This is not a technical point, but if you want to create an application for commercial use, you have to get the Kinect for Windows device, and you also need a license from Microsoft to use it in public.
If you don't have access to a Kinect for Windows device, I suggest you start developing your app using Kinect for Xbox; once you're done, you can get the Windows device. Both devices use the same set of APIs.

 

Kinect Architecture

image
To develop applications for Kinect, you can either use C++ to write native applications which have direct access to the Kinect drivers, or use any .NET language like VB.NET or C#, which uses Kinect.dll, Speech.dll and the .NET Framework to write managed applications.
Right now, the Kinect APIs can be accessed from WPF, WinForms or Console applications.

Note: you can't access the Kinect APIs from a Windows Store app. But in that case it's better to use WPF anyway, as it uses very similar XAML code and has full access to the .NET 4.5 Framework. (Also, it doesn't make much sense to run a Kinect application on a Windows RT tablet.)

 

First App

Okay, enough talking, let's start coding. What I love about any set of APIs Microsoft provides is that it's extremely easy to use and just makes sense!

Let's start by creating a WPF project. In Visual Studio 2012, create a new project, select Windows -> WPF Application and enter the name of the project.
image

 

Open MainWindow.xaml and add an Image element, which will be used to show the image stream from Kinect. Then change the width and height properties as in the following snippet:

<Window x:Class="FirstApp.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow"  Width="620" Height="820">
    <Canvas x:Name="skeletonCanvas"  Width="620" Height="820" >
        <Image x:Name="kinectImage" VerticalAlignment="Top" Width="620" Height="820" />
    </Canvas>
</Window>

Now open MainWindow.xaml.cs, register the Loaded event for the MainWindow, and add a KinectSensor field (you'll need a reference to Microsoft.Kinect.dll and a using Microsoft.Kinect; directive). Your constructor should now look like this:


KinectSensor myKinect;
public MainWindow()
{
    InitializeComponent();
    this.Loaded += Window_Loaded;
}

In our application we will need one KinectSensor variable (you can control up to four sensors from one machine).
Note: as the amount of data coming from the Kinect device to your machine is huge, you should connect the sensor directly to a USB port on your machine; don't use any USB hubs.

In the Window Loaded event, add the following code:

private void Window_Loaded(object sender, RoutedEventArgs e)
{
    // Make sure at least one sensor is connected before grabbing it.
    if (KinectSensor.KinectSensors.Count == 0)
    {
        MessageBox.Show("No Kinect sensor detected.");
        return;
    }

    myKinect = KinectSensor.KinectSensors[0];
    myKinect.ColorStream.Enable();
    myKinect.ColorFrameReady += myKinect_ColorFrameReady;
    myKinect.Start();
}


First, we assign the first Kinect sensor to the myKinect variable. Then we enable the ColorStream, which turns on the input from the RGB camera. After that we register the ColorFrameReady event, which fires every time we get a new frame from the Kinect device (up to 30 times per second). Finally, we start the Kinect by calling the Start method.
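One thing worth adding here (not strictly required, but good practice): stop the sensor when your window closes, so the device is freed for other applications. A minimal sketch, assuming you register `this.Closing += Window_Closing;` in the constructor next to the Loaded event:

```csharp
// Sketch: releasing the sensor when the window closes.
// Register in the constructor: this.Closing += Window_Closing;
private void Window_Closing(object sender, System.ComponentModel.CancelEventArgs e)
{
    if (myKinect != null)
    {
        myKinect.Stop(); // stops the streams and frees the device
    }
}
```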

Now let's have a look at the event handler:

 

void myKinect_ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    using (var colorFrame = e.OpenColorImageFrame())
    {
        // The frame can be null if it arrived too late and was skipped.
        if (colorFrame == null) return;

        byte[] colorData = new byte[colorFrame.PixelDataLength];
        colorFrame.CopyPixelDataTo(colorData);
        kinectImage.Source = BitmapSource.Create(colorFrame.Width,
                                                 colorFrame.Height,
                                                 96, 96,
                                                 PixelFormats.Bgr32,
                                                 null,
                                                 colorData,
                                                 colorFrame.Width * colorFrame.BytesPerPixel);
    }
}
 

We start by opening the color image frame inside a “using” block (“using” makes sure the system disposes of the frame when we're done). To show the image we need to convert the frame to an array of bytes, and the Kinect API has a method for exactly that: CopyPixelDataTo. We then create a BitmapSource from that array.
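Creating a brand-new BitmapSource for every frame works, but it allocates a new bitmap up to 30 times per second. If you notice memory pressure, a common alternative is to reuse a single WriteableBitmap. A sketch, assuming the same XAML and fields as above (`bitmap` would be a new field on MainWindow):

```csharp
// Sketch: reuse one WriteableBitmap instead of allocating a new
// BitmapSource per frame. "bitmap" is an assumed field: WriteableBitmap bitmap;
void myKinect_ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    using (var colorFrame = e.OpenColorImageFrame())
    {
        if (colorFrame == null) return; // frame was skipped

        byte[] colorData = new byte[colorFrame.PixelDataLength];
        colorFrame.CopyPixelDataTo(colorData);

        if (bitmap == null)
        {
            // Create the bitmap once and hand it to the Image element.
            bitmap = new WriteableBitmap(colorFrame.Width, colorFrame.Height,
                                         96, 96, PixelFormats.Bgr32, null);
            kinectImage.Source = bitmap;
        }

        // Copy the new frame into the existing bitmap.
        bitmap.WritePixels(
            new Int32Rect(0, 0, colorFrame.Width, colorFrame.Height),
            colorData,
            colorFrame.Width * colorFrame.BytesPerPixel, // stride in bytes
            0);
    }
}
```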

Now run your application; you should see the video stream from the Kinect device on your screen.

Impressive, huh? Not really. I'm sure you didn't get a Kinect device just to use it as a camera. So let's do one more thing.

Notice that when we get the image and convert it to an array of bytes, we get access to each byte in the frame. That means we can do a lot of cool stuff with it; for example, we can apply filters to the bytes. Let's start with something simple: taking the modulus of each byte to get a funky image.

void myKinect_ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    using (var colorFrame = e.OpenColorImageFrame())
    {
        if (colorFrame == null) return;

        byte[] colorData = new byte[colorFrame.PixelDataLength];
        colorFrame.CopyPixelDataTo(colorData);

        // A toy "filter": wrap every byte into the 0-59 range.
        for (int i = 0; i < colorData.Length; i++)
            colorData[i] %= 60;

        kinectImage.Source = BitmapSource.Create(colorFrame.Width,
                                                 colorFrame.Height,
                                                 96, 96,
                                                 PixelFormats.Bgr32,
                                                 null,
                                                 colorData,
                                                 colorFrame.Width * colorFrame.BytesPerPixel);
    }
}

This simple “filter” takes the modulus of each byte, producing a funky dark image. You can replace this for loop with your own filter (you can find plenty of them online).
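As one more example, here's a sketch of a grayscale filter that could replace the loop above. It relies on the Bgr32 layout, where each pixel is four bytes: blue, green, red, and an unused padding byte:

```csharp
// Sketch: grayscale filter over a Bgr32 byte array.
// Each pixel occupies 4 bytes: [blue, green, red, unused].
for (int i = 0; i + 3 < colorData.Length; i += 4)
{
    // Simple average of the three color channels.
    byte gray = (byte)((colorData[i] + colorData[i + 1] + colorData[i + 2]) / 3);
    colorData[i] = gray;     // blue
    colorData[i + 1] = gray; // green
    colorData[i + 2] = gray; // red
    // colorData[i + 3] is padding; Bgr32 ignores it.
}
```

Because we step four bytes at a time, each channel of a pixel gets the same value, which is exactly what grayscale means; a proper luminance formula would weight the channels (0.114 B + 0.587 G + 0.299 R), but the plain average is good enough for a demo.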

 

That was a simple introduction to developing applications using the Kinect SDK. In part 2 of this post, we will talk about using the depth and skeleton streams to create cool augmented reality applications.

 

You can read more about Kinect & other Microsoft technologies on my blog.