• Kinect for Windows Product Blog

    Kinect for Windows expands its developer preview program


    Last June, we announced that we would be hosting a limited, exclusive developer preview program for Kinect for Windows v2 prior to its general availability in the summer (northern hemisphere) of 2014. And a few weeks ago, we began shipping Kinect for Windows v2 Developer Preview kits to thousands of participants all over the world.

    It’s been exciting to hear from so many developers as they take their maiden voyage with Microsoft’s new generation NUI technology. We’ve seen early unboxing videos that were recorded all over the world, from London to Tokyo. We’ve heard about some promising early experiments that are taking advantage of the higher resolution data and the ability to see six people.  People have told us about early success with the new sensor’s ability to track the tips of hands and thumbs.  And some developers have even described how easy it’s been to port their v1 apps to the new APIs.

    Kinect for Windows v2 Developer Preview kit
    (Photo courtesy of Vladimir Kolesnikov [@vladkol], a developer preview program participant)

    But we’ve also heard from many people who were not able to secure a place in the program and are eager to get their hands on the Kinect for Windows v2 sensor and SDK as soon as possible. For everyone who has been hoping and waiting, we’re pleased to announce that we are expanding the program so that more of you can participate!

    We are creating 500 additional developer preview kits for people who have great ideas they want to bring to life with the Kinect for Windows sensor and SDK. Like before, the program is open to professional developers, students, researchers, artists, and other creative individuals.

    The program fee is US$399 (or local equivalent) and offers the following benefits:

    • Direct access to the Kinect for Windows engineering team via a private forum and exclusive webcasts
    • Early SDK access (alpha, beta, and any updates along the way to release)
    • Private access to all API and sample documentation
    • A pre-release version of the new generation sensor
    • A final, released sensor at launch next summer (northern hemisphere)

    Applications must be completed and submitted by January 31, 2014, at 9:00 A.M. (Pacific Time), but don’t wait until then to apply! We will award positions in the program on a rolling basis to qualified applicants. Once all 500 kits have been awarded, the application process will be closed.

    Learn more and apply now

    The Kinect for Windows Team

  • Kinect for Windows Product Blog

    Using WebGL to Render Kinect Webserver Image Data


    Our 1.8 release includes a sample called WebserverBasics-WPF that shows how HTML5 web applications can leverage our webserver component and JavaScript API to get data from Kinect sensors. Among other things, the client-side code demonstrates how to bind Kinect image streams (specifically, the user viewer and background removal streams) to HTML canvas elements in order to render the sequence of images as they arrive so that users see a video feed of Kinect data.

    In WebserverBasics-WPF, the images are processed by first sending the image data arriving from the server to a web worker thread, which copies the data pixel by pixel into a canvas image data structure that can then be rendered via the canvas “2d” context. This solution was adequate for our needs, processing more than the minimum required 30 frames per second of Kinect image data. However, when we attempted to display data from more than one image stream simultaneously (e.g., user viewer + background removal streams), we would start dropping image frames and see a kind of stutter in the video feed. Even when displaying only one image stream at a time, there was a noticeable load added to the computer's CPU, which reduced the amount of multitasking that the system could perform.
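
    To make the baseline concrete, here is a minimal sketch of this “2d” context approach (an illustration, not the sample’s exact code), assuming imageBuffer holds the raw pixel bytes received from the server:

    // Sketch of the "2d" context approach: copy the received pixel bytes into
    // an ImageData object one at a time, then blit the result to the canvas.
    function drawFrameWith2dContext(canvas, imageBuffer, width, height) {
        var context = canvas.getContext('2d');
        var imageData = context.createImageData(width, height);
        var sourceBytes = new Uint8Array(imageBuffer);

        // This per-byte copy loop is the CPU-intensive part that the sample
        // offloads to a web worker thread.
        for (var i = 0; i < sourceBytes.length; ++i) {
            imageData.data[i] = sourceBytes[i];
        }

        context.putImageData(imageData, 0, 0);
    }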

    Now that the latest version of every major browser supports the WebGL API, we can get even better performance without requiring a dedicated background worker thread (which can occupy a full CPU core). While you should definitely test on your own hardware, using WebGL gave me over a 3x improvement in image fps capability—and I don’t even have a high-end GPU!
    Also, once we are using WebGL, it is very easy to apply additional image processing or perform other kinds of tasks without adding latency or burdening the CPU. For example, we can use a convolution kernel to perform edge detection, transforming this image
    Original

    and obtain this image:
    EdgeDetected

    So let’s send some Kinect data over to WebGL!

    Getting Started

    1. Make sure you have the Kinect for Windows v1.8 SDK and Toolkit installed
    2. Make sure you have a WebGL-compatible web browser installed
    3. Get the WebserverWebGL sample code, project and solution from CodePlex. To compile this sample you will also need Microsoft.Samples.Kinect.Webserver (also available via CodePlex and Toolkit Browser) and Microsoft.Kinect.Toolkit components (available via Toolkit Browser).


    Note: The entirety of this post focuses on JavaScript code that runs in the browser client. Also, this post is meant to provide a quick overview of how to use WebGL functionality to render Kinect data; for a more comprehensive tutorial on WebGL itself, refer to one of the many dedicated WebGL tutorials available online.

    Encapsulating WebGL Functionality

    WebGL requires a non-trivial amount of setup code so, to avoid cluttering the main sample code in SamplePage.html, we defined a KinectWebGLHelper object constructor in the KinectWebGLHelper.js file. This object exposes three functions:

    • bindStreamToCanvas(DOMString streamName, HTMLCanvasElement canvas)
      Binds the specified canvas element to the specified image stream.
      This function mirrors the KinectUIAdapter.bindStreamToCanvas function, but uses the canvas “webgl” context rather than the “2d” context.
    • unbindStreamFromCanvas(DOMString streamName)
      Unbinds the specified image stream from the previously bound canvas element, if any.
      This function mirrors the KinectUIAdapter.unbindStreamFromCanvas function.
    • getMetadata(DOMString streamName)
      Allows clients to access the “webgl” context managed by the KinectWebGLHelper object.


    The code modifications (relative to the code in WebserverBasics-WPF) necessary to get SamplePage.html to start using this helper object are fairly minimal. We replaced

    uiAdapter.bindStreamToCanvas(
        Kinect.USERVIEWER_STREAM_NAME,
        userViewerCanvasElement);
    uiAdapter.bindStreamToCanvas(
        Kinect.BACKGROUNDREMOVAL_STREAM_NAME,
        backgroundRemovalCanvasElement);

    with

    glHelper = new KinectWebGLHelper(sensor); 
    glHelper.bindStreamToCanvas(
        Kinect.USERVIEWER_STREAM_NAME,
        userViewerCanvasElement);
    glHelper.bindStreamToCanvas(
        Kinect.BACKGROUNDREMOVAL_STREAM_NAME,
        backgroundRemovalCanvasElement);

    Additionally, we used the glHelper instance for general housekeeping such as clearing the canvas state whenever it’s supposed to become invisible.

    The KinectWebGLHelper further encapsulates the logic to actually set up and manipulate the WebGL context within an “ImageMetadata” object constructor.

    Setting up the WebGL context

    The first step is to get the webgl context and set up the clear color. For historical reasons, some browsers still use “experimental-webgl” rather than “webgl” as the WebGL context name:

    var contextAttributes = { premultipliedAlpha: true }; 
    var glContext = imageCanvas.getContext('webgl', contextAttributes) ||
        imageCanvas.getContext('experimental-webgl', contextAttributes);
    glContext.clearColor(0.0, 0.0, 0.0, 0.0); // Set clear color to black, fully transparent

    Defining a Vertex Shader

    Next we define a geometry to render, plus a corresponding vertex shader. When rendering a 3D scene to a 2D screen, a vertex shader would typically transform 3D world-space coordinates into 2D screen-space coordinates. Since we’re rendering 2D image data from a Kinect sensor to a 2D screen, however, we just need to define a rectangle using 2D coordinates and map the Kinect image onto it as a texture, so the vertex shader ends up being pretty simple:

    // vertices representing entire viewport as two triangles which make up the whole 
    // rectangle, in post-projection/clipspace coordinates
    var VIEWPORT_VERTICES = new Float32Array([
        -1.0, -1.0,
        1.0, -1.0,
        -1.0, 1.0,
        -1.0, 1.0,
        1.0, -1.0,
        1.0, 1.0]);
    var NUM_VIEWPORT_VERTICES = VIEWPORT_VERTICES.length / 2;
    // Texture coordinates corresponding to each viewport vertex
    var VERTEX_TEXTURE_COORDS = new Float32Array([
        0.0, 1.0,
        1.0, 1.0,
        0.0, 0.0,
        0.0, 0.0,
        1.0, 1.0,
        1.0, 0.0]);

    var vertexShader = createShaderFromSource(
        glContext.VERTEX_SHADER,
    "\
    attribute vec2 aPosition;\
    attribute vec2 aTextureCoord;\
    \
    varying highp vec2 vTextureCoord;\
    \
    void main() {\
         gl_Position = vec4(aPosition, 0, 1);\
         vTextureCoord = aTextureCoord;\
    }");

    We specify the shader program as a literal string that gets compiled by the WebGL context. Note that you could instead choose to get the shader code from the server as a separate resource from a designated URI.
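
    By the way, the createShaderFromSource helper used above isn’t shown in this post; a minimal sketch of such a helper, using standard WebGL calls, might look like this:

    // Possible implementation of the createShaderFromSource helper: compile a
    // shader of the given type from a source string, surfacing any errors.
    function createShaderFromSource(shaderType, source) {
        var shader = glContext.createShader(shaderType);
        glContext.shaderSource(shader, source);
        glContext.compileShader(shader);

        if (!glContext.getShaderParameter(shader, glContext.COMPILE_STATUS)) {
            var error = glContext.getShaderInfoLog(shader);
            glContext.deleteShader(shader);
            throw new Error('Shader compilation failed: ' + error);
        }

        return shader;
    }

    The compiled vertex and fragment shaders then get attached to a program object (the program variable used below) via the standard createProgram, attachShader, and linkProgram sequence.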
    We also need to let the shader know how to find its input data:

    var positionAttribute = glContext.getAttribLocation(
        program,
        "aPosition");
    glContext.enableVertexAttribArray(positionAttribute);

    var textureCoordAttribute = glContext.getAttribLocation(
        program,
        "aTextureCoord");
    glContext.enableVertexAttribArray(textureCoordAttribute);

    // Create a buffer used to represent whole set of viewport vertices
    var vertexBuffer = glContext.createBuffer();
    glContext.bindBuffer(
        glContext.ARRAY_BUFFER,
        vertexBuffer);
    glContext.bufferData(
        glContext.ARRAY_BUFFER,
        VIEWPORT_VERTICES,
        glContext.STATIC_DRAW);
    glContext.vertexAttribPointer(
        positionAttribute,
        2,
        glContext.FLOAT,
        false,
        0,
        0);

    // Create a buffer used to represent whole set of vertex texture coordinates
    var textureCoordinateBuffer = glContext.createBuffer();
    glContext.bindBuffer(
        glContext.ARRAY_BUFFER,
        textureCoordinateBuffer);
    glContext.bufferData(
        glContext.ARRAY_BUFFER,
        VERTEX_TEXTURE_COORDS,
        glContext.STATIC_DRAW);
    glContext.vertexAttribPointer(textureCoordAttribute,
        2,
        glContext.FLOAT,
        false,
        0,
        0);

    // Create a texture to contain images from Kinect server
    // Note: TEXTURE_MIN_FILTER, TEXTURE_WRAP_S and TEXTURE_WRAP_T parameters need to be set
    // so we can handle textures whose width and height are not a power of 2.
    var texture = glContext.createTexture();
    glContext.bindTexture(
        glContext.TEXTURE_2D,
        texture);
    glContext.texParameteri(
        glContext.TEXTURE_2D,
        glContext.TEXTURE_MAG_FILTER,
        glContext.LINEAR);
    glContext.texParameteri(
        glContext.TEXTURE_2D,
        glContext.TEXTURE_MIN_FILTER,
        glContext.LINEAR);
    glContext.texParameteri(
        glContext.TEXTURE_2D,
        glContext.TEXTURE_WRAP_S,
        glContext.CLAMP_TO_EDGE);
    glContext.texParameteri(
        glContext.TEXTURE_2D,
        glContext.TEXTURE_WRAP_T,
        glContext.CLAMP_TO_EDGE);
    glContext.bindTexture(
        glContext.TEXTURE_2D,
        null);

    Defining a Fragment Shader

    Fragment Shaders (also known as Pixel Shaders) are used to compute the appropriate color for each geometry fragment. This is where we’ll sample color values from our texture and also apply the chosen convolution kernel to process the image.

    // Convolution kernel weights (blurring effect by default) 
    var CONVOLUTION_KERNEL_WEIGHTS = new Float32Array([
        1, 1, 1,
        1, 1, 1,
        1, 1, 1]);
    var TOTAL_WEIGHT = 0;
    for (var i = 0; i < CONVOLUTION_KERNEL_WEIGHTS.length; ++i) {
        TOTAL_WEIGHT += CONVOLUTION_KERNEL_WEIGHTS[i];
    }

    var fragmentShader = createShaderFromSource(
        glContext.FRAGMENT_SHADER,
        "\
        precision mediump float;\
        \
        varying highp vec2 vTextureCoord;\
        \
        uniform sampler2D uSampler;\
        uniform float uWeights[9];\
        uniform float uTotalWeight;\
        \
        /* Each sampled texture coordinate is 2 pixels apart rather than 1,\
        to make filter effects more noticeable. */ \
        const float xInc = 2.0/640.0;\
        const float yInc = 2.0/480.0;\
        const int numElements = 9;\
        const int numCols = 3;\
        \
        void main() {\
            vec4 centerColor = texture2D(uSampler, vTextureCoord);\
            vec4 totalColor = vec4(0,0,0,0);\
            \
            for (int i = 0; i < numElements; i++) {\
                int iRow = i / numCols;\
                int iCol = i - (numCols * iRow);\
                float xOff = float(iCol - 1) * xInc;\
                float yOff = float(iRow - 1) * yInc;\
                vec4 colorComponent = texture2D(\
                    uSampler,\
                    vec2(vTextureCoord.x+xOff, vTextureCoord.y+yOff));\
                totalColor += (uWeights[i] * colorComponent);\
            }\
            \
            float effectiveWeight = uTotalWeight;\
            if (uTotalWeight <= 0.0) {\
                effectiveWeight = 1.0;\
            }\
            /* Premultiply colors with alpha component for center pixel. */\
            gl_FragColor = vec4(\
                totalColor.rgb * centerColor.a / effectiveWeight,\
                centerColor.a);\
    }");

    Again, we specify the shader program as a literal string that gets compiled by the WebGL context and, again, we need to let the shader know how to find its input data:

    // Associate the uniform texture sampler with TEXTURE0 slot 
    var textureSamplerUniform = glContext.getUniformLocation(
        program,
        "uSampler");
    glContext.uniform1i(textureSamplerUniform, 0);

    // Since we're only using one single texture, we just make TEXTURE0 the active one
    // at all times
    glContext.activeTexture(glContext.TEXTURE0);
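
    The shader’s uWeights and uTotalWeight uniforms need to be populated as well. Here is a sketch of what that upload might look like (the uniform names match the fragment shader above):

    // Upload the convolution kernel weights and their sum to the fragment
    // shader; uniform1fv uploads the whole 9-element array at once.
    var weightsUniform = glContext.getUniformLocation(program, "uWeights[0]");
    glContext.uniform1fv(weightsUniform, CONVOLUTION_KERNEL_WEIGHTS);

    var totalWeightUniform = glContext.getUniformLocation(program, "uTotalWeight");
    glContext.uniform1f(totalWeightUniform, TOTAL_WEIGHT);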

    Drawing Kinect Image Data to Canvas

    After getting the WebGL context ready to receive data from Kinect, we still need to let WebGL know whenever we have a new image to be rendered. So, every time that the KinectWebGLHelper object receives a valid image frame from the KinectSensor object, it calls the ImageMetadata.processImageData function, which looks like this:

    this.processImageData = function(imageBuffer, width, height) { 
        if ((width != metadata.width) || (height != metadata.height)) {
            // Whenever the image width or height changes, update tracked metadata and canvas
            // viewport dimensions.
            this.width = width;
            this.height = height;
            this.canvas.width = width;
            this.canvas.height = height;
            glContext.viewport(0, 0, width, height);
        }

        glContext.bindTexture(
            glContext.TEXTURE_2D,
            texture);
        glContext.texImage2D(
            glContext.TEXTURE_2D,
            0,
            glContext.RGBA,
            width,
            height,
            0,
            glContext.RGBA,
            glContext.UNSIGNED_BYTE,
            new Uint8Array(imageBuffer));

        glContext.drawArrays(
            glContext.TRIANGLES,
            0,
            NUM_VIEWPORT_VERTICES);
        glContext.bindTexture(
            glContext.TEXTURE_2D,
            null);
    };

    Customizing the Processing Effect Applied to Kinect Image

    You might have noticed while reading this post that the default value for CONVOLUTION_KERNEL_WEIGHTS provided by this WebGL sample maps to the following 3x3 convolution kernel:

    1 1 1
    1 1 1
    1 1 1

    and this corresponds to a blurring effect. The following table shows additional examples of effects that can be achieved using 3x3 convolution kernels:

    Effect            Kernel Weights    Resulting Image

    Original          0 0 0             Original_sample
                      0 1 0
                      0 0 0

    Blurring          1 1 1             Blurred_sample
                      1 1 1
                      1 1 1

    Sharpening         0 -1  0          Sharpened_sample
                      -1  5 -1
                       0 -1  0

    Edge Detection    -1  0  1          EdgeDetected_sample
                      -2  0  2
                      -1  0  1

    It is very easy to experiment with different weights of a 3x3 kernel to apply these and other effects, by changing the CONVOLUTION_KERNEL_WEIGHTS coefficients and reloading the application in the browser. Other kernel sizes can also be supported by changing the fragment shader and its associated setup code.
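
    For instance, to switch the default effect from blurring to edge detection, you would only need to swap in the coefficients from the table above:

    // Edge detection kernel weights (from the table above). Note that these
    // weights sum to zero, which the fragment shader handles by falling back
    // to an effective weight of 1.0.
    var CONVOLUTION_KERNEL_WEIGHTS = new Float32Array([
        -1, 0, 1,
        -2, 0, 2,
        -1, 0, 1]);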

    Summary

    The new WebserverWebGL sample is very similar in user experience to WebserverBasics-WPF, but because it uses the WebGL API to leverage the power of your GPU, your web applications can perform powerful kinds of Kinect data processing without burdening the CPU or adding latency to the user experience. We didn't add WebGL functionality previously because WebGL only recently became supported in all major browsers. If you're not sure whether your clients will have a WebGL-compatible browser but still want to guarantee that they can display image-stream data, you should implement a hybrid approach that uses the “webgl” canvas context when available and falls back to the “2d” context otherwise.
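
    A sketch of that hybrid approach, reusing the uiAdapter and sensor objects that the sample already creates:

    // Hybrid binding: use KinectWebGLHelper when the browser can create a
    // WebGL context, and fall back to the 2d-based KinectUIAdapter otherwise.
    function bindStreamWithBestContext(streamName, canvasElement) {
        var glContext = canvasElement.getContext('webgl') ||
            canvasElement.getContext('experimental-webgl');

        if (glContext != null) {
            var glHelper = new KinectWebGLHelper(sensor);
            glHelper.bindStreamToCanvas(streamName, canvasElement);
        } else {
            uiAdapter.bindStreamToCanvas(streamName, canvasElement);
        }
    }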

    Happy coding!


    Eddy Escardo-Raffo
    Senior Software Development Engineer
    Kinect for Windows

  • Kinect for Windows Product Blog

    Thousands of developers are participating in Kinect for Windows v2 Developer Preview—starting today


    In addition to being a great day for Xbox One, today is also a great day for Kinect for Windows. We have started delivering Kinect for Windows v2 Developer Preview kits to program participants. The Developer Preview includes a pre-release Kinect for Windows v2 sensor, access to the new generation Kinect for Windows software development kit (SDK), as well as ongoing updates and access to private program forums. Participants will also receive a Kinect for Windows v2 sensor when they become available next summer (northern hemisphere).

    Microsoft is committed to making the Kinect for Windows sensor and SDK available early to qualifying developers and designers so they can prepare to have their new-generation applications ready in time for general availability next summer. We continue to see a groundswell for Kinect for Windows. We received thousands of applications for this program and selected participants based on the applicants’ expertise, passion, and the raw creativity of their ideas. We are impressed by the caliber of the applications we received and look forward to seeing the innovative NUI experiences our Developer Preview customers will create.

    The new Kinect for Windows v2 sensor will feature the core capabilities of the new Kinect for Xbox One sensor. With the first version of Kinect for Xbox 360, developers and businesses saw the potential to apply the technology beyond gaming—in many different computing environments. Microsoft believes that the opportunities for revolutionizing computing experiences will be even greater with this new sensor. The benefits will raise the bar and accelerate the development of NUI applications across multiple industries, from retail and manufacturing to healthcare, education, communications, and more:

    Real Vision
    Kinect Real Vision technology dramatically expands its field of view for greater line of sight. An all-new active IR camera enables it to see in the dark. And by using advanced three-dimensional geometry, it can even tell if you’re standing off balance.

    Real Motion
    Kinect Real Motion technology tracks even the slightest gestures. So a simple squeeze of your hand results in precise control over an application, whether you’re standing up or sitting down.

    Real Voice
    Kinect Real Voice technology focuses on the sounds that matter. Thanks to an all-new multi-microphone array, the advanced noise isolation capability lets the sensor know who to listen to, even in a crowded space.

    2014 will be exciting, to say the least. We will keep you updated as the Developer Preview program evolves and we get closer to the Kinect for Windows v2 worldwide launch next summer. Additionally, follow the progress of the early adopter community by keeping an eye on them (#k4wdev) and by following us (@kinectwindows).

    The Kinect for Windows Team

  • Kinect for Windows Product Blog

    Using Kinect Background Removal with Multiple Users


    Introduction: Background Removal in Kinect for Windows

    The 1.8 release of the Kinect for Windows Developer Toolkit includes a component for isolating a user from the background of the scene. The component is called the BackgroundRemovedColorStream. This capability has many possible uses, such as simulating chroma-key or “green-screen” replacement of the background – without needing to use an actual green screen; compositing a person’s image into a virtual environment; or simply blurring out the background, so that video conference participants can’t see how messy your office really is.

    BackgroundRemovalBasics

    To use this feature in an application, you create the BackgroundRemovedColorStream, and then feed it each incoming color, depth, and skeleton frame when they are delivered by your Kinect for Windows sensor. You also specify which user you want to isolate, using their skeleton tracking ID. The BackgroundRemovedColorStream produces a sequence of color frames, in BGRA (blue/green/red/alpha) format. These frames are identical in content to the original color frames from the sensor, except that the alpha channel is used to distinguish foreground pixels from background pixels. Pixels that the background removal algorithm considers part of the background will have an alpha value of 0 (fully transparent), while foreground pixels will have their alpha at 255 (fully opaque). The foreground region is given a smoother edge by using intermediate alpha values (between 0 and 255) for a “feathering” effect. This image format makes it easy to combine the background-removed frames with other images in your application.
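
    To make the alpha convention concrete, here is a small hypothetical helper (not part of the sample) that classifies a single pixel in one of these BGRA frames:

    // Hypothetical helper illustrating the alpha convention of the
    // background-removed BGRA frames.
    private static string ClassifyPixel(byte[] bgraPixels, int width, int x, int y)
    {
        // Each pixel occupies 4 bytes, in blue/green/red/alpha order.
        byte alpha = bgraPixels[(((y * width) + x) * 4) + 3];

        if (alpha == 0)
        {
            return "background (fully transparent)";
        }

        return alpha == 255
            ? "foreground (fully opaque)"
            : "feathered foreground edge";
    }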

    As a developer, you get the choice of which user you want in the foreground. The BackgroundRemovalBasics-WPF sample has some simple logic that selects the user nearest the sensor, and then continues to track the same user until they are no longer visible in the scene.

    private void ChooseSkeleton()
    {
        var isTrackedSkeletonVisible = false;
        var nearestDistance = float.MaxValue;
        var nearestSkeleton = 0;
     
        foreach (var skel in this.skeletons)
        {
            if (null == skel)
            {
                 continue;
            }
     
            if (skel.TrackingState != SkeletonTrackingState.Tracked)
            {
                continue;
            }
     
            if (skel.TrackingId == this.currentlyTrackedSkeletonId)
            {
                isTrackedSkeletonVisible = true;
                break;
            }
     
            if (skel.Position.Z < nearestDistance)
            {
                nearestDistance = skel.Position.Z;
                nearestSkeleton = skel.TrackingId;
            }
        }
     
        if (!isTrackedSkeletonVisible && nearestSkeleton != 0)
        {
            this.backgroundRemovedColorStream.SetTrackedPlayer(nearestSkeleton);
            this.currentlyTrackedSkeletonId = nearestSkeleton;
        }
    }

    Wait, only one person?

    If you wanted to select more than one person from the scene to appear in the foreground, it would seem that you’re out of luck, because the BackgroundRemovedColorStream’s SetTrackedPlayer method accepts only one tracking ID. But you can work around this limitation by running two separate instances of the stream, and sending each one a different tracking ID. Each of these streams will produce a separate color image, containing one of the users. These images can then be combined into a single image, or used separately, depending on your application’s needs.
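
    A sketch of that workaround, using the same APIs the sample uses elsewhere (the sensor and tracking ID variables are assumed to come from the surrounding application code):

    // Two independent background removal streams, each isolating a different
    // user from the same sensor data.
    var streamA = new BackgroundRemovedColorStream(sensor);
    var streamB = new BackgroundRemovedColorStream(sensor);

    streamA.Enable(sensor.ColorStream.Format, sensor.DepthStream.Format);
    streamB.Enable(sensor.ColorStream.Format, sensor.DepthStream.Format);

    streamA.SetTrackedPlayer(firstTrackingId);
    streamB.SetTrackedPlayer(secondTrackingId);

    // Feed every incoming depth, color, and skeleton frame to both streams;
    // each stream then produces its own BGRA frame containing only its user.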

    Wait, only two people?

    In the most straightforward implementation of the multiple stream approach, you’d be limited to tracking just two people, due to an inherent limitation in the skeleton tracking capability of Kinect for Windows. Only two skeletons at a time can be tracked with full joint-level fidelity. The joint positions are required by the background removal implementation in order to perform its job accurately.

    However, there is an additional trick we can apply, to escape the two-skeleton limit. This trick relies on an assumption that the people in the scene will not be moving at extremely high velocities (generally a safe bet). If a particular skeleton is not fully tracked for a frame or two, we can instead reuse the most recent frame in which that skeleton actually was fully tracked. Since the skeleton tracking API lets us choose which two skeletons to track at full fidelity, we can choose a different pair of skeletons each frame, cycling through up to six skeletons we wish to track, over three successive frames.
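
    Cycling through skeletons this way requires the application to take over skeleton selection from the runtime, which the skeleton stream API supports directly (the sample’s UpdateChosenSkeletons method, shown later, builds on this):

    // Tell the skeleton stream that the application, not the runtime, will
    // pick which two skeletons are tracked at full fidelity.
    sensor.SkeletonStream.AppChoosesSkeletons = true;

    // Each frame, pass in a (possibly different) pair of tracking IDs.
    sensor.SkeletonStream.ChooseSkeletons(firstTrackingId, secondTrackingId);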

    Each additional instance of BackgroundRemovedColorStream will place increased demands on CPU and memory. Depending on your own application’s needs and your hardware configuration, you may need to dial back the number of simultaneous users you process in this way.

    Wait, only six people?

    Demanding, aren’t we? Sorry, the Kinect for Windows skeleton stream can monitor at most six people simultaneously (two at full fidelity, and four at lower fidelity). This is a hard limit.

    Introducing a multi-user background removal sample

    We’ve created a new sample application, called BackgroundRemovalMultiUser-WPF, to demonstrate how to use the technique described above to perform background removal on up to six people. We started with the code from the BackgroundRemovalBasics-WPF sample, and changed it to support multiple streams, one per user. The output from each stream is then overlaid on the backdrop image.

    BackgroundRemovalMultiUser

    Factoring the code: TrackableUser

    The largest change to the original sample was refactoring the application code that interacts with the BackgroundRemovedColorStream, so that we can have multiple copies of it running simultaneously. This code, in the new sample, resides in a new class named TrackableUser. Let’s take a brief tour of the interesting parts of this class.

    The application can instruct TrackableUser to track a specific user by setting the TrackingId property appropriately.

    public int TrackingId
    {
        get
        {
            return this.trackingId;
        }
     
        set
        {
            if (value != this.trackingId)
            {
                if (null != this.backgroundRemovedColorStream)
                {
                    if (InvalidTrackingId != value)
                    {
                        this.backgroundRemovedColorStream.SetTrackedPlayer(value);
                        this.Timestamp = DateTime.UtcNow;
                    }      
                    else
                    {
                        // Hide the last frame that was received for this user.
                        this.imageControl.Visibility = Visibility.Hidden;      
                        this.Timestamp = DateTime.MinValue;
                    }      
                }
     
                this.trackingId = value;
            }
        }
    }

    The Timestamp property indicates when the TrackingId was most recently set to a valid value. We’ll see later how this property is used by the sample application’s user-selection logic.

    public DateTime Timestamp { get; private set; }

    Whenever the application is notified that the default Kinect sensor has changed (at startup time, or when the hardware is plugged in or unplugged), it passes this information along to each TrackableUser by calling OnKinectSensorChanged. The TrackableUser, in turn, sets up or tears down its BackgroundRemovedColorStream accordingly.

    public void OnKinectSensorChanged(KinectSensor oldSensor, KinectSensor newSensor)
    {
        if (null != oldSensor)
        {
            // Remove sensor frame event handler.
            oldSensor.AllFramesReady -= this.SensorAllFramesReady;
     
            // Tear down the BackgroundRemovedColorStream for this user.
            this.backgroundRemovedColorStream.BackgroundRemovedFrameReady -=
                 this.BackgroundRemovedFrameReadyHandler;
            this.backgroundRemovedColorStream.Dispose();
            this.backgroundRemovedColorStream = null;
            this.TrackingId = InvalidTrackingId;
        }
     
        this.sensor = newSensor;
     
        if (null != newSensor)
        {
            // Setup a new BackgroundRemovedColorStream for this user.
            this.backgroundRemovedColorStream = new BackgroundRemovedColorStream(newSensor);
            this.backgroundRemovedColorStream.BackgroundRemovedFrameReady +=
                this.BackgroundRemovedFrameReadyHandler;
            this.backgroundRemovedColorStream.Enable(
                newSensor.ColorStream.Format,
                newSensor.DepthStream.Format);
     
            // Add an event handler to be called when there is new frame data from the sensor.
            newSensor.AllFramesReady += this.SensorAllFramesReady;
        }
    }

    Each time the Kinect sensor produces a matched set of depth, color, and skeleton frames, we forward each frame’s data along to the BackgroundRemovedColorStream.

    private void SensorAllFramesReady(object sender, AllFramesReadyEventArgs e)
    {
        ...
            if (this.IsTracked)
            {
                using (var depthFrame = e.OpenDepthImageFrame())
                {      
                    if (null != depthFrame)
                    {
                         // Process depth data for background removal.
                         this.backgroundRemovedColorStream.ProcessDepth(
                             depthFrame.GetRawPixelData(),
                             depthFrame.Timestamp);
                    }
                }
     
                using (var colorFrame = e.OpenColorImageFrame())      
                {      
                    if (null != colorFrame)      
                    {
                        // Process color data for background removal.
                        this.backgroundRemovedColorStream.ProcessColor(
                            colorFrame.GetRawPixelData(),
                            colorFrame.Timestamp);      
                    }
                }
     
                using (var skeletonFrame = e.OpenSkeletonFrame())
                {
                    if (null != skeletonFrame)
                    {
                        // Save skeleton frame data for subsequent processing.
                        CopyDataFromSkeletonFrame(skeletonFrame);
     
                        // Locate the most recent data in which this user was fully tracked.
                        bool isUserPresent = UpdateTrackedSkeletonsArray();
     
                        // If we have an array in which this user is fully tracked,
                        // process the skeleton data for background removal.
                        if (isUserPresent && null != this.skeletonsTracked)
                        {
                            this.backgroundRemovedColorStream.ProcessSkeleton(
                                this.skeletonsTracked,
                                skeletonFrame.Timestamp);
                         }
                    }
                }
            }
        ...
    }

    The UpdateTrackedSkeletonsArray method implements the logic to reuse skeleton data from an older frame when the newest frame contains the user’s skeleton, but not in a fully-tracked state. It also informs the caller whether the user with the requested tracking ID is still present in the scene.

    private bool UpdateTrackedSkeletonsArray()
    {
        // Determine if this user is still present in the scene.
        bool isUserPresent = false;
        foreach (var skeleton in this.skeletonsNew)
        {
            if (skeleton.TrackingId == this.TrackingId)
             {
                isUserPresent = true;
                if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
                 {
                    // User is fully tracked: save the new array of skeletons,
                    // and recycle the old saved array for reuse next time.
                    var temp = this.skeletonsTracked;
                    this.skeletonsTracked = this.skeletonsNew;
                    this.skeletonsNew = temp;
                }
     
                break;
            }
        }
     
        if (!isUserPresent)
        {
            // User has disappeared; stop trying to track.
            this.TrackingId = TrackableUser.InvalidTrackingId;
        }
     
        return isUserPresent;
    }

    Whenever the BackgroundRemovedColorStream produces a frame, we copy its BGRA data to the bitmap that is the underlying Source for an Image element in the MainWindow. This causes the updated frame to appear within the application’s window, overlaid on the background image.

    private void BackgroundRemovedFrameReadyHandler(
        object sender,
        BackgroundRemovedColorFrameReadyEventArgs e)
    {
        using (var backgroundRemovedFrame = e.OpenBackgroundRemovedColorFrame())
        {
             if (null != backgroundRemovedFrame && this.IsTracked)
             {
                 int width = backgroundRemovedFrame.Width;
                 int height = backgroundRemovedFrame.Height;
     
                 WriteableBitmap foregroundBitmap =
                    this.imageControl.Source as WriteableBitmap;
     
                // If necessary, allocate new bitmap. Set it as the source of the Image
                // control.
                if (null == foregroundBitmap ||
                    foregroundBitmap.PixelWidth != width ||
                    foregroundBitmap.PixelHeight != height)
                {
                    foregroundBitmap = new WriteableBitmap(
                        width,      
                        height,
                        96.0,
                        96.0,      
                        PixelFormats.Bgra32,
                        null);
     
                    this.imageControl.Source = foregroundBitmap;
                }
     
                // Write the pixel data into our bitmap.
                foregroundBitmap.WritePixels(
                    new Int32Rect(0, 0, width, height),
                    backgroundRemovedFrame.GetRawPixelData(),
                    width * sizeof(uint),
                    0);
     
                // A frame has been delivered; ensure that it is visible.
                this.imageControl.Visibility = Visibility.Visible;
            }
        }
    }

    Limiting the number of users to track

    As mentioned earlier, the maximum number of trackable users may have a practical limit, depending on your hardware. To specify the limit, we define a constant in the MainWindow class:

    private const int MaxUsers = 6;

    You can modify this constant to have any value from 2 to 6. (Values larger than 6 are not useful, as Kinect for Windows does not track more than 6 users.)

    Selecting users to track: The User View

    We want to provide a convenient way to choose which users will be tracked for background removal. To do this, we present a view of the detected users in a small inset. By clicking on the users displayed in this inset, we can select which of those users are associated with our TrackableUser objects, causing them to be included in the foreground.

    UserView

    We update the user view each time a depth frame is received by the sample’s main window.

    private void UpdateUserView(DepthImageFrame depthFrame)
    {
        ...
        // Store the depth data.
        depthFrame.CopyDepthImagePixelDataTo(this.depthData);      
        ...
        // Write the per-user colors into the user view bitmap, one pixel at a time.
        this.userViewBitmap.Lock();
       
        unsafe
        {
            uint* userViewBits = (uint*)this.userViewBitmap.BackBuffer;
            fixed (uint* userColors = &this.userColors[0])
            {      
                // Walk through each pixel in the depth data.
                fixed (DepthImagePixel* depthData = &this.depthData[0])      
                {
                    DepthImagePixel* depthPixel = depthData;
                    DepthImagePixel* depthPixelEnd = depthPixel + this.depthData.Length;
                    while (depthPixel < depthPixelEnd)
                    {
                        // Lookup a pixel color based on the player index.
                        // Store the color in the user view bitmap's buffer.
                        *(userViewBits++) = *(userColors + (depthPixel++)->PlayerIndex);
                    }
                }
            }
        }
     
        this.userViewBitmap.AddDirtyRect(new Int32Rect(0, 0, width, height));
        this.userViewBitmap.Unlock();
    }

    This code fills the user view bitmap with solid-colored regions representing each of the detected users, as distinguished by the value of the PlayerIndex field at each pixel in the depth frame.

    The main window responds to a mouse click within the user view by locating the corresponding pixel in the most recent depth frame, and using its PlayerIndex to look up the user’s TrackingId in the most recent skeleton data. The TrackingId is passed along to the ToggleUserTracking method, which attempts to toggle that user between the tracked and untracked states.

    private void UserViewMouseLeftButtonDown(object sender, MouseButtonEventArgs e)
    {
        // Determine which pixel in the depth image was clicked.
        Point p = e.GetPosition(this.UserView);
        int depthX =
            (int)(p.X * this.userViewBitmap.PixelWidth / this.UserView.ActualWidth);
        int depthY =
            (int)(p.Y * this.userViewBitmap.PixelHeight / this.UserView.ActualHeight);
        int pixelIndex = (depthY * this.userViewBitmap.PixelWidth) + depthX;
        if (pixelIndex >= 0 && pixelIndex < this.depthData.Length)
        {
            // Find the player index in the depth image. If non-zero, toggle background
            // removal for the corresponding user.
            short playerIndex = this.depthData[pixelIndex].PlayerIndex;
            if (playerIndex > 0)
            {      
                // playerIndex is 1-based, skeletons array is 0-based, so subtract 1.
                this.ToggleUserTracking(this.skeletons[playerIndex - 1].TrackingId);
            }
        }
    }

    Picking which users will be tracked

    When MaxUsers is less than 6, we need some logic to handle a click on an untracked user when we are already tracking the maximum number of users. We choose to stop tracking the user who was tracked earliest (based on timestamp), and start tracking the newly chosen user immediately. This logic is implemented in ToggleUserTracking.

    private void ToggleUserTracking(int trackingId)
    {
        if (TrackableUser.InvalidTrackingId != trackingId)
        {
            DateTime minTimestamp = DateTime.MaxValue;
            TrackableUser trackedUser = null;
            TrackableUser staleUser = null;
     
            // Attempt to find a TrackableUser with a matching TrackingId.
            foreach (var user in this.trackableUsers)
            {
                if (user.TrackingId == trackingId)
                {
                    // Yes, this TrackableUser has a matching TrackingId.
                    trackedUser = user;
                }
     
                // Find the "stale" user (the trackable user with the earliest timestamp).
                if (user.Timestamp < minTimestamp)
                {      
                    staleUser = user;
                    minTimestamp = user.Timestamp;
                }
            }
     
            if (null != trackedUser)
            {
                // User is being tracked: toggle to not tracked.
                trackedUser.TrackingId = TrackableUser.InvalidTrackingId;
            }
            else
            {      
                // User is not currently being tracked: start tracking, by reusing
                // the "stale" trackable user.
                staleUser.TrackingId = trackingId;
            }
        }
    }

    Once we’ve determined which users will be tracked by the TrackableUser objects, we need to ensure that those users are being targeted for tracking by the skeleton stream on a regular basis (at least once every three frames). UpdateChosenSkeletons implements this using a round-robin scheme.

    private void UpdateChosenSkeletons()
    {
        KinectSensor sensor = this.sensorChooser.Kinect;
        if (null != sensor)
        {
            // Choose which of the users will be tracked in the next frame.
            int trackedUserCount = 0;
            for (int i = 0; i < MaxUsers && trackedUserCount < this.trackingIds.Length; ++i)
             {
                // Get the trackable user for consideration.
                var trackableUser = this.trackableUsers[this.nextUserIndex];
                if (trackableUser.IsTracked)
                {
                    // If this user is currently being tracked, copy its TrackingId to the
                    // array of chosen users.
                    this.trackingIds[trackedUserCount++] = trackableUser.TrackingId;
                }
     
                // Update the index for the next user to be considered.
                this.nextUserIndex = (this.nextUserIndex + 1) % MaxUsers;
            }      
     
            // Fill any unused slots with InvalidTrackingId.
            for (int i = trackedUserCount; i < this.trackingIds.Length; ++i)
            {
                this.trackingIds[i] = TrackableUser.InvalidTrackingId;
            }
     
            // Pass the chosen tracking IDs to the skeleton stream.
            sensor.SkeletonStream.ChooseSkeletons(this.trackingIds[0], this.trackingIds[1]);
        }
    }

    Combining multiple foreground images

    Now that we can have multiple instances of TrackableUser, each producing a background-removed image of a user, we need to combine those images on-screen. We do this by creating multiple overlapping Image elements (one per trackable user), each parented by the MaskedColorImages element, which itself is a sibling of the Backdrop element. Wherever the background has been removed from each image, the backdrop image will show through.

    As each image is created, we associate it with its own TrackableUser.

    public MainWindow()
    {
        ...
        // Create one Image control per trackable user.
        for (int i = 0; i < MaxUsers; ++i)
        {
            Image image = new Image();
            this.MaskedColorImages.Children.Add(image);
            this.trackableUsers[i] = new TrackableUser(image);
        }
    }

    To capture and save a snapshot of the current composited image, we create two VisualBrush objects, one for the Backdrop, and one for MaskedColorImages. We draw rectangles with each of these brushes, into a bitmap, and then write the bitmap to a file.

    private void ButtonScreenshotClick(object sender, RoutedEventArgs e)
    {
        ...
        var dv = new DrawingVisual();
        using (var dc = dv.RenderOpen())
        {
            // Render the backdrop.
            var backdropBrush = new VisualBrush(Backdrop);      
            dc.DrawRectangle(
                backdropBrush,      
                null,
                new Rect(new Point(), new Size(colorWidth, colorHeight)));
     
            // Render the foreground.
            var colorBrush = new VisualBrush(MaskedColorImages);      
            dc.DrawRectangle(
                colorBrush,
                null,
                new Rect(new Point(), new Size(colorWidth, colorHeight)));
        }
     
        renderBitmap.Render(dv);
        ...
    }

    Summary

    While the BackgroundRemovedColorStream is limited to tracking only one user at a time, the new BackgroundRemovalMultiUser-WPF sample demonstrates that you can run multiple stream instances to track up to six users simultaneously. When using this technique, you should consider – and measure – the increased resource demands (CPU and memory) that the additional background removal streams will have, and determine for yourself how many streams your configuration can handle.

    We hope that this sample opens up new possibilities for using background removal in your own applications.

    John Elsbree
    Principal Software Development Engineer
    Kinect for Windows

  • Kinect for Windows Product Blog

    Enabling retailers to drive business in new, innovative ways


    It is essential for retailers to find ways to attract and connect with customers—and to stand out from the competition. To help them do so, the industry is grappling with how to build interactive experiences at scale that engage customers and truly help them make satisfying purchasing decisions, while also using retail space strategically to provide the best possible experience.

    To get a deeper understanding of what this means, we did extensive first-hand research with dozens of retailers and big brands. We learned how retailers think about implementing natural user interface (NUI) technology and how they see these experiences helping propel their businesses forward.

    What we heard:

    • NUI offers one of the best ways to interact with large screens in stores.
    • Exploring virtual merchandise by gesturing naturally is easy, engaging, and fun for customers.
    • Immersive experiences can improve the purchase process and are an impactful way to market and sell to customers.

    We agree. And we believe it’s important for us to bring these findings back into Kinect for Windows by delivering features that facilitate the best retail innovations. To help support this, we recently released an update to our SDK (Kinect for Windows SDK 1.8) that includes new features specifically designed to enable the development of higher-quality digital signage applications. Key features include the ability to remove backgrounds, an adaptive UI sample, and an HTML interaction sample.

    To help illustrate what this all means, our team developed the following three videos. They show how Kinect for Windows experiences can help retailers attract new customers and engage customers in deeper ways. They offer examples of ways that digital signs powered by Kinect for Windows can draw customers into the business—making it possible for retailers to share offerings, cross-sell and upsell merchandise, bring the “endless aisle” concept to life, and, ultimately, inspire shoppers to purchase. And all of this is accomplished in a beautiful way that feels natural to the customer.


    This enjoyable and easy-to-use application engages new customers by helping them understand and experience the resort, while also providing them with an offer to receive a discount on future stays.



    This digital sign application is powered by Kinect for Windows and makes it easy for shoppers to engage with products, try them on, and purchase them. It also incorporates social media for additional marketing reach.



    This last video demonstrates the ability to welcome a person or people into an immersive real-time experience with the store’s merchandise. It demonstrates the Kinect Fusion scanning features that can be used as part of this and many other retail experiences.


    These videos highlight some of the core benefits retailers tell us Kinect for Windows offers them:

    • Capture a customer's attention
    • Educate customers about products
    • Move a customer through the decision-making cycle to close a sale

    Kinect for Windows does this by optimizing interactions with existing large screens and enhancing the overall retail space—using gesture and voice control, background removal, proximity-based interface, and more.

    So many companies have already created exciting retail experiences with Kinect for Windows: Bloomingdales, Build-a-Bear, Coca-Cola, Mattel, Nissan, Pepsi, and others. We are excited to see the new ways that Kinect for Windows is being applied in retail. The dramatic shifts in consumer shopping behaviors, preferences, and expectations in retail today are driving innovation to new levels. The possibilities are endless when we use the latest technology to put the customer at the heart of the business.

    Kinect for Windows Team


  • Kinect for Windows Product Blog

    Using Kinect Webserver to Expose Speech Events to Web Clients


    In our 1.8 release, we made it easy to create Kinect-enabled HTML5 web applications. This is possible because we added an extensible webserver for Kinect data along with a JavaScript API which gives developers some great functionality right out of the box:

    • Interactions: hand pointer movements, plus press and grip events, useful for controlling a cursor, buttons, and other UI
    • User Viewer: visual representation of the users currently visible to the Kinect sensor, using different colors to indicate different user states
    • Background Removal: “green screen” image stream for a single person at a time
    • Skeleton: standard skeleton data such as tracking state, joint positions, joint orientations, etc.
    • Sensor Status: events corresponding to sensor connection/disconnection


    This is enough functionality to write a compelling application, but it doesn’t represent the whole range of Kinect sensor capabilities. In this article I will show you, step by step, how to extend the WebserverBasics-WPF sample (see the C# code on CodePlex or the documentation on MSDN), available from the Kinect Toolkit Browser, to enable web applications to respond to speech commands, where the active speech grammar is configurable by the web client.

    A solution containing the full, final sample code is available on CodePlex. To compile this sample you will also need Microsoft.Samples.Kinect.Webserver (available via CodePlex and Toolkit Browser) and Microsoft.Kinect.Toolkit components (available via Toolkit Browser).

    Getting Started

    To follow along step-by-step:

    1. If you haven’t done so already, install the Kinect for Windows v1.8 SDK and Toolkit
    2. Launch the Kinect Toolkit Browser
    3. Install WebserverBasics-WPF sample in a local directory
    4. Open the WebserverBasics-WPF.sln solution in Visual Studio
    5. Go to line 136 in the MainWindow.xaml.cs file


    You should see the following TODO comment which describes exactly how we’re going to expose speech recognition functionality:

    //// TODO: Optionally add factories here for custom handlers:
    ////       this.webserver.SensorStreamHandlerFactories.Add(new MyCustomSensorStreamHandlerFactory());
    //// Your custom factory would implement ISensorStreamHandlerFactory, in which the
    //// CreateHandler method would return a class derived from SensorStreamHandlerBase
    //// which overrides one or more of its virtual methods.

    We will replace this comment with the functionality described below.

    So, What Functionality Are We Implementing?


    More specifically, on the server side we will:

    1. Create a speech recognition engine
    2. Bind the engine to a Kinect sensor’s audio stream whenever a sensor gets connected/disconnected
    3. Allow a web client to specify the speech grammar to be recognized
    4. Forward speech recognition events generated by the engine to the web client
    5. Register a factory for the speech stream handler with the Kinect webserver


    This will be accomplished by creating a class called SpeechStreamHandler, derived from Microsoft.Samples.Kinect.Webserver.Sensor.SensorStreamHandlerBase. SensorStreamHandlerBase is an implementation of ISensorStreamHandler that frees us from writing boilerplate code. ISensorStreamHandler is an abstraction that gets notified whenever a Kinect sensor gets connected or disconnected, when color, depth, and skeleton frames become available, and when web clients request to view or update configuration values. In response, our speech stream handler will send event messages to web clients.

    On the web client side we will:

    1. Configure speech recognition stream (enable and specify the speech grammar to be recognized)
    2. Modify the web UI in response to recognized speech events


    All new client-side code is in SamplePage.html.
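
    As a preview, the client-side configuration might look roughly like the following sketch, which assumes the sensor object plus the postConfig and addEventHandler functions exposed by the 1.8 JavaScript API, and assumes “speech” as the category name (see SamplePage.html for the actual code):

    // Enable the speech stream and supply the grammar XML to be recognized.
    var config = {};
    config["speech"] = {
        "enabled": true,
        "grammarXml": speechGrammarXml // SRGS XML string, described below
    };
    sensor.postConfig(config, function (statusText, errorData) {
        // Handle configuration errors here.
    });

    // React to forwarded speech events as they arrive from the server.
    sensor.addEventHandler(function (event) {
        // Inspect the event's category and payload here and update the web
        // UI accordingly (property names are defined by the server-side
        // speech stream handler).
    });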

    Creating a Speech Recognition Engine

    In the constructor for SpeechStreamHandler you’ll see the following code:

    RecognizerInfo ri = GetKinectRecognizer();
    if (ri != null)
    {
        this.speechEngine = new SpeechRecognitionEngine(ri.Id);

        if (this.speechEngine != null)
        {
            // disable speech engine adaptation feature
            this.speechEngine.UpdateRecognizerSetting("AdaptationOn", 0);
            this.speechEngine.UpdateRecognizerSetting("PersistedBackgroundAdaptation", 0);
            this.speechEngine.AudioStateChanged += this.AudioStateChanged;
            this.speechEngine.SpeechRecognitionRejected += this.SpeechRecognitionRejected;
            this.speechEngine.SpeechRecognized += this.SpeechRecognized;
        }
    }

    This code snippet will be familiar if you’ve looked at some of our other speech samples, such as SpeechBasics-WPF. Basically, we get the metadata corresponding to the Kinect acoustic model and use it to create a speech engine. (GetKinectRecognizer is hardcoded to use the English-language acoustic model in this sample, but this can be changed by installing additional language packs and modifying GetKinectRecognizer to look for the desired culture name.) We then turn off some settings related to the audio adaptation feature, which makes the speech engine better suited for long-running scenarios, and register to receive events when speech is recognized or rejected, or when the audio state changes (e.g., silence versus someone speaking).

    Binding the Speech Recognition Engine to a Kinect Sensor’s Audio Stream

    In order to do this, we override SensorStreamHandlerBase’s implementation of OnSensorChanged, so we can find out about sensors connecting and disconnecting.

    public override void OnSensorChanged(KinectSensor newSensor)
    {
        base.OnSensorChanged(newSensor);
        if (this.sensor != null)
        {
            if (this.speechEngine != null)
            {
                this.StopRecognition();
                this.speechEngine.SetInputToNull();
                this.sensor.AudioSource.Stop();
            }
        }

        this.sensor = newSensor;
        if (newSensor != null)
        {
            if (this.speechEngine != null)
            {
                this.speechEngine.SetInputToAudioStream(
                    newSensor.AudioSource.Start(), new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
                this.StartRecognition(this.grammar);
            }
        }
    }

    The main thing we need to do here is call Start on the AudioSource of the newly connected Kinect sensor in order to get an audio stream that we can hook up as the input to the speech engine. We also need to specify the format of the audio stream: a single-channel, 16-bit-per-sample, pulse code modulation (PCM) stream, sampled at 16 kHz.

    Allow Web Clients to Specify Speech Grammar

    We will let clients send us the whole speech grammar that they want recognized, as XML that conforms to the W3C Speech Recognition Grammar Specification format version 1.0. To do this, we will expose a configuration property called “grammarXml”.
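
    For example, a client could send a minimal grammar like the following (a hypothetical grammar that recognizes three color names; any SRGS 1.0 document with a root rule will work):

    <grammar version="1.0" xml:lang="en-US" root="colorRule"
             xmlns="http://www.w3.org/2001/06/grammar">
      <rule id="colorRule" scope="public">
        <one-of>
          <item>red</item>
          <item>green</item>
          <item>blue</item>
        </one-of>
      </rule>
    </grammar>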

    Let’s backtrack a little bit because earlier we glossed over the bit of code in the SpeechStreamHandler constructor where we register the handlers for getting and setting stream configuration properties:

    this.AddStreamConfiguration(SpeechEventCategory, new StreamConfiguration(this.GetProperties, this.SetProperty));

    Now, in the SetProperty method we call LoadGrammarXml method whenever a client sets the “grammarXml” property:

    case GrammarXmlPropertyName:
        this.LoadGrammarXml((string)propertyValue);
        break;
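    For context, that case statement lives inside the property-setter callback we registered above. Its exact shape isn’t shown in this post, but from the way the snippet uses the property name and value, the surrounding method is essentially a switch of the following form (a sketch, not a verbatim copy of the sample):

    private void SetProperty(string propertyName, object propertyValue)
    {
        switch (propertyName)
        {
            case GrammarXmlPropertyName:
                this.LoadGrammarXml((string)propertyValue);
                break;

            default:
                // One reasonable policy: reject property names we don't know
                // about so that client-side typos fail loudly
                throw new InvalidOperationException("Unrecognized property name");
        }
    }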

    And in the LoadGrammarXml method we do the real work of updating the speech grammar:

    private void LoadGrammarXml(string grammarXml)
    {
        this.StopRecognition();

        if (!string.IsNullOrEmpty(grammarXml))
        {
            using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(grammarXml)))
            {
                Grammar newGrammar;
                try
                {
                    newGrammar = new Grammar(memoryStream);
                }
                catch (ArgumentException e)
                {
                    throw new InvalidOperationException("Requested grammar might not contain a root rule", e);
                }
                catch (FormatException e)
                {
                    throw new InvalidOperationException("Requested grammar was specified with an invalid format", e);
                }

                this.StartRecognition(newGrammar);
            }
        }
    }

    We first stop speech recognition because we don’t yet know whether the specified grammar is valid, then we try to create a new Microsoft.Speech.Recognition.Grammar object from the specified property value. If the property value does not represent a valid grammar, the Grammar constructor throws, and we surface the problem as an InvalidOperationException, leaving recognition stopped. Otherwise, we call the StartRecognition method, which loads the new grammar into the speech engine and tells the engine to start recognizing, and to keep recognizing speech phrases until we explicitly tell it to stop.

    private void StartRecognition(Grammar g)
    {
        if ((this.sensor != null) && (g != null))
        {
            this.speechEngine.LoadGrammar(g);
            this.speechEngine.RecognizeAsync(RecognizeMode.Multiple);
        }

        this.grammar = g;
    }
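    StopRecognition, which LoadGrammarXml and OnSensorChanged both call, isn’t shown in this post. A minimal implementation consistent with StartRecognition above might look like this (again, a sketch rather than the sample’s exact code):

    private void StopRecognition()
    {
        if ((this.sensor != null) && (this.grammar != null))
        {
            // Cancel any in-flight recognition and drop the loaded grammar
            // so a different one can be loaded later
            this.speechEngine.RecognizeAsyncCancel();
            this.speechEngine.UnloadAllGrammars();
        }

        this.grammar = null;
    }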

    Send Speech Recognition Events to Web Client

    When we created the speech recognition engine, we registered for three events: AudioStateChanged, SpeechRecognized, and SpeechRecognitionRejected. Whenever any of these events happens, we simply forward it to the web client. Since the code for all three handlers ends up being very similar, we will focus on the SpeechRecognized event handler:

    private async void SpeechRecognized(object sender, SpeechRecognizedEventArgs args)
    {
        var message = new RecognizedSpeechMessage(args);
        await this.ownerContext.SendEventMessageAsync(message);
    }

    To send messages to web clients we use functionality exposed by ownerContext, an instance of the SensorStreamHandlerContext class that was passed to us in the constructor. Messages are sent to clients over a web socket channel and fall into one of two categories:

    • Stream messages: Messages that are generated continuously, at a predictable rate (e.g.: 30 skeleton stream frames are generated every second), where the data in each message replaces the data from the previous message. Dropping one of these messages every so often has no major consequence, because another will arrive shortly thereafter with more up-to-date data, so the framework may drop one if it detects a bottleneck in the web socket channel.
    • Event messages: Messages that are generated sporadically, at an unpredictable rate, where each event represents an isolated incident. As such, it is not desirable to drop any of these messages.


    Given the sporadic nature of speech recognition, we chose to communicate with clients using event messages. Specifically, we created the RecognizedSpeechMessage class, a subclass of EventMessage that serves as a representation of SpeechRecognizedEventArgs, one that can be easily serialized as JSON and that follows JavaScript naming conventions.
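    To give a feel for the shape of that message, here is an illustrative sketch (not the sample’s exact class; it assumes EventMessage exposes category and eventType members, and the lowercase property names are deliberately chosen to match what the client-side code later reads from event.recognized):

    public class RecognizedSpeechMessage : EventMessage
    {
        public RecognizedSpeechMessage(SpeechRecognizedEventArgs args)
        {
            this.category = "speech";
            this.eventType = "recognized";

            // Copy the interesting parts of the recognition result into a
            // flat, camelCased object that serializes cleanly to JSON
            this.recognized = new PhraseData
            {
                text = args.Result.Text,
                confidence = args.Result.Confidence,
                semantics = new SemanticsData { value = (string)args.Result.Semantics.Value }
            };
        }

        public PhraseData recognized { get; set; }

        public class PhraseData
        {
            public string text { get; set; }
            public double confidence { get; set; }
            public SemanticsData semantics { get; set; }
        }

        public class SemanticsData
        {
            public string value { get; set; }
        }
    }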

    You might have noticed the use of the “async” and “await” keywords in the SpeechRecognized handler above. They are described in much more detail in MSDN but, in summary, they enable an asynchronous programming model in which long-running operations don’t block thread execution, without necessarily using more than one thread. The Kinect webserver uses a single thread to schedule tasks, so ISensorStreamHandler implementations don’t need to be thread-safe, but they should be aware of potential re-entrancy due to this asynchronous behavior.

    Registering a Speech Stream Handler Factory with the Kinect Webserver

    The Kinect webserver can be started, stopped, and restarted. Each time it starts, it creates ISensorStreamHandler instances on a thread dedicated to Kinect data handling, and that is the only thread that ever calls these objects. To facilitate this behavior, the server doesn’t allow direct registration of ISensorStreamHandler instances; instead, it expects ISensorStreamHandlerFactory instances to be registered in the KinectWebserver.SensorStreamHandlerFactories property.

    For the purposes of this sample, we declared a private factory class that is exposed as a static singleton instance directly from the SpeechStreamHandler class:

    public class SpeechStreamHandler : SensorStreamHandlerBase, IDisposable
    {
        ...

        static SpeechStreamHandler()
        {
            Factory = new SpeechStreamHandlerFactory();
        }
        ...

        public static ISensorStreamHandlerFactory Factory { get; private set; }
        ...

        private class SpeechStreamHandlerFactory : ISensorStreamHandlerFactory
        {
            public ISensorStreamHandler CreateHandler(SensorStreamHandlerContext context)
            {
                return new SpeechStreamHandler(context);
            }
        }
    }

    Finally, back in line 136 of MainWindow.xaml.cs, we replace the TODO comment mentioned above with the following:

    // Add speech stream handler to the list of available handlers, so web client
    // can configure speech grammar and receive speech events
    this.webserver.SensorStreamHandlerFactories.Add(SpeechStreamHandler.Factory);

    Configure Speech Recognition Stream in Web Client

    The sample web client distributed with WebserverBasics-WPF already configures a couple of other streams in the function called updateUserState in SamplePage.html, so we will add the following code to that function:

    var speechGrammar = '\
    <grammar version="1.0" xml:lang="en-US" tag-format="semantics/1.0-literals" root="DefaultRule" xmlns="http://www.w3.org/2001/06/grammar">\
        <rule id="DefaultRule" scope="public">\
            <one-of>\
                <item>\
                    <tag>SHOW</tag>\
                    <one-of><item>Show Panel</item><item>Show</item></one-of>\
                </item>\
                <item>\
                    <tag>HIDE</tag>\
                    <one-of><item>Hide Panel</item><item>Hide</item></one-of>\
                </item>\
            </one-of>\
        </rule>\
    </grammar>';

    immediateConfig["speech"] = { "enabled": true, "grammarXml": speechGrammar };

    This code enables the speech stream and specifies a grammar that

    • triggers a recognition event with “SHOW” as its semantic value whenever a user utters the phrase “Show” or “Show Panel”
    • triggers a recognition event with “HIDE” as its semantic value whenever a user utters the phrase “Hide” or “Hide Panel”

    Modify the Web UI in Response to Recognized Speech Events

    The sample web client already registers an event handler function, so we just need to update it to respond to speech events in addition to user state events:

    function onSpeechRecognized(recognizedArgs) {
        if (recognizedArgs.confidence > 0.7) {
            switch (recognizedArgs.semantics.value) {
                case "HIDE":
                    setChoosePanelVisibility(false);
                    break;
                case "SHOW":
                    setChoosePanelVisibility(true);
                    break;
            }
        }
    }
    ...

    sensor.addEventHandler(function (event) {
        switch (event.category) {
            ...            
            case "speech":
                switch (event.eventType) {
                    case "recognized":
                        onSpeechRecognized(event.recognized);
                        break;
                }
                break;
        }
    });

    Party Time!

    At this point you can rebuild the updated solution and run it to see the server UI. From this UI you can click the link that reads “Open sample page in default browser” and play with the sample UI. It will look the same as before the code changes, but it now responds to the speech phrases “Show”, “Show Panel”, “Hide”, and “Hide Panel”. Now try changing the grammar to include more phrases, and update the UI in different ways in response to speech events.

    Happy coding!

    Additional Resources

  • Kinect for Windows Product Blog

    Updated SDK, with HTML5, Kinect Fusion improvements, and more

    • 9 Comments

    I am pleased to announce that we released the Kinect for Windows software development kit (SDK) 1.8 today. This is the fourth update to the SDK since we first released it commercially one and a half years ago. Since then, we’ve seen numerous companies using Kinect for Windows worldwide, and more than 700,000 downloads of our SDK.

    We build each version of the SDK with our customers in mind—listening to what the developer community and business leaders tell us they want and traveling around the globe to see what these dedicated teams do, how they do it, and what they most need out of our software development kit.

    The new background removal API is useful for advertising, augmented reality gaming, training and simulation, and more.

    Kinect for Windows SDK 1.8 includes some key features and samples that the community has been asking for, including:

    • New background removal. An API removes the background behind the active user so that it can be replaced with an artificial background. This green-screening effect was one of the top requests we’ve heard in recent months. It is especially useful for advertising, augmented reality gaming, training and simulation, and other immersive experiences that place the user in a different virtual environment. (A brief wiring sketch follows this list.)
    • Realistic color capture with Kinect Fusion. A new Kinect Fusion API scans the color of the scene along with the depth information so that it can capture the color of the object along with its three-dimensional (3D) model. The API also produces a texture map for the mesh created from the scan. This feature provides a full fidelity 3D model of a scan, including color, which can be used for full color 3D printing or to create accurate 3D assets for games, CAD, and other applications.
    • Improved tracking robustness with Kinect Fusion. This algorithm makes it easier to scan a scene. With this update, Kinect Fusion is better able to maintain its lock on the scene as the camera position moves, yielding more reliable and consistent scanning.
    • HTML interaction sample. This sample demonstrates implementing Kinect-enabled buttons, simple user engagement, and the use of a background removal stream in HTML5. It allows developers to use HTML5 and JavaScript to implement Kinect-enabled user interfaces, which was not possible previously—making it easier for developers to work in whatever programming languages they prefer and integrate Kinect for Windows into their existing solutions.
    • Multiple-sensor Kinect Fusion sample. This sample shows developers how to use two sensors simultaneously to scan a person or object from both sides—making it possible to construct a 3D model without having to move the sensor or the object! It demonstrates the calibration between two Kinect for Windows sensors, and how to use Kinect Fusion APIs with multiple depth snapshots. It is ideal for retail experiences and other public kiosks that do not have an attendant available to scan by hand.
    • Adaptive UI sample. This sample demonstrates how to build an application that adapts itself depending on the distance between the user and the screen—from gesturing at a distance to touching a touchscreen. The algorithm in this sample uses the physical dimensions and positions of the screen and sensor to determine the best ergonomic position on the screen for touch controls as well as ways the UI can adapt as the user approaches the screen or moves further away from it. As a result, the touch interface and visual display adapt to the user’s position and height, which enables users to interact with large touch screen displays comfortably. The display can also be adapted for more than one user.
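    For developers eager to try background removal, here is a rough sketch of how the new BackgroundRemovedColorStream (in the toolkit’s Microsoft.Kinect.Toolkit.BackgroundRemoval namespace) gets wired up; the names follow my reading of the 1.8 toolkit, and the BackgroundRemovalBasics-WPF sample shipped with it is the authoritative reference:

    // Sketch: enable background removal for one tracked user. Assumes
    // this.sensor is an initialized KinectSensor with its color, depth,
    // and skeleton streams enabled.
    this.backgroundRemovedColorStream = new BackgroundRemovedColorStream(this.sensor);
    this.backgroundRemovedColorStream.Enable(
        ColorImageFormat.RgbResolution640x480Fps30,
        DepthImageFormat.Resolution640x480Fps30);
    this.backgroundRemovedColorStream.BackgroundRemovedFrameReady += this.OnBackgroundRemovedFrameReady;

    // As frames arrive (e.g., in an AllFramesReady handler), feed the stream
    // color, depth, and skeleton data, and pick the player to keep in the
    // foreground; colorPixels, depthPixels, skeletons, the timestamps, and
    // trackingId are placeholders for data pulled from the current frames
    this.backgroundRemovedColorStream.ProcessColor(colorPixels, colorTimestamp);
    this.backgroundRemovedColorStream.ProcessDepth(depthPixels, depthTimestamp);
    this.backgroundRemovedColorStream.ProcessSkeleton(skeletons, skeletonTimestamp);
    this.backgroundRemovedColorStream.SetTrackedPlayer(trackingId);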

    We also have updated our Human Interface Guidelines (HIG) with guidance to complement the new Adaptive UI sample, including the following:

    • Design a transition that reveals or hides additional information without obscuring the anchor points in the overall UI.
    • Design UI where users can accomplish all tasks for each goal within a single range.

    My team and I believe that communicating naturally with computers means being able to gesture and speak, just like you do when communicating with people. We believe this is important to the evolution of computing, and are committed to helping this future come faster by giving our customers the tools they need to build truly innovative solutions. There are many exciting applications being created with Kinect for Windows, and we hope these new features will make those applications better and easier to build. Keep up the great work, and keep us posted!

    Bob Heddle, Director
    Kinect for Windows

    Key links

  • Kinect for Windows Product Blog

    Announcing the 1.8 SDK

    • 0 Comments

    Today we are happy to announce the release of our 1.8 SDK!  Web developers can now bring the interactivity of Kinect for Windows to their world by using our new HTML5/JavaScript app model.  We've also added color to Kinect Fusion (which is mind-blowing to see), created a new API for background removal, and added some other improvements and new samples too.  Our product blog has all the deets or, if you are impatient, you can just download the new bits.

    Ben

    @benlower | kinectninja@microsoft.com | mobile: +1 (206) 659-NINJA (6465)

  • Kinect for Windows Product Blog

    Dev Kit Program Update

    • 3 Comments

    Back in June at the Build Conference we announced and started taking applications for our upcoming developer kit program.

    The response and interest we’ve seen have been tremendous: thousands of developers from 74 different countries applied.

    Mea Culpa

    When we announced the program we said we’d start notifying successful applicants in August. Many people interpreted that to mean that we’d be done with notifications in August. I apologize for not being clearer on this. We never intended to have all the notifications done in August. While we did start in August and have notified many developers of their acceptance to the program, there are still many more applicants to be notified.

    Over the coming weeks we will continue to let applicants know if they are admitted to the program, denied admission, or waitlisted (just like college :-)). Every applicant will hear something from us by the end of September. Anyone who is waitlisted will have a final decision by the end of the year.

    We are still planning to start sending out the developer kits in late November to all program participants.

    Again, apologies for any confusion. Please stay tuned…you will hear something from us soon!

    Ben

    @benlower | kinectninja@microsoft.com | mobile: +1 (206) 659-NINJA (6465)

  • Kinect for Windows Product Blog

    Joshua Blake on Kinect for Windows and the Natural User Interface Revolution (Part 3)

    • 0 Comments

     

    The following blog post was guest authored by Kinect for Windows (K4W) MVP, Joshua Blake. Josh is the Technical Director of the InfoStrat Advanced Technology Group in Washington, D.C. where he and his team work on cutting-edge Kinect and NUI projects for their clients. You can find him on twitter @joshblake or at his blog, http://nui.joshland.org.

    Josh recently recorded several videos for our Kinect for Windows Developer Center. This is the third of three posts he will be contributing this month to the blog.


     

    In part 1, I shared videos covering the core natural user interface concepts and a sample application that I use to control presentations called Kinect PowerPoint Control. In part 2, I shared two more advanced sample applications: Kinect Weather Map and Face Fusion. In this post, I’m going to share videos that show some of the real-life applications that my team and I created for one of our clients. I’ll also provide some additional detail about how and why we created a custom object tracking interaction. These applications put my NUI concepts into action and show what is possible with Kinect for Windows.

     

    Making it fun to learn

    Our client, Kaplan Early Learning Company, sells teaching resources focused on early childhood education. Kaplan approached us with an interest in creating a series of educational applications for preschool and kindergarten-aged children designed to teach one of several core skills such as basic patterns, spelling simple words, shapes, and spatial relationships. While talking to Kaplan, we learned they had a goal of improving student engagement and excitement while making core skills fun to learn.

    We suggested using Kinect for Windows because it would allow the students to not just interact with the activity but also be immersed in virtual worlds and use their bodies and physical objects for interacting. Kaplan loved the idea and we began creating the applications. After a few iterations of design and development, testing with real students, and feedback, we shipped the final builds of four applications to Kaplan earlier this summer. Kaplan is now selling these applications bundled with a Kinect for Windows sensor in their catalog as Kaplan Move-NG.

    The Kinect for Windows team and I created the videos embedded below to discuss our approach to addressing challenges involved in designing these applications and to demonstrate the core parts of three of the Move-NG applications.

     

    Designing early childhood education apps for Kaplan

    In the video below, I discuss InfoStrat’s guiding principles for creating great applications for Kinect, as well as some of the specific challenges we faced in creating applications that are fun and exciting for young children while also being educational and fitting into a classroom environment. After the video, read on for additional discussion and three more videos showing the actual applications.

     

    Real-world K4W apps: Designing early childhood education apps for Kaplan (7:32)

    One of the key points covered in this video is that when designing a NUI application, we have to consider the context in which the application will be used. In the education space, especially in early childhood education, this context often includes both teachers and students, so we have to design the applications with both types of users in mind. Here are a few of the questions we thought about while designing these apps for Kaplan:

    • When will the teacher use the app and when will the students use the app?
    • Will the teacher be more comfortable using the mouse or the Kinect for specific tasks? Which input device is most appropriate for each task?
    • Will non-technical teachers understand how to set up the space and use the application? Does there need to be a special setup screen to help the teacher configure the classroom space?
    • How will the teachers and students interact while the application is running?
    • How long would it take to give every student a turn in a typical size classroom?
    • What is the social context in the classroom, and what unwritten social behavior rules can we take into account to simplify the application design?
    • Will the user interaction work with both adults and the youngest children?
    • Will the user interaction work across the various ways children respond to visual cues and voice prompts?
    • Is the application fun?
    • Do students across the entire target age group understand what to do with minimal or no additional prompts from the teacher?

     

    And most importantly:

    • Does the design satisfy the educational goals set for the application?

     

    As you can imagine, finding a solution to all of these questions was quite a challenge. We took an iterative approach and tested with real children in the target age range as often as possible. Fortunately, my three daughters are in the target age range so I could do quick tests at home almost daily and get feedback. We also sent early builds to Kaplan to get a broader range of feedback from their educators and additional children.

    In several cases, we created a prototype of a design or interaction that worked well for us as adults, but failed completely when tested with children. Sometimes the problem was that the data from the children’s smaller bodies had more noise. Other times the problem was that the children just didn’t understand what they were supposed to do, even with prompting, guidance, or demonstration. It was particularly challenging when a concept worked with older kindergarten kids but was too complex for the youngest of the preschooler age range. In those cases there was a cognitive development milestone in the age range that the design relied upon, and we simply had to find another solution. I will share an example of this near the end of this post.

     

    Kaplan Move-NG applications and behind-the-scenes videos

    The next three videos each cover one of the Kaplan Move-NG applications. The videos introduce the educational goal of the app and show a demonstration of the core interaction. In addition, I discuss the design challenges mentioned above, as well as implementation details such as which parts of the Kinect for Windows SDK we used, how we created a particular interaction, or how feedback from student testing affected the application design. These videos should give you a quick overview of the apps, as well as a behind-the-scenes view of what went into the designs. I hope sharing our experience will help you create better applications that incorporate the interactivity and fun of Kinect.

     

    Real-world K4W apps: Kaplan Move-NG Patterns (6:28)

    Real-world K4W apps: Kaplan Move-NG Where Am I (5:57)

    Real-world K4W apps: Kaplan Move-NG Word Pop (7:41)

    Object tracking as a natural interaction

    The last video above showed Word Pop, which has the unique feature of letting the user spell words by catching letters with a physical basket (or box). In the video, I showed how we created a custom basket tracker by transforming the Kinect depth data. (My technique was inspired by Kyle McDonald’s work at the Art && Code 2011 conference, as shown at 1:43 in his festival demonstration.) Figure 1 shows the basket tracker developer UI as shown in the Word Pop video. In this section, I’m going to give a little more detail on how this basket tracker works and what led to this design.

    Figure 1: The basket tracker developer UI used internally during development of Word Pop. The left image in the interface shows the background removed user and basket, with a rectangle drawn around the basket. The right image shows a visualization of how the application is transforming the depth data.

    To find the basket, we excluded the background and user’s torso from the depth image and then applied the Sobel operator. This produces a gradient value representing the curvature at each point. We mark pixels with low curvature as flat pixels, shown in white in figure 1. The curvature threshold value for determining flat pixels was found empirically.

    The outline of the basket is determined by using histograms of flat pixels across the horizontal and vertical dimensions, shown along the top and left edges of the right image in figure 1. The largest continuous area of flat pixels in each dimension is assumed to be the basket. The basket area is expanded slightly, smoothed across frames, and then the application hit tests this area against the letters falling from the sky to determine when the student has caught a letter.
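    To make the transformation concrete, here is an illustrative C# sketch of the flat-pixel pass described above; the names and the threshold handling are mine for illustration, not InfoStrat’s shipped code:

    // depth: per-pixel depth in millimeters, with the background and the
    // user's torso already excluded (those pixels zeroed out)
    static void FindFlatPixels(short[] depth, int width, int height, double threshold)
    {
        bool[] isFlat = new bool[width * height];  // pixels drawn white in the developer UI
        int[] columnCounts = new int[width];       // histogram of flat pixels per column
        int[] rowCounts = new int[height];         // histogram of flat pixels per row

        for (int y = 1; y < height - 1; y++)
        {
            for (int x = 1; x < width - 1; x++)
            {
                int i = (y * width) + x;

                // 3x3 Sobel kernels approximate the depth gradient at (x, y)
                int gx = -depth[i - width - 1] + depth[i - width + 1]
                         - (2 * depth[i - 1]) + (2 * depth[i + 1])
                         - depth[i + width - 1] + depth[i + width + 1];
                int gy = -depth[i - width - 1] - (2 * depth[i - width]) - depth[i - width + 1]
                         + depth[i + width - 1] + (2 * depth[i + width]) + depth[i + width + 1];

                // Low gradient magnitude means low curvature: a "flat" pixel
                if (Math.Sqrt(((double)gx * gx) + ((double)gy * gy)) < threshold)
                {
                    isFlat[i] = true;
                    columnCounts[x]++;
                    rowCounts[y]++;
                }
            }
        }

        // The largest continuous well-populated run in columnCounts and in
        // rowCounts approximates the basket's bounding rectangle
        // (expansion and cross-frame smoothing omitted)
    }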

    In testing, we found this implementation to be robust even when the user moves the basket around quickly or holds it out at the end of one arm. In particular, we did not need to depend upon skeleton tracking, which was often interrupted by the basket itself.

    One of our early Word Pop prototypes used hand-based interaction with skeleton tracking, but this was challenging for the youngest children in the target age range to use or understand. For example, given a prompt of “touch the letter M”, my three-year-old would always run to the computer screen to touch the “M” physically rather than moving her mirror image avatar to touch it. On the other hand, my seven-year-old used the avatar without a problem, illustrating the cognitive development milestone challenge I mentioned earlier. When we added the basket, skeleton tracking data became worse, but we could easily track the interactions of even the youngest children. Since “catching” with the basket has only one physical interpretation – using the avatar image – the younger kids started interacting without trouble.

    The basket in Word Pop was a very simple and natural interaction that the children immediately understood. This may seem like a basic point, but it is a perfect example of what makes Kinect unique and important: Kinect lets the computer see and understand our real world, instead of us having to learn and understand the computer. In this case, the Kinect let the children reuse a skill they already had – catching things in baskets – and focus on the fun and educational aspects of the application, rather than being distracted by learning a complex interface.

    I hope you enjoyed this behind-the-scenes look at our design process and seeing how we approached the challenge of designing fun and educational Kinect applications for young children. Thanks to Ben Lower for giving me the opportunity to record the videos in this post and the previous installments. Please feel free to comment or contact me if you have any questions or feedback on anything in this series. (Don’t forget to check out part 1 and part 2 if you haven’t seen those posts and videos already.)

    Thanks for reading (and watching)!

    -Josh

    @joshblake | joshb@infostrat.com | mobile +1 (703) 946-7176 | http://nui.joshland.org
