• Kinect for Windows Product Blog

    Kinect for Windows Academic Pricing Now Available in the US

    • 5 Comments

    Students, teachers, researchers, and other educators have been quick to embrace Kinect’s natural user interface (NUI), which makes it possible to interact with computers using movement, speech, and gestures. In fact, some of the earliest Kinect for Windows applications to emerge were projects done by students, including several at last year’s Imagine Cup.

    One project, from an Imagine Cup team in Italy, created an application for people with severe disabilities that enables them to communicate, learn, and play games on computers using a Kinect sensor instead of a traditional mouse or keyboard. Another innovative Imagine Cup project, done by university students in Russia, used the Kinect natural user interface to fold, rotate, and examine online origami models.

    To encourage students, educators, and academic researchers to continue innovating with Kinect for Windows, special academic pricing on Kinect for Windows sensors is now available in the United States. The academic price is $149.99 through Microsoft Stores.

    If you are an educator or faculty member at an accredited school, such as a university, community college, vocational school, or K-12 institution, you can purchase a Kinect for Windows sensor at this price.

    Find out if you qualify, and then purchase online or visit a Microsoft store in your area.

    Kinect for Windows team

  • Kinect for Windows Product Blog

    Using Kinect Background Removal with Multiple Users

    • 5 Comments

    Introduction: Background Removal in Kinect for Windows

    The 1.8 release of the Kinect for Windows Developer Toolkit includes a component for isolating a user from the background of the scene. The component is called the BackgroundRemovedColorStream. This capability has many possible uses, such as simulating chroma-key or “green-screen” replacement of the background – without needing to use an actual green screen; compositing a person’s image into a virtual environment; or simply blurring out the background, so that video conference participants can’t see how messy your office really is.

    [Image: BackgroundRemovalBasics sample]

    To use this feature in an application, you create the BackgroundRemovedColorStream, and then feed it each incoming color, depth, and skeleton frame when they are delivered by your Kinect for Windows sensor. You also specify which user you want to isolate, using their skeleton tracking ID. The BackgroundRemovedColorStream produces a sequence of color frames, in BGRA (blue/green/red/alpha) format. These frames are identical in content to the original color frames from the sensor, except that the alpha channel is used to distinguish foreground pixels from background pixels. Pixels that the background removal algorithm considers part of the background will have an alpha value of 0 (fully transparent), while foreground pixels will have their alpha at 255 (fully opaque). The foreground region is given a smoother edge by using intermediate alpha values (between 0 and 255) for a “feathering” effect. This image format makes it easy to combine the background-removed frames with other images in your application.
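    For illustration, here is one way you could composite such a frame over a backdrop yourself, blending each pixel according to its alpha value. This is just a sketch (the samples discussed below let WPF do the compositing for you); the method name and buffer layout are our own assumptions, not part of the SDK.

    private static void CompositeOverBackdrop(byte[] foregroundBgra, byte[] backdropBgra, byte[] outputBgra)
    {
        // Assumes all three buffers describe the same image size, 4 bytes per pixel (B, G, R, A).
        for (int i = 0; i < foregroundBgra.Length; i += 4)
        {
            // Alpha is 0 for background pixels, 255 for foreground pixels,
            // and somewhere in between along the feathered edge.
            int alpha = foregroundBgra[i + 3];
            int inverseAlpha = 255 - alpha;

            for (int channel = 0; channel < 3; ++channel)
            {
                // Blend the B, G, and R channels.
                outputBgra[i + channel] = (byte)(((foregroundBgra[i + channel] * alpha) +
                                                  (backdropBgra[i + channel] * inverseAlpha)) / 255);
            }

            // The composited result is fully opaque.
            outputBgra[i + 3] = 255;
        }
    }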

    As a developer, you get the choice of which user you want in the foreground. The BackgroundRemovalBasics-WPF sample has some simple logic that selects the user nearest the sensor, and then continues to track the same user until they are no longer visible in the scene.

    private void ChooseSkeleton()
    {
        var isTrackedSkeletonVisible = false;
        var nearestDistance = float.MaxValue;
        var nearestSkeleton = 0;
     
        foreach (var skel in this.skeletons)
        {
            if (null == skel)
            {
                 continue;
            }
     
            if (skel.TrackingState != SkeletonTrackingState.Tracked)
            {
                continue;
            }
     
            if (skel.TrackingId == this.currentlyTrackedSkeletonId)
            {
                isTrackedSkeletonVisible = true;
                break;
            }
     
            if (skel.Position.Z < nearestDistance)
            {
                nearestDistance = skel.Position.Z;
                nearestSkeleton = skel.TrackingId;
            }
        }
     
        if (!isTrackedSkeletonVisible && nearestSkeleton != 0)
        {
            this.backgroundRemovedColorStream.SetTrackedPlayer(nearestSkeleton);
            this.currentlyTrackedSkeletonId = nearestSkeleton;
        }
    }

    Wait, only one person?

    If you wanted to select more than one person from the scene to appear in the foreground, it would seem that you’re out of luck, because the BackgroundRemovedColorStream’s SetTrackedPlayer method accepts only one tracking ID. But you can work around this limitation by running two separate instances of the stream, and sending each one a different tracking ID. Each of these streams will produce a separate color image, containing one of the users. These images can then be combined into a single image, or used separately, depending on your application’s needs.
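    A minimal sketch of that workaround follows. The field and variable names here are our own, and the frame-forwarding calls mirror the ones shown later in this post; in a real application you would make these calls from your AllFramesReady handler.

    // Create one BackgroundRemovedColorStream per user you want to isolate.
    this.backgroundRemovedStreamA = new BackgroundRemovedColorStream(sensor);
    this.backgroundRemovedStreamB = new BackgroundRemovedColorStream(sensor);
    this.backgroundRemovedStreamA.Enable(sensor.ColorStream.Format, sensor.DepthStream.Format);
    this.backgroundRemovedStreamB.Enable(sensor.ColorStream.Format, sensor.DepthStream.Format);

    // Tell each stream to isolate a different user.
    this.backgroundRemovedStreamA.SetTrackedPlayer(firstTrackingId);
    this.backgroundRemovedStreamB.SetTrackedPlayer(secondTrackingId);

    // Feed the same color (and, likewise, depth and skeleton) data to both streams; each raises
    // its own BackgroundRemovedFrameReady event containing only its chosen user.
    this.backgroundRemovedStreamA.ProcessColor(colorPixels, colorTimestamp);
    this.backgroundRemovedStreamB.ProcessColor(colorPixels, colorTimestamp);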

    Wait, only two people?

    In the most straightforward implementation of the multiple stream approach, you’d be limited to tracking just two people, due to an inherent limitation in the skeleton tracking capability of Kinect for Windows. Only two skeletons at a time can be tracked with full joint-level fidelity. The joint positions are required by the background removal implementation in order to perform its job accurately.

    However, there is an additional trick we can apply, to escape the two-skeleton limit. This trick relies on an assumption that the people in the scene will not be moving at extremely high velocities (generally a safe bet). If a particular skeleton is not fully tracked for a frame or two, we can instead reuse the most recent frame in which that skeleton actually was fully tracked. Since the skeleton tracking API lets us choose which two skeletons to track at full fidelity, we can choose a different pair of skeletons each frame, cycling through up to six skeletons we wish to track, over three successive frames.

    Each additional instance of BackgroundRemovedColorStream will place increased demands on CPU and memory. Depending on your application’s needs and your hardware configuration, you may need to dial back the number of simultaneous users you process in this way.

    Wait, only six people?

    Demanding, aren’t we? Sorry, the Kinect for Windows skeleton stream can monitor at most six people simultaneously (two at full fidelity, and four at lower fidelity). This is a hard limit.

    Introducing a multi-user background removal sample

    We’ve created a new sample application, called BackgroundRemovalMultiUser-WPF, to demonstrate how to use the technique described above to perform background removal on up to six people. We started with the code from the BackgroundRemovalBasics-WPF sample, and changed it to support multiple streams, one per user. The output from each stream is then overlaid on the backdrop image.

    [Image: BackgroundRemovalMultiUser sample]

    Factoring the code: TrackableUser

    The largest change to the original sample was refactoring the application code that interacts with the BackgroundRemovedColorStream, so that we can have multiple copies of it running simultaneously. This code, in the new sample, resides in a new class named TrackableUser. Let’s take a brief tour of the interesting parts of this class.

    The application can instruct TrackableUser to track a specific user by setting the TrackingId property appropriately.

    public int TrackingId
    {
        get
        {
            return this.trackingId;
        }
     
        set
        {
            if (value != this.trackingId)
            {
                if (null != this.backgroundRemovedColorStream)
                {
                    if (InvalidTrackingId != value)
                    {
                        this.backgroundRemovedColorStream.SetTrackedPlayer(value);
                        this.Timestamp = DateTime.UtcNow;
                    }      
                    else
                    {
                        // Hide the last frame that was received for this user.
                        this.imageControl.Visibility = Visibility.Hidden;      
                        this.Timestamp = DateTime.MinValue;
                    }      
                }
     
                this.trackingId = value;
            }
        }
    }

    The Timestamp property indicates when the TrackingId was most recently set to a valid value. We’ll see later how this property is used by the sample application’s user-selection logic.

    public DateTime Timestamp { get; private set; }

    Whenever the application is notified that the default Kinect sensor has changed (at startup time, or when the hardware is plugged in or unplugged), it passes this information along to each TrackableUser by calling OnKinectSensorChanged. The TrackableUser, in turn, sets up or tears down its BackgroundRemovedColorStream accordingly.

    public void OnKinectSensorChanged(KinectSensor oldSensor, KinectSensor newSensor)
    {
        if (null != oldSensor)
        {
            // Remove sensor frame event handler.
            oldSensor.AllFramesReady -= this.SensorAllFramesReady;
     
            // Tear down the BackgroundRemovedColorStream for this user.
            this.backgroundRemovedColorStream.BackgroundRemovedFrameReady -=
                 this.BackgroundRemovedFrameReadyHandler;
            this.backgroundRemovedColorStream.Dispose();
            this.backgroundRemovedColorStream = null;
            this.TrackingId = InvalidTrackingId;
        }
     
        this.sensor = newSensor;
     
        if (null != newSensor)
        {
            // Setup a new BackgroundRemovedColorStream for this user.
            this.backgroundRemovedColorStream = new BackgroundRemovedColorStream(newSensor);
            this.backgroundRemovedColorStream.BackgroundRemovedFrameReady +=
                this.BackgroundRemovedFrameReadyHandler;
            this.backgroundRemovedColorStream.Enable(
                newSensor.ColorStream.Format,
                newSensor.DepthStream.Format);
     
            // Add an event handler to be called when there is new frame data from the sensor.
            newSensor.AllFramesReady += this.SensorAllFramesReady;
        }
    }

    Each time the Kinect sensor produces a matched set of depth, color, and skeleton frames, we forward each frame’s data along to the BackgroundRemovedColorStream.

    private void SensorAllFramesReady(object sender, AllFramesReadyEventArgs e)
    {
        ...
            if (this.IsTracked)
            {
                using (var depthFrame = e.OpenDepthImageFrame())
                {      
                    if (null != depthFrame)
                    {
                         // Process depth data for background removal.
                         this.backgroundRemovedColorStream.ProcessDepth(
                             depthFrame.GetRawPixelData(),
                             depthFrame.Timestamp);
                    }
                }
     
                using (var colorFrame = e.OpenColorImageFrame())      
                {      
                    if (null != colorFrame)      
                    {
                        // Process color data for background removal.
                        this.backgroundRemovedColorStream.ProcessColor(
                            colorFrame.GetRawPixelData(),
                            colorFrame.Timestamp);      
                    }
                }
     
                using (var skeletonFrame = e.OpenSkeletonFrame())
                {
                    if (null != skeletonFrame)
                    {
                        // Save skeleton frame data for subsequent processing.
                        CopyDataFromSkeletonFrame(skeletonFrame);
     
                        // Locate the most recent data in which this user was fully tracked.
                        bool isUserPresent = UpdateTrackedSkeletonsArray();
     
                        // If we have an array in which this user is fully tracked,
                        // process the skeleton data for background removal.
                        if (isUserPresent && null != this.skeletonsTracked)
                        {
                            this.backgroundRemovedColorStream.ProcessSkeleton(
                                this.skeletonsTracked,
                                skeletonFrame.Timestamp);
                         }
                    }
                }
            }
        ...
    }

    The UpdateTrackedSkeletonsArray method implements the logic to reuse skeleton data from an older frame when the newest frame contains the user’s skeleton, but not in a fully-tracked state. It also informs the caller whether the user with the requested tracking ID is still present in the scene.

    private bool UpdateTrackedSkeletonsArray()
    {
        // Determine if this user is still present in the scene.
        bool isUserPresent = false;
        foreach (var skeleton in this.skeletonsNew)
        {
            if (skeleton.TrackingId == this.TrackingId)
             {
                isUserPresent = true;
                if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
                 {
                    // User is fully tracked: save the new array of skeletons,
                    // and recycle the old saved array for reuse next time.
                    var temp = this.skeletonsTracked;
                    this.skeletonsTracked = this.skeletonsNew;
                    this.skeletonsNew = temp;
                }
     
                break;
            }
        }
     
        if (!isUserPresent)
        {
            // User has disappeared; stop trying to track.
            this.TrackingId = TrackableUser.InvalidTrackingId;
        }
     
        return isUserPresent;
    }

    Whenever the BackgroundRemovedColorStream produces a frame, we copy its BGRA data to the bitmap that is the underlying Source for an Image element in the MainWindow. This causes the updated frame to appear within the application’s window, overlaid on the background image.

    private void BackgroundRemovedFrameReadyHandler(
        object sender,
        BackgroundRemovedColorFrameReadyEventArgs e)
    {
        using (var backgroundRemovedFrame = e.OpenBackgroundRemovedColorFrame())
        {
             if (null != backgroundRemovedFrame && this.IsTracked)
             {
                 int width = backgroundRemovedFrame.Width;
                 int height = backgroundRemovedFrame.Height;
     
                 WriteableBitmap foregroundBitmap =
                    this.imageControl.Source as WriteableBitmap;
     
                // If necessary, allocate new bitmap. Set it as the source of the Image
                // control.
                if (null == foregroundBitmap ||
                    foregroundBitmap.PixelWidth != width ||
                    foregroundBitmap.PixelHeight != height)
                {
                    foregroundBitmap = new WriteableBitmap(
                        width,      
                        height,
                        96.0,
                        96.0,      
                        PixelFormats.Bgra32,
                        null);
     
                    this.imageControl.Source = foregroundBitmap;
                }
     
                // Write the pixel data into our bitmap.
                foregroundBitmap.WritePixels(
                    new Int32Rect(0, 0, width, height),
                    backgroundRemovedFrame.GetRawPixelData(),
                    width * sizeof(uint),
                    0);
     
                // A frame has been delivered; ensure that it is visible.
                this.imageControl.Visibility = Visibility.Visible;
            }
        }
    }

    Limiting the number of users to track

    As mentioned earlier, the maximum number of trackable users may have a practical limit, depending on your hardware. To specify the limit, we define a constant in the MainWindow class:

    private const int MaxUsers = 6;

    You can modify this constant to have any value from 2 to 6. (Values larger than 6 are not useful, as Kinect for Windows does not track more than 6 users.)

    Selecting users to track: The User View

    We want to provide a convenient way to choose which users will be tracked for background removal. To do this, we present a view of the detected users in a small inset. By clicking on the users displayed in this inset, we can select which of those users are associated with our TrackableUser objects, causing them to be included in the foreground.

    [Image: the User View inset]

    We update the user view each time a depth frame is received by the sample’s main window.

    private void UpdateUserView(DepthImageFrame depthFrame)
    {
        ...
        // Store the depth data.
        depthFrame.CopyDepthImagePixelDataTo(this.depthData);      
        ...
        // Write the per-user colors into the user view bitmap, one pixel at a time.
        this.userViewBitmap.Lock();
       
        unsafe
        {
            uint* userViewBits = (uint*)this.userViewBitmap.BackBuffer;
            fixed (uint* userColors = &this.userColors[0])
            {      
                // Walk through each pixel in the depth data.
                fixed (DepthImagePixel* depthData = &this.depthData[0])      
                {
                    DepthImagePixel* depthPixel = depthData;
                    DepthImagePixel* depthPixelEnd = depthPixel + this.depthData.Length;
                    while (depthPixel < depthPixelEnd)
                    {
                        // Lookup a pixel color based on the player index.
                        // Store the color in the user view bitmap's buffer.
                        *(userViewBits++) = *(userColors + (depthPixel++)->PlayerIndex);
                    }
                }
            }
        }
     
        this.userViewBitmap.AddDirtyRect(new Int32Rect(0, 0, width, height));
        this.userViewBitmap.Unlock();
    }

    This code fills the user view bitmap with solid-colored regions representing each of the detected users, as distinguished by the value of the PlayerIndex field at each pixel in the depth frame.
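    The per-user colors come from a lookup table (this.userColors) with one entry per possible PlayerIndex value, where index 0 means “no player.” The sample defines its own palette; the values below are simply an illustrative way to initialize such a table, assuming a 32-bit bitmap whose uint pixels are laid out as 0xAARRGGBB.

    // Illustrative palette: one color per PlayerIndex (0 = no player, 1 through 6 = detected users).
    this.userColors = new uint[]
    {
        0xFF000000, // index 0: no player (black)
        0xFFFF0000, // player 1: red
        0xFF00FF00, // player 2: green
        0xFF0000FF, // player 3: blue
        0xFFFFFF00, // player 4: yellow
        0xFFFF00FF, // player 5: magenta
        0xFF00FFFF  // player 6: cyan
    };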

    The main window responds to a mouse click within the user view by locating the corresponding pixel in the most recent depth frame, and using its PlayerIndex to look up the user’s TrackingId in the most recent skeleton data. The TrackingId is passed along to the ToggleUserTracking method, which attempts to toggle the tracking of that user between the tracked and untracked states.

    private void UserViewMouseLeftButtonDown(object sender, MouseButtonEventArgs e)
    {
        // Determine which pixel in the depth image was clicked.
        Point p = e.GetPosition(this.UserView);
        int depthX =
            (int)(p.X * this.userViewBitmap.PixelWidth / this.UserView.ActualWidth);
        int depthY =
            (int)(p.Y * this.userViewBitmap.PixelHeight / this.UserView.ActualHeight);
        int pixelIndex = (depthY * this.userViewBitmap.PixelWidth) + depthX;
        if (pixelIndex >= 0 && pixelIndex < this.depthData.Length)
        {
            // Find the player index in the depth image. If non-zero, toggle background
            // removal for the corresponding user.
            short playerIndex = this.depthData[pixelIndex].PlayerIndex;
            if (playerIndex > 0)
            {      
                // playerIndex is 1-based, skeletons array is 0-based, so subtract 1.
                this.ToggleUserTracking(this.skeletons[playerIndex - 1].TrackingId);
            }
        }
    }

    Picking which users will be tracked

    When MaxUsers is less than 6, we need some logic to handle a click on an untracked user when we are already tracking the maximum number of users. We choose to stop tracking the user who was tracked earliest (based on timestamp) and start tracking the newly chosen user immediately. This logic is implemented in ToggleUserTracking.

    private void ToggleUserTracking(int trackingId)
    {
        if (TrackableUser.InvalidTrackingId != trackingId)
        {
            DateTime minTimestamp = DateTime.MaxValue;
            TrackableUser trackedUser = null;
            TrackableUser staleUser = null;
     
            // Attempt to find a TrackableUser with a matching TrackingId.
            foreach (var user in this.trackableUsers)
            {
                if (user.TrackingId == trackingId)
                {
                    // Yes, this TrackableUser has a matching TrackingId.
                    trackedUser = user;
                }
     
                // Find the "stale" user (the trackable user with the earliest timestamp).
                if (user.Timestamp < minTimestamp)
                {      
                    staleUser = user;
                    minTimestamp = user.Timestamp;
                }
            }
     
            if (null != trackedUser)
            {
                // User is being tracked: toggle to not tracked.
                trackedUser.TrackingId = TrackableUser.InvalidTrackingId;
            }
            else
            {      
                // User is not currently being tracked: start tracking, by reusing
                // the "stale" trackable user.
                staleUser.TrackingId = trackingId;
            }
        }
    }

    Once we’ve determined which users will be tracked by the TrackableUser objects, we need to ensure that those users are being targeted for tracking by the skeleton stream on a regular basis (at least once every three frames). UpdateChosenSkeletons implements this using a round-robin scheme.

    private void UpdateChosenSkeletons()
    {
        KinectSensor sensor = this.sensorChooser.Kinect;
        if (null != sensor)
        {
            // Choose which of the users will be tracked in the next frame.
            int trackedUserCount = 0;
            for (int i = 0; i < MaxUsers && trackedUserCount < this.trackingIds.Length; ++i)
             {
                // Get the trackable user for consideration.
                var trackableUser = this.trackableUsers[this.nextUserIndex];
                if (trackableUser.IsTracked)
                {
                    // If this user is currently being tracked, copy its TrackingId to the
                    // array of chosen users.
                    this.trackingIds[trackedUserCount++] = trackableUser.TrackingId;
                }
     
                // Update the index for the next user to be considered.
                this.nextUserIndex = (this.nextUserIndex + 1) % MaxUsers;
            }      
     
            // Fill any unused slots with InvalidTrackingId.
            for (int i = trackedUserCount; i < this.trackingIds.Length; ++i)
            {
                this.trackingIds[i] = TrackableUser.InvalidTrackingId;
            }
     
            // Pass the chosen tracking IDs to the skeleton stream.
            sensor.SkeletonStream.ChooseSkeletons(this.trackingIds[0], this.trackingIds[1]);
        }
    }

    Combining multiple foreground images

    Now that we can have multiple instances of TrackableUser, each producing a background-removed image of a user, we need to combine those images on-screen. We do this by creating multiple overlapping Image elements (one per trackable user), each parented by the MaskedColorImages element, which itself is a sibling of the Backdrop element. Wherever the background has been removed from each image, the backdrop image will show through.

    As each image is created, we associate it with its own TrackableUser.

    public MainWindow()
    {
        ...
        // Create one Image control per trackable user.
        for (int i = 0; i < MaxUsers; ++i)
        {
            Image image = new Image();
            this.MaskedColorImages.Children.Add(image);
            this.trackableUsers[i] = new TrackableUser(image);
        }
    }

    To capture and save a snapshot of the current composited image, we create two VisualBrush objects, one for the Backdrop, and one for MaskedColorImages. We draw rectangles with each of these brushes, into a bitmap, and then write the bitmap to a file.

    private void ButtonScreenshotClick(object sender, RoutedEventArgs e)
    {
        ...
        var dv = new DrawingVisual();
        using (var dc = dv.RenderOpen())
        {
            // Render the backdrop.
            var backdropBrush = new VisualBrush(Backdrop);      
            dc.DrawRectangle(
                backdropBrush,      
                null,
                new Rect(new Point(), new Size(colorWidth, colorHeight)));
     
            // Render the foreground.
            var colorBrush = new VisualBrush(MaskedColorImages);      
            dc.DrawRectangle(
                colorBrush,
                null,
                new Rect(new Point(), new Size(colorWidth, colorHeight)));
        }
     
        renderBitmap.Render(dv);
        ...
    }

    Summary

    While the BackgroundRemovedColorStream is limited to tracking only one user at a time, the new BackgroundRemovalMultiUser-WPF sample demonstrates that you can run multiple stream instances to track up to six users simultaneously. When using this technique, you should consider – and measure – the increased resource demands (CPU and memory) that the additional background removal streams will have, and determine for yourself how many streams your configuration can handle.

    We hope that this sample opens up new possibilities for using background removal in your own applications.

    John Elsbree
    Principal Software Development Engineer
    Kinect for Windows

  • Kinect for Windows Product Blog

    The Power of Enthusiasm

    • 4 Comments

    OpenKinect founder Josh Blake at Microsoft’s Kinect for Windows Code Camp

    When we launched Kinect for Xbox 360 on November 4th, 2010, something amazing happened: talented Open Source hackers and enthusiasts around the world took the Kinect and let their imaginations run wild.  We didn’t know what we didn’t know about Kinect on Windows when we shipped Kinect for Xbox 360, and these early visionaries showed the world what was possible.  What we saw was so compelling that we created the Kinect for Windows commercial program.

    Our commercial program is designed to allow our partners— companies like Toyota, Mattel, American Express, Telefonica, and United Health Group—to deploy solutions to their customers and employees.  It is also designed to allow early adopters and newcomers alike to take their ideas and release them to the world on Windows, with hardware that’s supported by Microsoft.   At the same time, we wanted to let our early adopters keep working on the hardware they’d previously purchased. That is why our SDK continues to support the Kinect for Xbox 360 as a development device.

    Kinect developer Halimat Alabi at Microsoft’s 24-hour coding marathon, June 2011

    As I reflect back on the past eleven months since Microsoft announced we were bringing Kinect to Windows, one thing is clear: The efforts of these talented Open Source hackers and enthusiasts helped inspire us to develop Kinect for Windows faster.  And their continued ambition and drive will help the world realize the benefits of Kinect for Windows even faster still.  From all of us on the Kinect for Windows team:  thank you.

     Craig Eisler
    General Manager, Kinect for Windows

  • Kinect for Windows Product Blog

    Enabling retailers to drive business in new, innovative ways

    • 3 Comments

    It is essential for retailers to find ways to attract and connect with customers—and to stand out from the competition. To help them do so, the industry is grappling with how to build interactive experiences at scale that engage and truly help customers make satisfying purchasing decisions while also using retail space strategically to provide the best possible experience.

    To get a deeper understanding of what this means, we did extensive first-hand research with dozens of retailers and big brands. We learned how retailers think about implementing natural user interface (NUI) technology and how they see these experiences helping propel their businesses forward.

    What we heard:

    • NUI offers one of the best ways to interact with large screens in stores.
    • Exploring virtual merchandise by gesturing naturally is easy, engaging, and fun for customers.
    • Immersive experiences can improve the purchase process and are an impactful way to market and sell to customers.

    We agree. And we believe it’s important for us to bring these findings back into Kinect for Windows by delivering features that facilitate the best retail innovations. To help support this, we recently released an update to our SDK (Kinect for Windows SDK 1.8) that includes new features specifically designed to enable the development of higher-quality digital signage applications. Key features include the ability to remove backgrounds, an adaptive UI sample, and an HTML interaction sample.

    To help illustrate what this all means, our team developed the following three videos. They show how Kinect for Windows experiences can help retailers attract new customers and engage customers in deeper ways. They offer examples of ways that digital signs powered by Kinect for Windows can draw customers into the business—making it possible for retailers to share offerings, cross-sell and upsell merchandise, bring the “endless aisle” concept to life, and, ultimately, inspire shoppers to purchase. And all of this is accomplished in a beautiful way that feels natural to the customer.


    This enjoyable and easy-to-use application engages new customers by helping them understand and experience the resort, while also providing them with an offer to receive a discount on future stays.



    This digital sign application is powered by Kinect for Windows and makes it easy for shoppers to engage with products, try them on, and purchase them. It also incorporates social media for additional marketing reach.



    This last video demonstrates the ability to welcome a person or people into an immersive real-time experience with the store’s merchandise. It demonstrates the Kinect Fusion scanning features that can be used as part of this and many other retail experiences.


    These videos highlight some of the core benefits retailers tell us Kinect for Windows offers them:

    • Capture a customer's attention
    • Educate customers about products
    • Move a customer through the decision-making cycle to close a sale

    Kinect for Windows does this by optimizing interactions with existing large screens and enhancing the overall retail space—using gesture and voice control, background removal, proximity-based interface, and more.

    So many companies have already created exciting retail experiences with Kinect for Windows: Bloomingdales, Build-a-Bear, Coca-Cola, Mattel, Nissan, Pepsi, and others. We are excited to see the new ways that Kinect for Windows is being applied in retail. The dramatic shifts in consumer shopping behaviors, preferences, and expectations in retail today are driving innovation to new levels. The possibilities are endless when we use the latest technology to put the customer at the heart of the business.

    Kinect for Windows Team

  • Kinect for Windows Product Blog

    New Director, Same Direction: the Momentum Continues with Kinect for Windows

    • 3 Comments

    Almost two years ago, Microsoft announced its intent to take Kinect beyond gaming and make it possible for developers and businesses to innovate with Kinect on computers. The Kinect for Windows team was born.

    Shortly after that, I joined the team to oversee Program Management, and over the past year, we’ve shipped the Kinect for Windows sensor as well as multiple updates to the Kinect for Windows software development kit (SDK). Throughout it all, Craig Eisler has been leading our business.

    This month, Craig is moving on to do other important work at Microsoft, and I am stepping in to lead the Kinect for Windows team. I am excited to maintain the amazing momentum we’ve seen in industries like healthcare, retail, education, and automotive. There have been more than 500,000 downloads of our free SDK, and the Kinect for Windows sensor can be purchased in 39 regions today.

    Such rapid growth would not have been possible without the community embracing the technology. Thanks to all of you—business leaders, technical leaders, creative visionaries, and developers—Kinect for Windows has been deployed across the globe. The community is developing new ways for consumers to shop for clothing and accessories, interesting digital signage that delights and inspires customers, remote monitoring tools that make physical therapy easier, more immersive training and simulation applications across multiple industries, and touch-free computing tools that enable surgeons to view patient information without having to leave the operating room. The list goes on and on…and the list is growing every day.

    We launched Kinect for Windows nearly one year ago—pioneering a commercial technology category that didn’t previously exist. I look forward to continuing to be at the forefront of touch-free computing and helping our partners develop innovative solutions that take the natural user interface vision even further. We’ve said it before and I’ll say it again: this is just the beginning. I’m thrilled to continue the great foundational work we did in 2012 and look forward to a very productive 2013.

    Bob Heddle
    Director, Kinect for Windows

  • Kinect for Windows Product Blog

    Using Kinect InteractionStream Outside of WPF

    • 3 Comments

    Last month, with the release of version 1.7 of our SDK and toolkit, we introduced something called the InteractionStream.  Included in this release were two new samples, Controls Basics and Interaction Gallery, which, among other things, show how to use the new InteractionStream along with new interactions like Press and Grip.  Both of these new samples are written using managed code (C#) and WPF.

    One question I’ve been hearing from developers is, “I don’t want to use WPF but I still want to use InteractionStream with managed code.  How do I do this?”  In this post I’m going to show how to do exactly that.  I’m going to take it to the extreme by removing the UI layer completely: we’ll use a C# console app.

    The way our application will work is summarized in the diagram below:

    [Diagram: program flow — the sensor’s FrameReady events feed depth and skeleton data into the InteractionStream, which raises InteractionFrameReady events handled by the program]

     

    There are a few things to note here:

    1. Upon starting the program, we initialize our sensor and interactions, and create FrameReady event handlers.
    2. Our sensor is generating data for every frame.  We use our FrameReady event handlers to respond and handle depth, skeleton, and interaction frames.
    3. The program implements the IInteractionClient interface, which requires us to implement a method called GetInteractionInfoAtLocation that gives us back information about interactions happening with a particular user at a specified location:
      public InteractionInfo GetInteractionInfoAtLocation(int skeletonTrackingId, InteractionHandType handType, double x, double y)
      {
          var interactionInfo = new InteractionInfo
          {
              IsPressTarget = false,
              IsGripTarget = false
          };

          // Map coordinates from [0.0,1.0] coordinates to UI-relative coordinates
          double xUI = x * InteractionRegionWidth;
          double yUI = y * InteractionRegionHeight;

          var uiElement = this.PerformHitTest(xUI, yUI);

          if (uiElement != null)
          {
              interactionInfo.IsPressTarget = true;

              // If UI framework uses strings as button IDs, use string hash code as ID
              interactionInfo.PressTargetControlId = uiElement.Id.GetHashCode();

              // Designate center of button to be the press attraction point
              //// TODO: Create your own logic to assign press attraction points if center
              //// TODO: is not always the desired attraction point.
              interactionInfo.PressAttractionPointX = ((uiElement.Left + uiElement.Right) / 2.0) / InteractionRegionWidth;
              interactionInfo.PressAttractionPointY = ((uiElement.Top + uiElement.Bottom) / 2.0) / InteractionRegionHeight;
          }

          return interactionInfo;
      }
    4. The other noteworthy part of our program is in the InteractionFrameReady method.  This is where we process information about our users, route our UI events, handle things like Grip and GripRelease, etc. A minimal end-to-end sketch of the whole console program follows this list.
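    Putting those pieces together, here is a minimal sketch of what such a console program could look like. The class and handler names are ours, not the downloadable sample’s, and error handling (for example, no sensor attached) is omitted for brevity; it assumes references to Microsoft.Kinect and Microsoft.Kinect.Toolkit.Interaction.

    using System;
    using Microsoft.Kinect;
    using Microsoft.Kinect.Toolkit.Interaction;

    public class Program : IInteractionClient
    {
        private KinectSensor sensor;
        private InteractionStream interactionStream;
        private Skeleton[] skeletons;
        private readonly UserInfo[] userInfos = new UserInfo[6]; // Kinect tracks at most six users

        public static void Main()
        {
            var program = new Program();
            program.InitializeSensor();
            Console.ReadLine(); // keep the console app alive while frame events arrive
        }

        private void InitializeSensor()
        {
            this.sensor = KinectSensor.KinectSensors[0];
            this.sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
            this.sensor.SkeletonStream.Enable();

            // The InteractionStream calls back into our IInteractionClient implementation.
            this.interactionStream = new InteractionStream(this.sensor, this);
            this.interactionStream.InteractionFrameReady += this.InteractionFrameReady;

            this.sensor.DepthFrameReady += this.DepthFrameReady;
            this.sensor.SkeletonFrameReady += this.SkeletonFrameReady;
            this.sensor.Start();
        }

        private void DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
        {
            using (var frame = e.OpenDepthImageFrame())
            {
                if (frame != null)
                {
                    // Forward depth data to the interaction stream.
                    this.interactionStream.ProcessDepth(frame.GetRawPixelData(), frame.Timestamp);
                }
            }
        }

        private void SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
        {
            using (var frame = e.OpenSkeletonFrame())
            {
                if (frame != null)
                {
                    this.skeletons = this.skeletons ?? new Skeleton[frame.SkeletonArrayLength];
                    frame.CopySkeletonDataTo(this.skeletons);

                    // Forward skeleton data (plus the accelerometer reading) to the interaction stream.
                    this.interactionStream.ProcessSkeleton(
                        this.skeletons, this.sensor.AccelerometerGetCurrentReading(), frame.Timestamp);
                }
            }
        }

        private void InteractionFrameReady(object sender, InteractionFrameReadyEventArgs e)
        {
            using (var frame = e.OpenInteractionFrame())
            {
                if (frame == null) { return; }

                frame.CopyInteractionDataTo(this.userInfos);
                foreach (var userInfo in this.userInfos)
                {
                    if (userInfo == null || userInfo.SkeletonTrackingId == 0) { continue; }

                    foreach (var handPointer in userInfo.HandPointers)
                    {
                        // This is where you would route UI events, handle Grip/GripRelease, etc.
                        if (handPointer.HandEventType != InteractionHandEventType.None)
                        {
                            Console.WriteLine("{0} hand: {1}", handPointer.HandType, handPointer.HandEventType);
                        }
                    }
                }
            }
        }

        public InteractionInfo GetInteractionInfoAtLocation(int skeletonTrackingId, InteractionHandType handType, double x, double y)
        {
            // See item 3 above for a fuller implementation.
            return new InteractionInfo { IsPressTarget = false, IsGripTarget = false };
        }
    }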

     

    I’ve posted some sample code that you may download and use to get started using InteractionStream in your own managed apps.  The code is loaded with tips in the comments that should get you started down the path of using our interactions in your own apps.  Thanks to Eddy Escardo Raffo on my team for writing the sample console app.

    Ben

    @benlower | kinectninja@microsoft.com | mobile: +1 (206) 659-NINJA (6465)

  • Kinect for Windows Product Blog

    Dev Kit Program Update

    • 3 Comments

    Back in June at the Build Conference we announced and started taking applications for our upcoming developer kit program.

    The response and interest we’ve seen have been tremendous: thousands of developers from 74 different countries applied.

    Mea Culpa

    When we announced the program we said we’d start notifying successful applicants in August. Many people interpreted that to mean that we’d be done with notifications in August. I apologize for not being clearer on this. We never intended to have all the notifications done in August. While we did start in August and have notified many developers of their acceptance to the program, there are still many more applicants to be notified.

    Over the coming weeks we will continue to let applicants know if they are admitted to the program, denied admission, or waitlisted (just like college :-)). Every applicant will hear something from us by the end of September. Anyone who is waitlisted will have a final decision by the end of the year.

    We are still planning to start sending out the developer kits in late November to all program participants.

    Again, apologies for any confusion. Please stay tuned…you will hear something from us soon!

    Ben

    @benlower | kinectninja@microsoft.com | mobile: +1 (206) 659-NINJA (6465)

  • Kinect for Windows Product Blog

    Windows Store app development is coming to Kinect for Windows

    • 2 Comments

    Today at Microsoft BUILD 2014, Microsoft made it official: the Kinect for Windows v2 sensor and SDK are coming this summer (northern hemisphere). With it, developers will be able to start creating Windows Store apps with Kinect for the first time. The ability to build such apps has been a frequent request from the developer community. We are delighted that it’s now on the immediate horizon—with the ability for developers to start developing this summer and to commercially deploy their solutions and make their apps available to Windows Store customers later this summer.

    The ability to create Windows Store apps with Kinect for Windows not only fulfills a dream of our developer community, it also marks an important step forward in Microsoft’s vision of providing a unified development platform across Windows devices, from phones to tablets to laptops and beyond. Moreover, access to the Windows Store opens a whole new marketplace for business and consumer experiences created with Kinect for Windows.

    The Kinect for Windows v2 has been re-engineered with major enhancements in color fidelity, video definition, field of view, depth perception, and skeletal tracking. In other words, the v2 sensor offers greater overall precision, improved responsiveness, and intuitive capabilities that will accelerate your development of voice and gesture experiences.

    Specifically, the Kinect for Windows v2 includes 1080p HD video, which allows for crisp, high-quality augmented scenarios; a wider field of view, which means that users can stand closer to the sensor—making it possible to use the sensor in smaller rooms; improved skeletal tracking, which opens up even better scenarios for health and fitness apps and educational solutions; and new active infrared detection, which provides better facial tracking and gesture detection, even in low-light situations.

    The Kinect for Windows v2 SDK brings the sensor’s new capabilities to life:

    • Windows Store app development: Being able to integrate the latest human computing technology into Windows apps and publish them to the Windows Store will give our developers the ability to reach more customers and open up access to natural user experiences in the home.
    • Unity Support: We are committed to supporting the broader developer community with a mix of languages, frameworks, and protocols. With support for Unity this summer, more developers will be able to build and publish their apps to the Windows Store by using tools they already know.
    • Improved anatomical accuracy: With the first-generation SDK, developers were able to track up to two people simultaneously; now, their apps can track up to six. And the number of joints that can be tracked has increased from 20 to 25 joints per person. Lastly, joint orientation is better. The result is skeletal tracking that’s greatly enhanced overall, making it possible for developers to deliver new and improved applications with skeletal tracking, which our preview participants are calling “seamless.”
    • Simultaneous, multi-app support: Multiple Kinect-enabled applications can run simultaneously. Our community has frequently requested this feature and we’re excited to be able to give it to them with the upcoming release.

    Developers who have been part of the Kinect for Windows v2 Developer Preview program praise the new sensor’s capabilities, which take natural, human computing to the next level. We are in awe and humbled by what they’ve already been able to create.

    Technologists from a few participating companies are on hand at BUILD, showing off the apps they have created by using the Kinect for Windows v2. See what two of them, Freak’n Genius and Reflexion Health, have already been able to achieve, and learn more about these companies.

    The v2 sensor and SDK dramatically enhance the world of gesture and voice control that were pioneered in the original Kinect for Windows, opening up new ways for developers to create applications that transform how businesses and consumers interact with computers. If you’re using the original Kinect for Windows to develop natural voice- and gesture-based solutions, you know how intuitive and powerful this interaction paradigm can be. And if you haven’t yet explored the possibilities of building natural applications, what are you waiting for? Join us as we continue to make technology easier to use and more intuitive for everyone.

    The Kinect for Windows Team

  • Kinect for Windows Product Blog

    Using Kinect Webserver to Expose Speech Events to Web Clients

    • 2 Comments

    In our 1.8 release, we made it easy to create Kinect-enabled HTML5 web applications. This is possible because we added an extensible webserver for Kinect data along with a JavaScript API, which gives developers some great functionality right out of the box:

    • Interactions: hand pointer movements, press and grip events useful for controlling a cursor, buttons, and other UI
    • User Viewer: visual representation of the users currently visible to the Kinect sensor, using different colors to indicate different user states
    • Background Removal: “green screen” image stream for a single person at a time
    • Skeleton: standard skeleton data such as tracking state, joint positions, joint orientations, etc.
    • Sensor Status: events corresponding to sensor connection/disconnection


    This is enough functionality to write a compelling application but it doesn’t represent the whole range of Kinect sensor capabilities. In this article I will show you step-by-step how to extend the WebserverBasics-WPF sample (see C# code in CodePlex  or documentation in MSDN) available from Kinect Toolkit Browser to enable web applications to respond to speech commands, where the active speech grammar is configurable by the web client.

    A solution containing the full, final sample code is available on CodePlex. To compile this sample you will also need Microsoft.Samples.Kinect.Webserver (available via CodePlex and Toolkit Browser) and Microsoft.Kinect.Toolkit components (available via Toolkit Browser).

    Getting Started

    To follow along step-by-step:

    1. If you haven’t done so already, install the Kinect for Windows v1.8 SDK and Toolkit
    2. Launch the Kinect Toolkit Browser
    3. Install WebserverBasics-WPF sample in a local directory
    4. Open the WebserverBasics-WPF.sln solution in Visual Studio
    5. Go to line 136 in MainWindow.xaml.cs file


    You should see the following TODO comment which describes exactly how we’re going to expose speech recognition functionality:

    //// TODO: Optionally add factories here for custom handlers:
    ////       this.webserver.SensorStreamHandlerFactories.Add(new MyCustomSensorStreamHandlerFactory());
    //// Your custom factory would implement ISensorStreamHandlerFactory, in which the
    //// CreateHandler method would return a class derived from SensorStreamHandlerBase
    //// which overrides one or more of its virtual methods.

    We will replace this comment with the functionality described below.

    So, What Functionality Are We Implementing?

    [Diagram: overview of the speech stream handler functionality described below]

    More specifically, on the server side we will:

    1. Create a speech recognition engine
    2. Bind the engine to a Kinect sensor’s audio stream whenever a sensor gets connected or disconnected
    3. Allow a web client to specify the speech grammar to be recognized
    4. Forward speech recognition events generated by the engine to the web client
    5. Register a factory for the speech stream handler with the Kinect webserver


    This will be accomplished by creating a class called SpeechStreamHandler, derived from Microsoft.Samples.Kinect.Webserver.Sensor.SensorStreamHandlerBase. SensorStreamHandlerBase is an implementation of ISensorStreamHandler that frees us from writing boilerplate code. ISensorStreamHandler is an abstraction that gets notified whenever a Kinect sensor gets connected or disconnected, when color, depth, and skeleton frames become available, and when web clients request to view or update configuration values. In response, our speech stream handler will send event messages to web clients.
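    Putting the pieces described below together, the overall shape of the class is roughly the following. This is an outline only; the field and constant names match the snippets in this post, and the full implementation is in the CodePlex sample.

    public class SpeechStreamHandler : SensorStreamHandlerBase, IDisposable
    {
        private const string SpeechEventCategory = "speech";
        private const string GrammarXmlPropertyName = "grammarXml";

        private readonly SensorStreamHandlerContext ownerContext;
        private SpeechRecognitionEngine speechEngine;
        private KinectSensor sensor;
        private Grammar grammar;

        private SpeechStreamHandler(SensorStreamHandlerContext ownerContext)
        {
            this.ownerContext = ownerContext;

            // Create the speech recognition engine and register the configuration
            // handlers, as shown in the sections below.
            ...
        }

        public override void OnSensorChanged(KinectSensor newSensor)
        {
            ...
        }

        ...
    }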

    On the web client side we will:

    1. Configure speech recognition stream (enable and specify the speech grammar to be recognized)
    2. Modify the web UI in response to recognized speech events


    All new client-side code is in SamplePage.html

    Creating a Speech Recognition Engine

    In the constructor for SpeechStreamHandler you’ll see the following code:

    RecognizerInfo ri = GetKinectRecognizer();
    if (ri != null)
    {
        this.speechEngine = new SpeechRecognitionEngine(ri.Id);

        if (this.speechEngine != null)
        {
            // disable speech engine adaptation feature
            this.speechEngine.UpdateRecognizerSetting("AdaptationOn", 0);
            this.speechEngine.UpdateRecognizerSetting("PersistedBackgroundAdaptation", 0);
            this.speechEngine.AudioStateChanged += this.AudioStateChanged;
            this.speechEngine.SpeechRecognitionRejected += this.SpeechRecognitionRejected;
            this.speechEngine.SpeechRecognized += this.SpeechRecognized;
        }
    }

    This code snippet will be familiar if you’ve looked at some of our other speech samples, such as SpeechBasics-WPF. Basically, we’re getting the metadata corresponding to the Kinect acoustic model (GetKinectRecognizer is hardcoded to use the English-language acoustic model in this sample, but this can be changed by installing additional language packs and modifying GetKinectRecognizer to look for the desired culture name), using it to create a speech engine, turning off some settings related to the audio adaptation feature (which makes the speech engine better suited for long-running scenarios), and registering to receive events when speech is recognized or rejected, or when the audio state (e.g., silence versus someone speaking) changes.

    Binding the Speech Recognition Engine to a Kinect Sensor’s Audio Stream

    In order to do this, we override SensorStreamHandlerBase’s implementation of OnSensorChanged, so we can find out about sensors connecting and disconnecting.

    public override void OnSensorChanged(KinectSensor newSensor)
    {
        base.OnSensorChanged(newSensor);
        if (this.sensor != null)
        {
            if (this.speechEngine != null)
            {
                this.StopRecognition();
                this.speechEngine.SetInputToNull();
                this.sensor.AudioSource.Stop();
            }
        }

        this.sensor = newSensor;
        if (newSensor != null)
        {
            if (this.speechEngine != null)
            {
                this.speechEngine.SetInputToAudioStream(
                    newSensor.AudioSource.Start(), new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
                this.StartRecognition(this.grammar);
            }
        }
    }

    The main thing we need to do here is Start the AudioSource of the newly connected Kinect sensor in order to get an audio stream that we can hook up as the input to the speech engine. We also need to specify the format of the audio stream: a single-channel, 16-bit-per-sample, pulse code modulation (PCM) stream, sampled at 16 kHz.

    Allow Web Clients to Specify Speech Grammar

    We will let clients send us the whole speech grammar that they want recognized, as XML that conforms to the W3C Speech Recognition Grammar Specification format version 1.0. To do this, we will expose a configuration property called “grammarXml”.

    Let’s backtrack a little bit because earlier we glossed over the bit of code in the SpeechStreamHandler constructor where we register the handlers for getting and setting stream configuration properties:

    this.AddStreamConfiguration(SpeechEventCategory, new StreamConfiguration(this.GetProperties, this.SetProperty));

    Now, in the SetProperty method we call LoadGrammarXml method whenever a client sets the “grammarXml” property:

    case GrammarXmlPropertyName:
        this.LoadGrammarXml((string)propertyValue);
        break;

    And in the LoadGrammarXml method we do the real work of updating the speech grammar:

    private void LoadGrammarXml(string grammarXml)
    {
        this.StopRecognition();

        if (!string.IsNullOrEmpty(grammarXml))
        {
            using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(grammarXml)))
            {
                Grammar newGrammar;
                try
                {
                    newGrammar = new Grammar(memoryStream);
                }
                catch (ArgumentException e)
                {
                    throw new InvalidOperationException("Requested grammar might not contain a root rule", e);
                }
                catch (FormatException e)
                {
                    throw new InvalidOperationException("Requested grammar was specified with an invalid format", e);
                }

                this.StartRecognition(newGrammar);
            }
        }
    }

    We first stop speech recognition because we don’t yet know whether the specified grammar is going to be valid. We then try to create a new Microsoft.Speech.Recognition.Grammar object from the specified property value; if the value does not represent a valid grammar, the Grammar constructor throws, and we surface the problem to the caller as an InvalidOperationException. Otherwise, we call the StartRecognition method, which loads the new grammar into the speech engine and tells the engine to start recognizing, and to keep recognizing speech phrases until we explicitly tell it to stop.

    private void StartRecognition(Grammar g)
    {
        if ((this.sensor != null) && (g != null))
        {
            this.speechEngine.LoadGrammar(g);
            this.speechEngine.RecognizeAsync(RecognizeMode.Multiple);
        }

        this.grammar = g;
    }
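    The StopRecognition counterpart isn’t shown in this post. A minimal version, under the assumption that it only needs to cancel any in-progress recognition and unload the current grammar from the engine (while keeping this.grammar so recognition can resume when a sensor is reconnected), could look like this:

    private void StopRecognition()
    {
        if (this.speechEngine != null)
        {
            // Cancel any in-progress recognition and unload whatever grammar is loaded.
            // We intentionally keep this.grammar so that StartRecognition(this.grammar)
            // can resume recognition when a new sensor is connected.
            this.speechEngine.RecognizeAsyncCancel();
            this.speechEngine.UnloadAllGrammars();
        }
    }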

    Send Speech Recognition Events to Web Client

    When we created the speech recognition engine, we registered for 3 events: AudioStateChanged, SpeechRecognized and SpeechRecognitionRejected. Whenever any of these events happen we just want to forward the event to the web client. Since the code ends up being very similar, we will focus on the SpeechRecognized event handler:

    private async void SpeechRecognized(object sender, SpeechRecognizedEventArgs args)
    {
        var message = new RecognizedSpeechMessage(args);
        await this.ownerContext.SendEventMessageAsync(message);
    }

    To send messages to web clients we use functionality exposed by ownerContext, which is an instance of the SensorStreamHandlerContext class which was passed to us in the constructor. The messages are sent to clients using a web socket channel, and could be

    • Stream messages: Messages that are generated continuously, at a predictable rate (e.g.: 30 skeleton stream frames are generated every second), where the data from each message replaces the data from the previous message. If we drop one of these messages every so often there is no major consequence because another will arrive shortly thereafter with more up-to-date data, so the framework might decide to drop one of these messages if it detects a bottleneck in the web socket channel.
    • Event messages: Messages that are generated sporadically, at an unpredictable rate, where each event represents an isolated incident. As such, it is not desirable to drop any one of these kind of messages.


    Given the nature of speech recognition, we chose to communicate with clients using event messages. Specifically, we created the RecognizedSpeechMessage class, which is a subclass of EventMessage that serves as a representation of SpeechRecognizedEventArgs which can be easily serialized as JSON and follows the JavaScript naming conventions.

    You might have noticed the usage of the “async” and “await” keywords in this snippet. They are described in much more detail in MSDN but, in summary, they enable an asynchronous programming model so that long-running operations don’t block thread execution while not necessarily using more than one thread. The Kinect webserver uses a single thread to schedule tasks so the consequence for you is that ISensorStreamHandler implementations don’t need to be thread-safe, but should be aware of potential re-entrancy due to asynchronous behavior.

    Registering a Speech Stream Handler Factory with the Kinect Webserver

    The Kinect webserver can be started, stopped and restarted, and each time it is started it creates ISensorStreamHandler instances in a thread dedicated to Kinect data handling, which is the only thread that ever calls these objects. To facilitate this behavior, the server doesn’t allow for direct registration of ISensorStreamHandler instances and instead expects ISensorStreamHandlerFactory instances to be registered in KinectWebserver.SensorStreamHandlerFactories property.

    For the purposes of this sample, we declared a private factory class that is exposed as a static singleton instance directly from the SpeechStreamHandler class:

    public class SpeechStreamHandler : SensorStreamHandlerBase, IDisposable
    {
        ...

        static SpeechStreamHandler()
        {
            Factory = new SpeechStreamHandlerFactory();
        }
        ...

        public static ISensorStreamHandlerFactory Factory { get; private set; }
        ...

        private class SpeechStreamHandlerFactory : ISensorStreamHandlerFactory
        {
            public ISensorStreamHandler CreateHandler(SensorStreamHandlerContext context)
            {
                return new SpeechStreamHandler(context);
            }
        }
    }

    Finally, back in line 136 of MainWindow.xaml.cs, we replace the TODO comment mentioned above with

    // Add speech stream handler to the list of available handlers, so web client
    // can configure speech grammar and receive speech events
    this.webserver.SensorStreamHandlerFactories.Add(SpeechStreamHandler.Factory);

    Configure Speech Recognition Stream in Web Client

    The sample web client distributed with WebserverBasics-WPF is already configuring a couple of other streams in the function called updateUserState in SamplePage.html, so we will add the following code to this function:

    var speechGrammar = '\
    <grammar version="1.0" xml:lang="en-US" tag-format="semantics/1.0-literals" root="DefaultRule" xmlns="\' data-mce-href=">\'>\'>\'>\'>http://www.w3.org/2001/06/grammar">\
        <rule id="DefaultRule" scope="public">\
            <one-of>\
                <item>\
                    <tag>SHOW</tag>\
                    <one-of><item>Show Panel</item><item>Show</item></one-of>\
                </item>\
                <item>\
                    <tag>HIDE</tag>\
                    <one-of><item>Hide Panel</item><item>Hide</item></one-of>\
                </item>\
            </one-of>\
        </rule>\
    </grammar>';

    immediateConfig["speech"] = { "enabled": true, "grammarXml": speechGrammar };

    This code enables the speech stream and specifies a grammar that

    • triggers a recognition event with “SHOW” as semantic value whenever a user utters the phrases “Show” or “Show Panel”
    • triggers a recognition event with “HIDE” as semantic value whenever a user utters the phrases “Hide” or “Hide Panel”

    Modify the Web UI in Response to Recognized Speech Events

    The sample web client already registers an event handler function, so we just need to update it to respond to speech events in addition to user state events:

    function onSpeechRecognized(recognizedArgs) {
        if (recognizedArgs.confidence > 0.7) {
            switch (recognizedArgs.semantics.value) {
                case "HIDE":
                    setChoosePanelVisibility(false);
                    break;
                case "SHOW":
                    setChoosePanelVisibility(true);
                    break;
            }
        }
    }
    ...

    sensor.addEventHandler(function (event) {
        switch (event.category) {
            ...            
            case "speech":
                switch (event.eventType) {
                    case "recognized":
                        onSpeechRecognized(event.recognized);
                        break;
                }
                break;
        }
    });

    Party Time!

    At this point you can rebuild the updated solution and run it to see the server UI. From this UI you can click on the link that reads “Open sample page in default browser” and play with the sample UI. It will look the same as before the code changes, but will respond to the speech phrases “Show”, “Show Panel”, “Hide” and “Hide Panel”. Now try changing the grammar to include more phrases and update the UI in different ways in response to speech events.

    Happy coding!

  • Kinect for Windows Product Blog

    Kinect for Windows Academic Pricing Update

    • 2 Comments

    Shortly after the commercial release of Kinect for Windows in early 2012, Microsoft announced the availability of academic pricing for the Kinect for Windows sensor to higher education faculty and students for $149.99 at the Microsoft Store in the United States. We are now pleased to announce that we have broadened the availability of academic pricing through Microsoft Authorized Educational Resellers (AERs).

    Most of these resellers have the capability to offer academic pricing directly to educational institutions; academic researchers; and students, faculty, and staff of public or private K-12 schools, vocational schools, junior colleges, colleges, universities, and scientific or technical institutions. In the United States, eligible institutions are accredited by associations that are recognized by the US Department of Education and/or the State Board of Education. Academic pricing on the Kinect for Windows sensor is currently available through AERs in the United States, Taiwan, and Hong Kong SAR.

    Within the academic community, the potential of Kinect for Windows in the classroom is generating a lot of excitement. Researchers and academics in higher education collaborate with Microsoft Research on a variety of projects that involve educational uses of Kinect for Windows. The educator-driven community resource KinectEDucation encourages developers, teachers, students, enthusiasts, and other education stakeholders to help transform classrooms with accessible technology.
     
    One such development is a new product from Kaplan Early Learning Company, the Inspire-NG Move, which is bundled with the Kinect for Windows sensor. The bundle includes four educational programs for children ages three and older, designed to show that hands-on, kinesthetic play with a purpose makes learning fun. It currently sells for US$499.

    “We’re excited about the new learning models that are enabled by Kinect for Windows,” stated Chris Gerblick, vice president of IT and Professional Services at Kaplan Early Learning Company. “We see the Inspire NG-Move family of products as excellent learning tools for both the classroom and the home.”

    With the availability of academic pricing, we look forward to many developments from the academic community that integrate Kinect for Windows into interactive educational experiences.

    Michael Fry
    Business Development, Strategic Alliances
    Kinect for Windows

