
    Partners Deliver Custom Solutions that Use Kinect for Windows


    Kinect for Windows demos at Microsoft Worldwide Partner Conference

    Kinect for Windows partners are finding new business opportunities by developing custom applications and ready-made solutions for commercial customers, such as the Coca-Cola Company, and for vertical markets, including the health care industry.

    Several of these solutions were on display at the Microsoft Worldwide Partner Conference (WPC) in Toronto, Canada, where Kinect for Windows took the stage with two amazing demos as well as strong booth showings at the Solutions Innovation Center.

    "Being part of the WPC 2012 event was a great opportunity to showcase our Kinect-based 3-D scanner, and the response was incredibly awesome, both on stage when the audience would spontaneously clap and cheer in the middle of the scan, and in the Kinect for Windows trade show area where people would stand in line to get scanned," said Nicolas Tisserand, co-founder of the France-based Manctl, one of the 11 companies in the Microsoft Accelerator for Kinect program.

    Manctl's Skanect scanner software uses the Kinect sensor to build high quality 3-D digital models of people and objects, which can be sent to a 3-D printer to create detailed plastic extruded sculptures. "Kinect for Windows is a fantastic device, capable of so much more than just game control. It's making depth sensing a commodity," Tisserand added.

    A demo from übi interactive in Germany uses the Kinect sensor to turn virtually any surface into a 3-D touchscreen that can control interfaces, apps, and games. "Kinect for Windows is a great piece of hardware and it works perfect[ly] with our software stack," reported übi co-founder David Hajizadeh. "As off-the-shelf hardware, it massively reduced our costs and we see lots of opportunities for business applications that offer huge value for our customers."

    Snibbe Interactive created its SocialMirror Coke Kiosk to deliver a Kinect-based game in which players aim a stream of soda into a glass and then share videos of the experience with their social networks. "We were extremely excited to show off our unique Coca-Cola branded interactive experience and its unique ability to create instant ROI [return on investment] through our viral marketing component," reported Alan Shimoide, director of engineering at Snibbe.

    InterKnowlogy developed KinectHealth to assist doctors with motion-controlled access to patient records and surgery planning tools. "A true game changer, Kinect for Windows allows our designers and developers to think differently about business cases across many verticals," noted Kevin Custer, the director of strategic marketing and partnerships at InterKnowlogy. "Kinect for Windows is not just how we interact with computers, but it offers unique ways to add gesture and voice to our natural user-interface designed software—the combination of which is changing lives of customers and users alike."
     
    "Avanade has already delivered several innovative solutions using Kinect, and we expect that demand to keep growing," said Ben Reierson, innovation manager at Avanade, whose Kinect for Virtual Healthcare includes video chat for connecting clinics to remote doctors for online appointments. "Customers and partners are clearly getting more serious about the possibilities of Kinect and natural user interfaces."

    Kinect for Windows Team


    Monsters Come to Life with Kinect for Windows


    A demon dog robot under construction

    It all started with a couple of kids and a remarkable idea, which eventually spawned two terrifying demon dogs and their master. This concept is transforming the haunt industry and could eventually change how theme parks and other entertainment businesses approach animated mechanical electronics (animatronics).
     
    Here's the behind-the-scenes story of how this all came to be:

    The boys, 6-year-old Mark and 10-year-old Jack, fell in love with Travel Channel's Making Monsters, a TV program that chronicles the creation of lifelike animatronic creatures. After seeing their dad's work with Kinect for Windows at the Minneapolis-based Microsoft Technology Center, they connected the dots and dreamed up the concept: wouldn't it be awesome if Dad could use his expertise with the Kinect for Windows motion sensor to make better and scarier monsters?

    So “Dad”—Microsoft developer and technical architect Todd Van Nurden—sent an email to Distortions Unlimited in Greeley, Colorado, offering praise of their work sculpting monsters out of clay and adjustable metal armatures. He also threw in his boys' suggestion on how they might take things to the next level with Kinect for Windows: Imagine how much cooler and more realistic these monsters could be if they had the ability to see you, hear you, anticipate your behavior, and respond to it. Imagine what it means to this industry now that monster makers can take advantage of the Kinect for Windows gesture and voice capabilities.

    Two months passed. Then one day, Todd received a voice mail message from Distortions CEO Ed Edmunds expressing interest. The result: nine months of off-and-on work, culminating with the debut of a Making Monsters episode detailing the project on Travel Channel earlier today, October 21 (check local listings for show times, including repeat airings). The full demonic installation can also be experienced firsthand at The 13th Floor haunted house in Denver, Colorado, now through November 10.

    To get things started, Distortions sent Van Nurden maquettes (scale models about one-quarter of the final size) so he could build prototypes of two demon dogs and their demon master. Van Nurden worked with Parker, a company that specializes in robotics, to develop movement driven by random path manipulation, which is more fluid than a typical robot's motion and is reactive and only loosely scripted. The maquettes were wired to Kinect for Windows with skeletal tracking, audio tracking, and voice control functionality as a proof of concept to suggest a menu of possible options.

    Distortions was impressed. "Ed saw everything it could do and said, 'I want all of them. We need to blow this out,'" recalled Van Nurden.


    Todd Van Nurden prepares to install the Kinect for Windows sensor in the demon's belt

    The full-sized dogs are four feet high, while the demon master stands nearly 14 feet. A Kinect for Windows sensor connected to a ruggedized Lenovo M92 workstation is embedded in the demon's belt; after interpreting the tracking data, the workstation sends commands over wired Ethernet to control the demon and the dogs. Custom software, built by using the Kinect for Windows SDK, provides the operators with a drag-and-drop interface for laying out character placement and other configurable settings. It also provides a top-down view for the attraction's operator, displaying where the guests are and how the creatures are tracking them.

    "We used a less common approach to processing the data as we leveraged the Reactive Extensions for .NET to basically set up push-based Linq subscriptions," Van Nurden revealed. "The drag-and-drop features enable the operator to control the place-space configuration, as well as when certain behaviors begin. We used most of the Kinect for Windows SDK managed API with the exception of raw depth data."

    The dogs are programmed to react very differently if approached by an adult (which might elicit a bark or growl) versus a child (which could prompt a fast pant or soft whimper). Scratching behind a hound's ears provokes a "happy dog" response—assuming you can overcome your fear and get close enough to actually touch one! Each action or mood includes its own set of kinesthetic actions and vocal cues. The sensor quietly tracks groups of people, alternating between a loose tracking algorithm that can calculate relative height quickly when figures are further away and full skeletal tracking when someone approaches a dog or demon, requiring more detailed data to drive the beasts' reactions.

    The end product was so delightfully scary that Van Nurden had to reassure his own sons when they were faced with a life-sized working model of one of the dogs. "I programmed him, he's not going to hurt you," he comforted them.

    Fortunately, it is possible to become the demons' master. If you perform a secret voice and movement sequence, they will actually bow to you.

    Lisa Tanzer, executive producer for Making Monsters, has been following creature creation for two years while shooting the show at Distortions Unlimited. She was impressed by how much more effective the Kinect for Windows interactivity is than the traditional looped audio and fully scripted movements of regular animatronics: "Making the monsters themselves is the same process—you take clay, sculpt it over an armature, mold it, paint it, all the same steps," she said. "The thing that made this project Distortions did for 13th Floor so incredible and fascinating was the Kinect for Windows technology.”

    "It can be really scary," Tanzer reported. "The dogs and demon creature key into people and actually track them around the room. The dog turns, looks at you and whimpers; you go 'Oh, wow, is this thing going to get me?' It's just like a human actor latching on to somebody in a haunted house but there's no human, only this incredible technology.”

    "Incorporating Kinect for Windows into monster making is very new to the haunt industry," she added. "In terms of the entertainment industry, it's a huge deal. I think it's a really cool illustration of where things are going."

    Kinect for Windows team



    Mysteries of Kinect for Windows Face Tracking output explained


    Since the release of Kinect for Windows version 1.5, developers have been able to use the Face Tracking software development kit (SDK) to create applications that can track human faces in real time. Figure 1, an illustration from the Face Tracking documentation, displays 87 of the points used to track the face. Thirteen points are not illustrated here—more on those points later.

    Figure 1: Tracked Points

    You have questions...

    Based on feedback we received via comments and forum posts, it is clear there is some confusion regarding the face tracking points and the data values found when using the SDK sample code. The managed sample, FaceTrackingBasics-WPF, demonstrates how to visualize mesh data by displaying a 3D model representation on top of the color camera image.

    Figure 2: Screenshot from FaceTrackingBasics-WPF

    By exploring this sample's source code, you will find a set of helper functions under the Microsoft.Kinect.Toolkit.FaceTracking project, in particular GetProjected3DShape(). Many people have found that this function returns a collection of 121 values. Some have also found an enum called "FeaturePoint" that includes 70 items.

    We have answers...

    As you can see, we have two main sets of numbers that don't seem to add up. This is because these are two sets of values that are provided by the SDK:

    1. 3D Shape Points (mesh representation of the face): 121
    2. Tracked Points: 87 + 13

    The 3D Shape Points (121 of them) are the mesh vertices that make a 3D face model based on the Candide-3 wireframe.

    Figure 3: Wireframe of the Candide-3 model (image from http://www.icg.isy.liu.se/candide/img/candide3_rot128.gif)

    These vertices are morphed by the FaceTracking APIs to align with the face. The GetProjected3DShape method returns the vertices as a Vector3DF[] array. These values can be enumerated by name using the "FeaturePoint" list, for example TopSkull, LeftCornerMouth, or OuterTopRightPupil. Figure 4 shows these values superimposed on top of the color frame.

    Figure 4: Feature Point index mapped on mesh model
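    As a quick illustration (not part of the sample itself), the projected shape can be indexed by the FeaturePoint enum to pull out named vertices. The sketch below assumes a FaceTrackFrame named frame obtained from a successful track:

    // Hedged sketch: assumes "frame" is a FaceTrackFrame from a successful FaceTracker.Track(...) call.
    using System;
    using System.Linq;
    using Microsoft.Kinect.Toolkit.FaceTracking;

    static class ProjectedShapeExample
    {
        public static void DumpNamedVertices(FaceTrackFrame frame)
        {
            var shape = frame.GetProjected3DShape();   // the 121 vertices of the Candide-3 based mesh

            // Named vertices are addressed through the FeaturePoint enum (70 named entries).
            var topSkull = shape[FeaturePoint.TopSkull];
            var leftMouthCorner = shape[FeaturePoint.LeftCornerMouth];

            Console.WriteLine("Mesh vertex count: {0}", shape.Count());
            Console.WriteLine("TopSkull at ({0}, {1})", topSkull.X, topSkull.Y);
            Console.WriteLine("LeftCornerMouth at ({0}, {1})", leftMouthCorner.X, leftMouthCorner.Y);
        }
    }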

    To get the 100 tracked points mentioned above, we need to dive more deeply into the APIs. The managed APIs provide an FtInterop.cs file that defines an interface, IFTResult, which contains a Get2DShapePoints function. FtInterop is a wrapper for the native library that exposes its functionality to managed languages. Users of the unmanaged C++ API may have already seen this and figured it out. Get2DShapePoints is the function that provides the 100 tracked points.

    If we have a look at the function, it doesn’t seem to be useful to a managed code developer:

    // STDMETHOD(Get2DShapePoints)(THIS_ FT_VECTOR2D** ppPoints, UINT* pPointCount) PURE;
    void Get2DShapePoints(out IntPtr pointsPtr, out uint pointCount);

    To get a better idea of how you can get a collection of points from IntPtr, we need to dive into the unmanaged function:

    /// <summary>
    /// Returns 2D (X,Y) coordinates of the key points on the aligned face in video frame coordinates.
    /// </summary>
    /// <param name="ppPoints">Array of 2D points (as FT_VECTOR2D).</param>
    /// <param name="pPointCount">Number of elements in ppPoints.</param>
    /// <returns>If the method succeeds, the return value is S_OK. If the method fails, the return value can be E_POINTER.</returns>
    STDMETHOD(Get2DShapePoints)(THIS_ FT_VECTOR2D** ppPoints, UINT* pPointCount) PURE; 

    The function will give us a pointer to the FT_VECTOR2D array. To consume the data from the pointer, we have to create a new function for use with managed code.

    The managed code

    First, you need to create an array to contain the data that is copied to managed memory. Since FT_VECTOR2D is an unmanaged structure, we need an equivalent managed data type to marshal the data into; the managed version of this structure is PointF (a structure that uses floats for x and y).

    Now that we have a data type, we need to convert the IntPtr to a managed array. Searching the code, we see that the FaceTrackFrame class wraps the IFTResult object. It also contains the GetProjected3DShape() function we used before, so it is a good place to add a new function, GetShapePoints. It will look something like this:

    // populates an array for the ShapePoints
    public void GetShapePoints(ref Vector2DF[] vector2DF)
    {
         // get the 2D tracked shapes
         IntPtr ptBuffer = IntPtr.Zero;
         uint ptCount = 0;
         this.faceTrackingResultPtr.Get2DShapePoints(out ptBuffer, out ptCount);
         if (ptCount == 0)
         {
             vector2DF = null;
             return;
         }
     
         // create a managed array to hold the values
         if (vector2DF == null || (vector2DF != null && vector2DF.Length != ptCount))
         {
             vector2DF = new Vector2DF[ptCount];
         }

         ulong sizeInBytes = (ulong)Marshal.SizeOf(typeof(Vector2DF));
         for (ulong i = 0; i < ptCount; i++)
         {
             vector2DF[i] = (Vector2DF)Marshal.PtrToStructure((IntPtr)((ulong)ptBuffer + (i * sizeInBytes)), typeof(Vector2DF));
         }
     } 

    To ensure we are using the data correctly, we refer to the documentation on Get2DShapePoints:

    IFTResult::Get2DShapePoints Method gets the (x,y) coordinates of the key points on the aligned face in video frame coordinates.

    The returned values represent points mapped onto the color image. Since we know they already match the color frame, there is no need to apply any coordinate mapping. You can simply call the function to get the data, which should align with the color image coordinates.
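    As a quick usage sketch (hypothetical calling code, not from the sample itself), the points returned by the ref-based signature defined above can be consumed directly in color-image coordinates:

    // Hedged usage sketch for the GetShapePoints extension defined above.
    private void PrintShapePoints(FaceTrackFrame frame)
    {
        // The points come back in video frame (color image) coordinates, so no mapping is needed.
        Vector2DF[] shapePoints = null;
        frame.GetShapePoints(ref shapePoints);

        if (shapePoints == null)
        {
            return;
        }

        for (int i = 0; i < shapePoints.Length; i++)
        {
            // e.g., overlay a marker or an index label at (X, Y) on top of the color image
            Console.WriteLine("Point {0}: ({1}, {2})", i, shapePoints[i].X, shapePoints[i].Y);
        }
    }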

    The Sample Code

    The modified version of FaceTrackingBasics-WPF is available in the sample code that can be downloaded from CodePlex. It has been modified to allow you to display the feature points (by name or by index value) and to toggle the mesh drawing. Because of the way WPF renders, performance can suffer on machines with lower-end graphics cards, so I recommend enabling these options only one at a time. If your UI becomes unresponsive, you can block the sensor with your hand to prevent face tracking data from being captured. Since the application will then not detect any tracked face, it will not render any points, giving you the opportunity to reset the features you enabled by using the UI controls.

    Figure 5: ShapePoints mapped around the face

    As you can see in Figure 5, the additional 13 points are the center of the eyes, the tip of the nose, and the areas above the eyebrows on the forehead. Once you enable a feature and tracking begins, you can zoom into the center and see the values more clearly.

    A summary of the changes:

    MainWindow.xaml/.cs:

    • UI changes to enable slider and draw selections

     

    FaceTrackingViewer.cs:

    • Added a Grid control – used for the UI elements
    • Modified the constructor to initialize grid
    • Modified the OnAllFrameReady event
      • For each tracked skeleton, create a canvas, add it to the grid, and use it as the parent for the label controls

    public partial class FaceTrackingViewer : UserControl, IDisposable
    {
         private Grid grid;

         public FaceTrackingViewer()
         {
             this.InitializeComponent();

             // add grid to the layout
             this.grid = new Grid();
             this.grid.Background = Brushes.Transparent;
             this.Content = this.grid;
         }

         private void OnAllFramesReady(object sender, AllFramesReadyEventArgs allFramesReadyEventArgs)
         {
             ...
              // We want to keep a record of any skeleton, tracked or untracked.
             if (!this.trackedSkeletons.ContainsKey(skeleton.TrackingId))
             {
                 // create a new canvas for each tracker
                 Canvas canvas = new Canvas();
                 canvas.Background = Brushes.Transparent;
                 this.grid.Children.Add( canvas );
                
                 this.trackedSkeletons.Add(skeleton.TrackingId, new SkeletonFaceTracker(canvas));
             }
             ...
         }
    }

    SkeletonFaceTracker class changes:

    • New properties: DrawFaceMesh, DrawShapePoints, DrawFeaturePoints, featurePoints, lastDrawFeaturePoints, shapePoints, labelControls, Canvas
    • New functions: FindTextControl, UpdateTextControls, RemoveAllFromCanvas, SetShapePointsLocations, SetFeaturePointsLocations
    • Added the constructor to keep track of the parent control
    • Changed the DrawFaceModel function to draw based on what data was selected
    • Updated the OnFrameReady event to recalculate the positions of the drawn elements
      • If DrawShapePoints is selected, then we call our new function

    private class SkeletonFaceTracker : IDisposable
    {
    ...
        // properties to toggle rendering 3D mesh, shape points and feature points
        public bool DrawFaceMesh { get; set; }
        public bool DrawShapePoints { get; set; }
        public DrawFeaturePoint DrawFeaturePoints { get; set; }

        // defined array for the feature points
        private Array featurePoints;
        private DrawFeaturePoint lastDrawFeaturePoints;

        // array for Points to be used in shape points rendering
        private PointF[] shapePoints;

        // map to hold the label controls for the overlay
        private Dictionary<string, Label> labelControls;

        // canvas control for new text rendering
        private Canvas Canvas;

        // canvas is passed in for every instance
        public SkeletonFaceTracker(Canvas canvas)
        {
            this.Canvas = canvas;
        }

        public void DrawFaceModel(DrawingContext drawingContext)
        {
            ...
            // only draw if selected
            if (this.DrawFaceMesh && this.facePoints != null)
            {
                ...
            }
        }

        internal void OnFrameReady(KinectSensor kinectSensor, ColorImageFormat colorImageFormat, byte[] colorImage, DepthImageFormat depthImageFormat, short[] depthImage, Skeleton skeletonOfInterest)
        {
            ...
            if (this.lastFaceTrackSucceeded)
            {
                ...
                if (this.DrawFaceMesh || this.DrawFeaturePoints != DrawFeaturePoint.None)
                {
                    this.facePoints = frame.GetProjected3DShape();
                }

                // get the shape points array
                if (this.DrawShapePoints)
                {
                    this.shapePoints = frame.GetShapePoints();
                }
            }

            // draw/remove the components
            SetFeaturePointsLocations();
            SetShapePointsLocations();
        }

        ...
    }

    Pulling it all together...

    As we have seen, the following data points are available from the Face Tracking SDK:

    • Shape Points: the 100 2D points (87 illustrated plus 13 additional) used to track the face
    • Mesh Data: vertices of the 3D model from the GetProjected3DShape() function
    • FeaturePoints: named vertices on the 3D model that play a significant role in face tracking

    To get the shape point data, we have to extend the current managed wrapper with a new function that will handle the interop with the native API.

    Carmine Sirignano
    Developer Support Escalation Engineer
    Kinect for Windows


    Swap your face…really


    Ever wish you looked like someone else? Maybe Brad Pitt or Jennifer Lawrence? Well, just get Brad or Jennifer in the same room with you, turn on the Kinect for Windows v2 sensor, and presto: you can swap your mug for theirs (and vice versa, of course). Don’t believe it? Then take a look at this cool video from Apache, in which two developers happily trade faces.

    Swapping faces in real time—let the good times roll

    According to Adam Vahed, managing director at Apache, the ability of the Kinect for Windows v2 sensor and SDK to track multiple bodies was essential to this project, as the solution needed to track the head position of both users. In fact, Adam rates the ability to perform full-skeletal tracking of multiple bodies as the Kinect for Windows v2 sensor’s most exciting feature, observing that it “opens up so many possibilities for shared experiences and greater levels of game play in the experiences we create.”
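    Apache's face-swap code isn't public, but the multi-body tracking Adam describes maps onto the v2 SDK's body frame APIs. Here is a minimal, hedged sketch (the class and handler names are our own) that reads the head joint of every tracked body:

    // Minimal sketch of multi-body tracking with the Kinect for Windows v2 SDK (illustrative only).
    using System;
    using Microsoft.Kinect;

    class MultiBodyTrackingExample
    {
        private readonly KinectSensor sensor;
        private readonly BodyFrameReader bodyReader;
        private readonly Body[] bodies;

        public MultiBodyTrackingExample()
        {
            this.sensor = KinectSensor.GetDefault();
            this.sensor.Open();

            this.bodies = new Body[this.sensor.BodyFrameSource.BodyCount];
            this.bodyReader = this.sensor.BodyFrameSource.OpenReader();
            this.bodyReader.FrameArrived += this.OnBodyFrameArrived;
        }

        private void OnBodyFrameArrived(object sender, BodyFrameArrivedEventArgs e)
        {
            using (BodyFrame frame = e.FrameReference.AcquireFrame())
            {
                if (frame == null)
                {
                    return;
                }

                frame.GetAndRefreshBodyData(this.bodies);
            }

            foreach (Body body in this.bodies)
            {
                if (body != null && body.IsTracked)
                {
                    // Head position in camera space; a face-swap app would map this into the
                    // color frame to decide which pixels belong to which person's head.
                    CameraSpacePoint head = body.Joints[JointType.Head].Position;
                    Console.WriteLine("Tracked head at ({0:F2}, {1:F2}, {2:F2})", head.X, head.Y, head.Z);
                }
            }
        }
    }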

    Adam admits that the face swap demo was done mostly for fun. That said, he also notes that “the ability to identify and capture a person’s face in real time could be very useful for entertainment-based experiences—for instance, putting your face onto a 3D character that can be driven by your own movements.”

    Adam also stressed the value of the higher definition color feed in the v2 sensor, noting that Apache’s developers directly manipulated this feed in the face swap demo in order to achieve the desired effect. He finds the new color feed provides the definition necessary for full-screen augmented-reality experiences, something that wasn’t possible with the original Kinect for Windows sensor.

    Above all, Adam encourages other developers to dive in with the Kinect for Windows v2 sensor and SDK—to load the samples and play around with the capabilities. He adds that the forums are a great source of inspiration as well as information, and he advises developers “to take a look at what other people are doing and see if you can do something different or better—or both!”

    The Kinect for Windows Team


    Using Kinect Webserver to Expose Speech Events to Web Clients


    In our 1.8 release, we made it easy to create Kinect-enabled HTML5 web applications. This is possible because we added an extensible webserver for Kinect data along with a JavaScript API, which gives developers some great functionality right out of the box:

    • Interactions: hand pointer movements, plus press and grip events useful for controlling a cursor, buttons, and other UI
    • User Viewer: visual representation of the users currently visible to the Kinect sensor; uses different colors to indicate different user states
    • Background Removal: “Green screen” image stream for a single person at a time
    • Skeleton: standard skeleton data such as tracking state, joint positions, joint orientations, etc.
    • Sensor Status: Events corresponding to sensor connection/disconnection


    This is enough functionality to write a compelling application, but it doesn't represent the whole range of Kinect sensor capabilities. In this article I will show you, step by step, how to extend the WebserverBasics-WPF sample (see the C# code on CodePlex or the documentation on MSDN), available from the Kinect Toolkit Browser, to enable web applications to respond to speech commands, where the active speech grammar is configurable by the web client.

    A solution containing the full, final sample code is available on CodePlex. To compile this sample you will also need Microsoft.Samples.Kinect.Webserver (available via CodePlex and Toolkit Browser) and Microsoft.Kinect.Toolkit components (available via Toolkit Browser).

    Getting Started

    To follow along step-by-step:

    1. If you haven’t done so already, install the Kinect for Windows v1.8 SDK and Toolkit
    2. Launch the Kinect Toolkit Browser
    3. Install WebserverBasics-WPF sample in a local directory
    4. Open the WebserverBasics-WPF.sln solution in Visual Studio
    5. Go to line 136 in MainWindow.xaml.cs file


    You should see the following TODO comment which describes exactly how we’re going to expose speech recognition functionality:

    //// TODO: Optionally add factories here for custom handlers:
    ////       this.webserver.SensorStreamHandlerFactories.Add(new MyCustomSensorStreamHandlerFactory());
    //// Your custom factory would implement ISensorStreamHandlerFactory, in which the
    //// CreateHandler method would return a class derived from SensorStreamHandlerBase
    //// which overrides one or more of its virtual methods.

    We will replace this comment with the functionality described below.

    So, What Functionality Are We Implementing?


    More specifically, on the server side we will:

    1. Create a speech recognition engine
    2. Bind the engine to a Kinect sensor’s audio stream whenever sensor gets connected/disconnected
    3. Allow a web client to specify the speech grammar to be recognized
    4. Forward speech recognition events generated by engine to web client
    5. Register a factory for the speech stream handler with the Kinect webserver


    This will be accomplished by creating a class called SpeechStreamHandler, derived from Microsoft.Samples.Kinect.Webserver.Sensor.SensorStreamHandlerBase. SensorStreamHandlerBase is an implementation of ISensorStreamHandler that frees us from writing boilerplate code. ISensorStreamHandler is an abstraction that gets notified whenever a Kinect sensor gets connected or disconnected, when color, depth, and skeleton frames become available, and when web clients request to view or update configuration values. In response, our speech stream handler will send event messages to web clients.
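    The full class is in the downloadable sample; the trimmed, partly reconstructed skeleton below shows how the pieces described in the rest of this article hang together (the field names and the "speech" category constant are inferred from the snippets that follow, so treat it as a sketch rather than the exact source):

    // Trimmed skeleton of SpeechStreamHandler; see the CodePlex download for the complete source.
    public class SpeechStreamHandler : SensorStreamHandlerBase, IDisposable
    {
        private const string SpeechEventCategory = "speech";

        private readonly SensorStreamHandlerContext ownerContext;
        private SpeechRecognitionEngine speechEngine;
        private KinectSensor sensor;
        private Grammar grammar;

        private SpeechStreamHandler(SensorStreamHandlerContext context)
        {
            this.ownerContext = context;

            // Expose get/set of configuration properties (e.g., "grammarXml") for the speech stream.
            this.AddStreamConfiguration(SpeechEventCategory, new StreamConfiguration(this.GetProperties, this.SetProperty));

            // Speech engine creation is shown in the next section.
        }

        public void Dispose()
        {
            if (this.speechEngine != null)
            {
                this.speechEngine.Dispose();
                this.speechEngine = null;
            }
        }

        // OnSensorChanged, GetProperties, SetProperty, LoadGrammarXml, StartRecognition,
        // StopRecognition, and the speech event handlers are covered in the sections below.
    }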

    On the web client side we will:

    1. Configure speech recognition stream (enable and specify the speech grammar to be recognized)
    2. Modify the web UI in response to recognized speech events


    All new client-side code is in SamplePage.html.

    Creating a Speech Recognition Engine

    In the constructor for SpeechStreamHandler you’ll see the following code:

    RecognizerInfo ri = GetKinectRecognizer();
    if (ri != null)
    {
        this.speechEngine = new SpeechRecognitionEngine(ri.Id);

        if (this.speechEngine != null)
        {
            // disable speech engine adaptation feature
            this.speechEngine.UpdateRecognizerSetting("AdaptationOn", 0);
            this.speechEngine.UpdateRecognizerSetting("PersistedBackgroundAdaptation", 0);
            this.speechEngine.AudioStateChanged += this.AudioStateChanged;
            this.speechEngine.SpeechRecognitionRejected += this.SpeechRecognitionRejected;
            this.speechEngine.SpeechRecognized += this.SpeechRecognized;
        }
    }

    This code snippet will be familiar if you've looked at some of our other speech samples, such as SpeechBasics-WPF. Basically, we're getting the metadata corresponding to the Kinect acoustic model (GetKinectRecognizer is hardcoded to use the English-language acoustic model in this sample, but this can be changed by installing additional language packs and modifying GetKinectRecognizer to look for the desired culture name), using it to create a speech engine, turning off some settings related to the audio adaptation feature (which makes the speech engine better suited for long-running scenarios), and registering to receive events when speech is recognized or rejected, or when the audio state changes (e.g., silence versus someone speaking).
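    GetKinectRecognizer itself isn't shown above; it follows the same pattern as the helper in SpeechBasics-WPF. A reconstruction of that standard pattern looks roughly like this (check the sample source for the exact version):

    // Returns metadata for the Kinect acoustic model, hardcoded here to the en-US culture.
    // Installing additional language packs and changing the culture name switches languages.
    private static RecognizerInfo GetKinectRecognizer()
    {
        foreach (RecognizerInfo recognizer in SpeechRecognitionEngine.InstalledRecognizers())
        {
            string value;
            recognizer.AdditionalInfo.TryGetValue("Kinect", out value);
            if ("True".Equals(value, StringComparison.OrdinalIgnoreCase) &&
                "en-US".Equals(recognizer.Culture.Name, StringComparison.OrdinalIgnoreCase))
            {
                return recognizer;
            }
        }

        return null;
    }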

    Binding the Speech Recognition Engine to a Kinect Sensor’s Audio Stream

    In order to do this, we override SensorStreamHandlerBase’s implementation of OnSensorChanged, so we can find out about sensors connecting and disconnecting.

    public override void OnSensorChanged(KinectSensor newSensor)
    {
        base.OnSensorChanged(newSensor);
        if (this.sensor != null)
        {
            if (this.speechEngine != null)
            {
                this.StopRecognition();
                this.speechEngine.SetInputToNull();
                this.sensor.AudioSource.Stop();
            }
        }

        this.sensor = newSensor;
        if (newSensor != null)
        {
            if (this.speechEngine != null)
            {
                this.speechEngine.SetInputToAudioStream(
                    newSensor.AudioSource.Start(), new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
                this.StartRecognition(this.grammar);
            }
        }
    }

    The main thing we need to do here is Start the AudioSource of the newly connected Kinect sensor in order to get an audio stream that we can hook up as the input to the speech engine. We also need to specify the format of the audio stream, which is a single-channel, 16-bits per sample, Pulse Code Modulation (PCM) stream, sampled at 16kHz.

    Allow Web Clients to Specify Speech Grammar

    We will let clients send us the whole speech grammar that they want recognized, as XML that conforms to the W3C Speech Recognition Grammar Specification format version 1.0. To do this, we will expose a configuration property called “grammarXml”.

    Let’s backtrack a little bit because earlier we glossed over the bit of code in the SpeechStreamHandler constructor where we register the handlers for getting and setting stream configuration properties:

    this.AddStreamConfiguration(SpeechEventCategory, new StreamConfiguration(this.GetProperties, this.SetProperty));

    Now, in the SetProperty method we call LoadGrammarXml method whenever a client sets the “grammarXml” property:

    case GrammarXmlPropertyName:
        this.LoadGrammarXml((string)propertyValue);
        break;

    And in the LoadGrammarXml method we do the real work of updating the speech grammar:

    private void LoadGrammarXml(string grammarXml)
    {
        this.StopRecognition();

        if (!string.IsNullOrEmpty(grammarXml))
        {
            using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(grammarXml)))
            {
                Grammar newGrammar;
                try
                {
                    newGrammar = new Grammar(memoryStream);
                }
                catch (ArgumentException e)
                {
                    throw new InvalidOperationException("Requested grammar might not contain a root rule", e);
                }
                catch (FormatException e)
                {
                    throw new InvalidOperationException("Requested grammar was specified with an invalid format", e);
                }

                this.StartRecognition(newGrammar);
            }
        }
    }

    We first stop speech recognition because we don't yet know whether the specified grammar is valid. We then try to create a new Microsoft.Speech.Recognition.Grammar object from the specified property value; if the value does not represent a valid grammar, the Grammar constructor throws and we surface an InvalidOperationException to the caller. Finally, we call the StartRecognition method, which loads the new grammar into the speech engine (if the grammar is valid) and tells the engine to start recognizing, and keep recognizing, speech phrases until we explicitly tell it to stop.

    private void StartRecognition(Grammar g)
    {
        if ((this.sensor != null) && (g != null))
        {
            this.speechEngine.LoadGrammar(g);
            this.speechEngine.RecognizeAsync(RecognizeMode.Multiple);
        }

        this.grammar = g;
    }
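    StopRecognition isn't listed in this post; a minimal sketch that is consistent with the code above might look like the following (hedged; see the CodePlex source for the exact implementation):

    // Hedged sketch of the counterpart to StartRecognition: cancel any in-flight recognition
    // and unload the current grammar so that a new one can be loaded cleanly.
    private void StopRecognition()
    {
        if (this.speechEngine != null)
        {
            this.speechEngine.RecognizeAsyncCancel();
            this.speechEngine.UnloadAllGrammars();
        }
    }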

    Send Speech Recognition Events to Web Client

    When we created the speech recognition engine, we registered for 3 events: AudioStateChanged, SpeechRecognized and SpeechRecognitionRejected. Whenever any of these events happen we just want to forward the event to the web client. Since the code ends up being very similar, we will focus on the SpeechRecognized event handler:

    private async void SpeechRecognized(object sender, SpeechRecognizedEventArgs args)
    {
        var message = new RecognizedSpeechMessage(args);
        await this.ownerContext.SendEventMessageAsync(message);
    }

    To send messages to web clients we use functionality exposed by ownerContext, which is an instance of the SensorStreamHandlerContext class passed to us in the constructor. Messages are sent to clients over a web socket channel and come in two kinds:

    • Stream messages: messages that are generated continuously, at a predictable rate (e.g., 30 skeleton stream frames per second), where the data in each message replaces the data from the previous message. Dropping one of these messages occasionally has no major consequence, because another will arrive shortly thereafter with more up-to-date data, so the framework might decide to drop one if it detects a bottleneck in the web socket channel.
    • Event messages: messages that are generated sporadically, at an unpredictable rate, where each event represents an isolated incident. As such, it is not desirable to drop any of these messages.


    Given the nature of speech recognition, we chose to communicate with clients using event messages. Specifically, we created the RecognizedSpeechMessage class, a subclass of EventMessage that serves as a representation of SpeechRecognizedEventArgs, can be easily serialized as JSON, and follows JavaScript naming conventions.

    You might have noticed the usage of the “async” and “await” keywords in this snippet. They are described in much more detail in MSDN but, in summary, they enable an asynchronous programming model so that long-running operations don’t block thread execution while not necessarily using more than one thread. The Kinect webserver uses a single thread to schedule tasks so the consequence for you is that ISensorStreamHandler implementations don’t need to be thread-safe, but should be aware of potential re-entrancy due to asynchronous behavior.

    Registering a Speech Stream Handler Factory with the Kinect Webserver

    The Kinect webserver can be started, stopped, and restarted, and each time it is started it creates ISensorStreamHandler instances in a thread dedicated to Kinect data handling, which is the only thread that ever calls these objects. To facilitate this behavior, the server doesn't allow direct registration of ISensorStreamHandler instances; instead, it expects ISensorStreamHandlerFactory instances to be registered in the KinectWebserver.SensorStreamHandlerFactories property.

    For the purposes of this sample, we declared a private factory class that is exposed as a static singleton instance directly from the SpeechStreamHandler class:

    public class SpeechStreamHandler : SensorStreamHandlerBase, IDisposable
    {
        ...

        static SpeechStreamHandler()
        {
            Factory = new SpeechStreamHandlerFactory();
        }
        ...

        public static ISensorStreamHandlerFactory Factory { get; private set; }
        ...

        private class SpeechStreamHandlerFactory : ISensorStreamHandlerFactory
        {
            public ISensorStreamHandler CreateHandler(SensorStreamHandlerContext context)
            {
                return new SpeechStreamHandler(context);
            }
        }
    }

    Finally, back in line 136 of MainWindow.xaml.cs, we replace the TODO comment mentioned above with

    // Add speech stream handler to the list of available handlers, so web client
    // can configure speech grammar and receive speech events
    this.webserver.SensorStreamHandlerFactories.Add(SpeechStreamHandler.Factory);

    Configure Speech Recognition Stream in Web Client

    The sample web client distributed with WebserverBasics-WPF already configures a couple of other streams in the updateUserState function in SamplePage.html, so we will add the following code to that function:

    var speechGrammar = '\
    <grammar version="1.0" xml:lang="en-US" tag-format="semantics/1.0-literals" root="DefaultRule" xmlns="http://www.w3.org/2001/06/grammar">\
        <rule id="DefaultRule" scope="public">\
            <one-of>\
                <item>\
                    <tag>SHOW</tag>\
                    <one-of><item>Show Panel</item><item>Show</item></one-of>\
                </item>\
                <item>\
                    <tag>HIDE</tag>\
                    <one-of><item>Hide Panel</item><item>Hide</item></one-of>\
                </item>\
            </one-of>\
        </rule>\
    </grammar>';

    immediateConfig["speech"] = { "enabled": true, "grammarXml": speechGrammar };

    This code enables the speech stream and specifies a grammar that

    • triggers a recognition event with “SHOW” as semantic value whenever a user utters the phrases “Show” or “Show Panel”
    • triggers a recognition event with “HIDE” as semantic value whenever a user utters the phrases “Hide” or “Hide Panel”

    Modify the Web UI in Response to Recognized Speech Events

    The sample web client already registers an event handler function, so we just need to update it to respond to speech events in addition to user state events:

    function onSpeechRecognized(recognizedArgs) {
        if (recognizedArgs.confidence > 0.7) {
            switch (recognizedArgs.semantics.value) {
                case "HIDE":
                    setChoosePanelVisibility(false);
                    break;
                case "SHOW":
                    setChoosePanelVisibility(true);
                    break;
            }
        }
    }
    ...

    sensor.addEventHandler(function (event) {
        switch (event.category) {
            ...            
            case "speech":
                switch (event.eventType) {
                    case "recognized":
                        onSpeechRecognized(event.recognized);
                        break;
                }
                break;
        }
    });

    Party Time!

    At this point you can rebuild the updated solution and run it to see the server UI. From this UI you can click on the link that reads “Open sample page in default browser” and play with the sample UI. It will look the same as before the code changes, but will respond to the speech phrases “Show”, “Show Panel”, “Hide” and “Hide Panel”. Now try changing the grammar to include more phrases and update the UI in different ways in response to speech events.

    Happy coding!


    Holiday shoppers got the Midas touch


    Ever wonder what you'd look like drenched in gold? December shoppers in Manhattan were captivated by just such images when they paused before an innovative window display for the new men's cologne Gold Jay Z at Macy's flagship store in Herald Square. This engaging display, the creation of advertising agency kbs+ and interactive design firm Future Colossal, employed Kinect for Windows to capture images of window shoppers and flow liquid gold over their silhouettes.

    Window shoppers found it hard to resist creating a gold-clad avatar.

    The experience began when the Kinect for Windows sensor detected that a passer-by had engaged with the display, which showed liquid gold rippling and flowing across a high-resolution screen. The sensor then captured a 3D image of the shopper, which artfully emerged from the pool of flowing gold to appear as a silhouette draped in the precious metal. This golden avatar interactively followed the window shopper's movements, creating a beautiful, sinuous tableau that pulled the passer-by into an immersive experience with the fragrance brand. The Kinect for Windows solution also provided the shopper with a photo of his or her golden doppelganger and a hashtag for sharing it via social media.
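    The agencies haven't published their code, but conceptually the Kinect for Windows SDK's per-pixel player segmentation yields exactly this kind of silhouette. The sketch below (illustrative only; the names are ours) builds a mask of the pixels that belong to a tracked person, which an effect such as the flowing gold could then be rendered into. Note that skeleton tracking must be enabled for the player index to be populated.

    // Rough illustration only: the v1 SDK tags each depth pixel with a player index,
    // which yields a silhouette mask for any tracked person in front of the sensor.
    using Microsoft.Kinect;

    static class SilhouetteMaskExample
    {
        public static bool[] BuildMask(DepthImageFrame frame)
        {
            var depthPixels = new DepthImagePixel[frame.PixelDataLength];
            frame.CopyDepthImagePixelDataTo(depthPixels);

            var mask = new bool[depthPixels.Length];
            for (int i = 0; i < depthPixels.Length; i++)
            {
                // PlayerIndex is non-zero for pixels that belong to a tracked person.
                mask[i] = depthPixels[i].PlayerIndex > 0;
            }

            return mask;
        }
    }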

    Kinect for Windows Team


    Using WebGL to Render Kinect Webserver Image Data


    Our 1.8 release includes a sample called WebserverBasics-WPF that shows how HTML5 web applications can leverage our webserver component and JavaScript API to get data from Kinect sensors. Among other things, the client-side code demonstrates how to bind Kinect image streams (specifically, the user viewer and background removal streams) to HTML canvas elements in order to render the sequence of images as they arrive so that users see a video feed of Kinect data.

    In WebserverBasics-WPF, the images are processed by first sending the image data arriving from the server to a web worker thread, which copies the data pixel by pixel into a canvas image data structure that can then be rendered via the canvas "2d" context. This solution was adequate for our needs, processing more than the minimum required 30 frames per second of Kinect image data, but when we attempted to display data from more than one image stream simultaneously (e.g., the user viewer plus the background removal stream), we would start dropping image frames and get a kind of stutter in the video feed. Even when displaying only one image stream at a time, there was a noticeable load added to the computer's CPU, which reduced the amount of multitasking the system could perform.

    Now that the latest version of every major browser supports the WebGL API, we can get even better performance without requiring a dedicated background worker thread (which can occupy a full CPU core). While you should definitely test on your own hardware, using WebGL gave me over a 3x improvement in image frame rate, and I don't even have a high-end GPU!

    Also, once we are using WebGL it is very easy to apply additional image processing or perform other kinds of tasks without adding latency or burdening the CPU. For example, we can use convolution kernels to perform edge detection, turning an original color image into an edge-detected version of the same image.

    So let’s send some Kinect data over to WebGL!

    Getting Started

    1. Make sure you have the Kinect for Windows v1.8 SDK and Toolkit installed
    2. Make sure you have a WebGL-compatible web browser installed
    3. Get the WebserverWebGL sample code, project and solution from CodePlex. To compile this sample you will also need Microsoft.Samples.Kinect.Webserver (also available via CodePlex and Toolkit Browser) and Microsoft.Kinect.Toolkit components (available via Toolkit Browser).


    Note: The entirety of this post focuses on JavaScript code that runs in the browser client. Also, this post is meant to provide a quick overview of how to use WebGL functionality to render Kinect data; for a more comprehensive tutorial on WebGL itself, see one of the many WebGL tutorials available on the web.

    Encapsulating WebGL Functionality

    WebGL requires a non-trivial amount of setup code so, to avoid cluttering the main sample code in SamplePage.html, we defined a KinectWebGLHelper object constructor in KinectWebGLHelper.js file. This object exposes 3 functions:

    • bindStreamToCanvas(DOMString streamName, HTMLCanvasElement canvas)
      Binds the specified canvas element with the specified image stream.
      This function mirrors KinectUIAdapter.bindStreamToCanvas function, but uses canvas “webgl” context rather than the “2d” context.
    • unbindStreamFromCanvas(DOMString streamName)
      Unbinds the specified image stream from previously bound canvas element, if any.
      This function mirrors the KinectUIAdapter.unbindStreamFromCanvas function.
    • getMetadata(DOMString streamName)
      Allows clients to access the “webgl” context managed by KinectWebGLHelper object.


    The code modifications (relative to code in WebserverBasics-WPF) necessary to get code in SamplePage.html to start using this helper object are fairly minimal. We replaced

    uiAdapter.bindStreamToCanvas(
        Kinect.USERVIEWER_STREAM_NAME,
        userViewerCanvasElement);
    uiAdapter.bindStreamToCanvas(
        Kinect.BACKGROUNDREMOVAL_STREAM_NAME,
        backgroundRemovalCanvasElement);

    with

    glHelper = new KinectWebGLHelper(sensor); 
    glHelper.bindStreamToCanvas(
        Kinect.USERVIEWER_STREAM_NAME,
        userViewerCanvasElement);
    glHelper.bindStreamToCanvas(
        Kinect.BACKGROUNDREMOVAL_STREAM_NAME,
        backgroundRemovalCanvasElement);

    Additionally, we used the glHelper instance for general housekeeping such as clearing the canvas state whenever it’s supposed to become invisible.

    The KinectWebGLHelper further encapsulates the logic to actually set up and manipulate the WebGL context within an “ImageMetadata” object constructor.

    Setting up the WebGL context

    The first step is to get the webgl context and set up the clear color. For historical reasons, some browsers still use “experimental-webgl” rather than “webgl” as the WebGL context name:

    var contextAttributes = { premultipliedAlpha: true }; 
    var glContext = imageCanvas.getContext('webgl', contextAttributes) ||
        imageCanvas.getContext('experimental-webgl', contextAttributes);
    glContext.clearColor(0.0, 0.0, 0.0, 0.0); // Set clear color to black, fully transparent

    Defining a Vertex Shader

    Next we define a geometry to render, plus a corresponding vertex shader. When rendering a 3D scene to a 2D screen, the vertex shader would typically transform a 3D world-space coordinate into a 2D screen-space coordinate. Since we're rendering 2D image data coming from a Kinect sensor onto a 2D screen, however, we just need to define a rectangle using 2D coordinates and map the Kinect image onto this rectangle as a texture, so the vertex shader ends up being pretty simple:

    // vertices representing entire viewport as two triangles which make up the whole 
    // rectangle, in post-projection/clipspace coordinates
    var VIEWPORT_VERTICES = new Float32Array([
        -1.0, -1.0,
        1.0, -1.0,
        -1.0, 1.0,
        -1.0, 1.0,
        1.0, -1.0,
        1.0, 1.0]);
    var NUM_VIEWPORT_VERTICES = VIEWPORT_VERTICES.length / 2;
    // Texture coordinates corresponding to each viewport vertex
    var VERTEX_TEXTURE_COORDS = new Float32Array([
        0.0, 1.0,
        1.0, 1.0,
        0.0, 0.0,
        0.0, 0.0,
        1.0, 1.0,
        1.0, 0.0]);

    var vertexShader = createShaderFromSource(
        glContext.VERTEX_SHADER,
    "\
    attribute vec2 aPosition;\
    attribute vec2 aTextureCoord;\
    \
    varying highp vec2 vTextureCoord;\
    \
    void main() {\
         gl_Position = vec4(aPosition, 0, 1);\
         vTextureCoord = aTextureCoord;\
    }");

    We specify the shader program as a literal string that gets compiled by the WebGL context. Note that you could instead choose to get the shader code from the server as a separate resource from a designated URI.
    We also need to let the shader know how to find its input data:

    var positionAttribute = glContext.getAttribLocation(
        program,
        "aPosition");
    glContext.enableVertexAttribArray(positionAttribute);

    var textureCoordAttribute = glContext.getAttribLocation(
        program,
        "aTextureCoord");
    glContext.enableVertexAttribArray(textureCoordAttribute);

    // Create a buffer used to represent whole set of viewport vertices
    var vertexBuffer = glContext.createBuffer();
    glContext.bindBuffer(
        glContext.ARRAY_BUFFER,
        vertexBuffer);
    glContext.bufferData(
        glContext.ARRAY_BUFFER,
       VIEWPORT_VERTICES,
        glContext.STATIC_DRAW);
    glContext.vertexAttribPointer(
        positionAttribute,
        2,
        glContext.FLOAT,
        false,
        0,
        0);

    // Create a buffer used to represent whole set of vertex texture coordinates
    var textureCoordinateBuffer = glContext.createBuffer();
    glContext.bindBuffer(
        glContext.ARRAY_BUFFER,
        textureCoordinateBuffer);
    glContext.bufferData(
        glContext.ARRAY_BUFFER,
        VERTEX_TEXTURE_COORDS,
        glContext.STATIC_DRAW);
    glContext.vertexAttribPointer(textureCoordAttribute,
        2,
        glContext.FLOAT,
        false,
        0,
        0);

    // Create a texture to contain images from Kinect server
    // Note: TEXTURE_MIN_FILTER, TEXTURE_WRAP_S and TEXTURE_WRAP_T parameters need to be set
    // so we can handle textures whose width and height are not a power of 2.
    var texture = glContext.createTexture();
    glContext.bindTexture(
        glContext.TEXTURE_2D,
        texture);
    glContext.texParameteri(
        glContext.TEXTURE_2D,
        glContext.TEXTURE_MAG_FILTER,
        glContext.LINEAR);
    glContext.texParameteri(
        glContext.TEXTURE_2D,
        glContext.TEXTURE_MIN_FILTER,
        glContext.LINEAR);
    glContext.texParameteri(
        glContext.TEXTURE_2D,
        glContext.TEXTURE_WRAP_S,
        glContext.CLAMP_TO_EDGE);
    glContext.texParameteri(
        glContext.TEXTURE_2D,
        glContext.TEXTURE_WRAP_T,
        glContext.CLAMP_TO_EDGE);
    glContext.bindTexture(
        glContext.TEXTURE_2D,
        null);

    Defining a Fragment Shader

    Fragment Shaders (also known as Pixel Shaders) are used to compute the appropriate color for each geometry fragment. This is where we’ll sample color values from our texture and also apply the chosen convolution kernel to process the image.

    // Convolution kernel weights (blurring effect by default) 
    var CONVOLUTION_KERNEL_WEIGHTS = new Float32Array([
        1, 1, 1,
        1, 1, 1,
        1, 1, 1]);
    var TOTAL_WEIGHT = 0;
    for (var i = 0; i < CONVOLUTION_KERNEL_WEIGHTS.length; ++i) {
        TOTAL_WEIGHT += CONVOLUTION_KERNEL_WEIGHTS[i];
    }

    var fragmentShader = createShaderFromSource(
        glContext.FRAGMENT_SHADER,
        "\
        precision mediump float;\
        \
        varying highp vec2 vTextureCoord;\
        \
        uniform sampler2D uSampler;\
        uniform float uWeights[9];\
        uniform float uTotalWeight;\
        \
        /* Each sampled texture coordinate is 2 pixels apart rather than 1,\
        to make filter effects more noticeable. */ \
        const float xInc = 2.0/640.0;\
        const float yInc = 2.0/480.0;\
        const int numElements = 9;\
        const int numCols = 3;\
        \
        void main() {\
            vec4 centerColor = texture2D(uSampler, vTextureCoord);\
            vec4 totalColor = vec4(0,0,0,0);\
            \
            for (int i = 0; i < numElements; i++) {\
                int iRow = i / numCols;\
                int iCol = i - (numCols * iRow);\
                float xOff = float(iCol - 1) * xInc;\
                float yOff = float(iRow - 1) * yInc;\
                vec4 colorComponent = texture2D(\
                    uSampler,\
                    vec2(vTextureCoord.x+xOff, vTextureCoord.y+yOff));\
                totalColor += (uWeights[i] * colorComponent);\
            }\
            \
            float effectiveWeight = uTotalWeight;\
            if (uTotalWeight <= 0.0) {\
                effectiveWeight = 1.0;\
            }\
            /* Premultiply colors with alpha component for center pixel. */\
            gl_FragColor = vec4(\
                totalColor.rgb * centerColor.a / effectiveWeight,\
                centerColor.a);\
    }");

    Again, we specify the shader program as a literal string that gets compiled by the WebGL context and, again, we need to let the shader know how to find its input data:

    // Associate the uniform texture sampler with TEXTURE0 slot 
    var textureSamplerUniform = glContext.getUniformLocation(
        program,
        "uSampler");
    glContext.uniform1i(textureSamplerUniform, 0);

    // Since we're only using one single texture, we just make TEXTURE0 the active one
    // at all times
    glContext.activeTexture(glContext.TEXTURE0);

    Drawing Kinect Image Data to Canvas

    After getting the WebGL context ready to receive data from Kinect, we still need to let WebGL know whenever we have a new image to be rendered. So, every time that the KinectWebGLHelper object receives a valid image frame from the KinectSensor object, it calls the ImageMetadata.processImageData function, which looks like this:

    this.processImageData = function(imageBuffer, width, height) { 
        if ((width != metadata.width) || (height != metadata.height)) {
            // Whenever the image width or height changes, update tracked metadata and canvas
            // viewport dimensions.
            this.width = width;
            this.height = height;
            this.canvas.width = width;
            this.canvas.height = height;
            glContext.viewport(0, 0, width, height);
        }

        glContext.bindTexture(
            glContext.TEXTURE_2D,
            texture);
        glContext.texImage2D(
            glContext.TEXTURE_2D,
            0,
            glContext.RGBA,
            width,
            height,
            0,
            glContext.RGBA,
            glContext.UNSIGNED_BYTE,
            new Uint8Array(imageBuffer));

        glContext.drawArrays(
            glContext.TRIANGLES,
            0,
            NUM_VIEWPORT_VERTICES);
        glContext.bindTexture(
            glContext.TEXTURE_2D,
            null);
    };

    Customizing the Processing Effect Applied to Kinect Image

    You might have noticed while reading this post that the default value for CONVOLUTION_KERNEL_WEIGHTS provided by this WebGL sample maps to the following 3x3 convolution kernel:

    1 1 1
    1 1 1
    1 1 1

    and this corresponds to a blurring effect. The following table shows additional examples of effects that can be achieved using 3x3 convolution kernels:

        

    Effect            Kernel weights

    Original          0  0  0
                      0  1  0
                      0  0  0

    Blurring          1  1  1
                      1  1  1
                      1  1  1

    Sharpening         0 -1  0
                      -1  5 -1
                       0 -1  0

    Edge detection    -1  0  1
                      -2  0  2
                      -1  0  1

    It is very easy to experiment with these and other effects by changing the CONVOLUTION_KERNEL_WEIGHTS coefficients and reloading the application in the browser. Other kernel sizes can also be supported by changing the fragment shader and its associated setup code.

    Summary

    The new WebserverWebGL sample is very similar in user experience to WebserverBasics-WPF, but the fact that it uses the WebGL API to leverage the power of your GPU means that your web applications can perform powerful kinds of Kinect data processing without burdening the CPU or adding latency to your user experience. We didn't add WebGL functionality previously because it was only recently that WebGL became supported in all major browsers. If you're not sure if your clients will have a WebGL-compatible browser but still want to guarantee they can display image-stream data, you should implement a hybrid approach that uses "webgl" canvas context when available and falls back to using "2d" context otherwise.

    Happy coding!

    Additional Resources

     

    Eddy Escardo-Raffo
    Senior Software Development Engineer
    Kinect for Windows

  • Kinect for Windows Product Blog

    Inside the Kinect for Windows SDK Update with Peter Zatloukal and Bob Heddle

    • 1 Comments

    Now that the updated Kinect for Windows SDK is available for download, Engineering Manager Peter Zatloukal and Group Program Manager Bob Heddle sat down to discuss what this significant update means to developers.

    Bob Heddle demonstrates the new infrared functionality in the Kinect for Windows SDK.

    Why should developers care about this update to the Kinect for Windows Software Development Kit (SDK)?

    Bob: Because they can do more stuff and then deploy that stuff on multiple operating systems!

    Peter: In general, developers will like the Kinect for Windows SDK because it gives them what I believe is the best tool out there for building applications with gesture and voice.

    In the SDK update, you can do more things than you could before, there’s more documentation, plus there’s a specific sample called Basic Interactions that’s a follow-on to our Human Interface Guidelines (HIG). Human Interface Guidelines are a big investment of ours, and will continue to be. First we gave businesses and developers the HIG in May, and now we have this first sample, demonstrating an implementation of the HIG. With it, the Physical Interaction Zone (PhIZ) is exposed. The PhIZ is a component that maps a motion range to the screen size, allowing users to comfortably control the cursor on the screen.

    This sample is a bit hidden in the toolkit browser, but everyone should check it out. It embodies best practices that we described in the HIG and can be re-purposed by developers easily and quickly.

    Bob: First we had the HIG, now we have this first sample. And it’s only going to get better. There will be more to come in the future.

    Why upgrade?

    Bob: There’s no downside to upgrading, so everyone should do it today! There are no breaking changes; it’s fully compatible with previous releases of the SDK, it gives you broader operating system support, there are a lot of new features, and it supports distribution in more countries with localized setup and license agreements. And, of course, China is now part of the equation.

    Peter: There are four basic reasons to use the Kinect for Windows SDK and to upgrade to the most recent version:

    • More sensor data are exposed in this release.
    • It’s easier to use than ever (more samples, more documentation).
    • There’s more operating system and tool support (including Windows 8, virtual machine support, Microsoft Visual Studio 2012, and Microsoft .NET Framework 4.5).
    • It supports distribution in more geographical locations. 

    What are your top three favorite features in the latest release of the SDK and why?

    Peter: If I must limit myself to three, then I’d say the HIG sample (Basic Interactions) is probably my favorite new thing. Secondly, there’s so much more documentation for developers. And last but not least…infrared! I’ve been dying for infrared since the beginning. What do you expect? I’m a developer. Now I can see in the dark!

    Bob: My three would be extended-range depth data, color camera settings, and Windows 8 support. Why wouldn’t you want to have the ability to develop for Windows 8? And by giving access to the depth data, we’re giving developers the ability to see beyond 4 meters. Sure, the data out at that range isn’t always pretty, but we’ve taken the guardrails off—we’re letting you go off-roading. Go for it!

    New extended-range depth data now provides details beyond 4 meters. These images show the difference between depth data gathered from previous SDKs (left) versus the updated SDK (right).

    Peter: Oh yeah, and regarding camera settings, in case it isn’t obvious: this is for those people who want to tune their apps specifically to known environments.

    What's it like working together?

    Peter: Bob is one of the most technically capable program managers (PMs) I have had the privilege of working with.

    Bob: We have worked together for so long—over a decade and in three different companies—so there is a natural trust in each other and our abilities. When you are lucky to have that, you don’t have to spend energy and time figuring out how to work together. Instead, you can focus on getting things done. This leaves us more time to really think about the customer rather than the division of labor.

    Peter: My team is organized by area of technical affinity. I have developers focused on:

    • SDK runtime
    • Computer vision/machine learning
    • Drivers and low-level subsystems
    • Audio
    • Samples and tools

    Bob: We have a unique approach to the way we organize our teams: I take a very scenario-driven approach, while Peter takes a technically focused approach. My team is organized into PMs who look holistically across what end users need, versus what commercial customers need, versus what developers need.

    Peter: We organize this way intentionally and we believe it’s a best practice that allows us to iterate quickly and successfully!

    What was the process you and your teams went through to determine what this SDK release would include, and who is this SDK for?

    Bob: This SDK is for every Kinect for Windows developer and anyone who wants to develop with voice and gesture. Seriously, if you’re already using a previous version, there is really no reason not to upgrade. You might have noticed that we gave developers a first version of the SDK in February, then a significant update in May, and now this release. We have designed Kinect for Windows around rapid updates to the SDK; as we roll out new functionality, we test our backwards compatibility very thoroughly, and we ensure no breaking changes.

    We are wholeheartedly dedicated to Kinect for Windows. And we’re invested in continuing to release updated iterations of the SDK rapidly for our business and developer customers. I hope the community recognizes that we’re making the SDK easier and easier to use over time and are really listening to their feedback.

    Peter Zatloukal, Engineering Manager
    Bob Heddle, Group Program Manager
    Kinect for Windows

    Related Links

  • Kinect for Windows Product Blog

    Kinect for Windows Wins Innovation of the Year

    • 1 Comments

    Last week, winners of GeekWire’s fourth annual Seattle 2.0 Startup Awards were announced. Seattle has a vibrant startup community, and this is a very popular event within that community. I attended the awards ceremony, and it was amazing to see how much energy and excitement was in the room – and the EMP (Experience Music Project), where the event was held, was packed.

    Craig Eisler, General Manager of Kinect for Windows, accepting the GeekWire Seattle 2.0 Startup Award for "Innovation of the Year"

    There were entrepreneurs of all types and levels of experience (I even ran into Ray Ozzie!), venture capitalists, CEOs, and so much IQ and passion in one place it was a rush – the same rush I feel every Monday when I spend time at the Kinect Accelerator in Microsoft’s Westlake office.

    We were thrilled when Kinect for Windows was named “Innovation of the Year” – not because we won (which is great!), but because it was the popular vote of the startup community. This is the same community that is continuing to deliver so many amazingly different ideas and products that have Kinect for Windows at the center. In so many ways, Kinect for Windows is “Innovation of the Year” because of the innovators who are using it: Thank you.

    Learn more about all of the winners of the Seattle 2.0 Startup Awards. And thanks to all of you who voted for Kinect for Windows!

    Craig Eisler
    General Manager, Kinect for Windows

  • Kinect for Windows Product Blog

    Mattel Barbie Dream Closet engages new generation of fans and reinforces 53-year-old brand

    • 1 Comments

    Much like Build-A-Bear Workshop, Mattel has been watching the trends and finding that children are embracing digital media. How can the company keep a toy like the Barbie doll, launched in 1959, relevant in a world where tablet computers and smartphones dominate kids' wish lists?

    Once again, Kinect for Windows has proved a formidable ally in bridging the gap between digital entertainment and traditional toys. In a six-month project for Mattel, Gun Communications and creative applications developer Adapptor built Barbie: the Dream Closet, which lets enthusiasts of all ages across Australia virtually try on a variety of Barbie outfits from different decades by using a Kinect for Windows-enabled "magic mirror." Have you ever wondered what you’d look like in one of Barbie's ball gowns, or as an astronaut, or a race car driver? With the Dream Closet, it's possible. Additionally, you can save and share photos over social media, or even take a photo home.

    To build the application, each outfit was photographed on a Barbie doll, trimmed into its component parts, and then reconstructed dynamically on Barbie fans by the custom Dream Closet application, built in Microsoft XNA. The Kinect for Windows sensor and software development kit (SDK) make it easy to accurately determine the size of the user so the virtual clothes and selection menus can be fitted to match.

    "If we would have had to write code from the ground up [versus using code provided in the SDK], it would have taken much longer, and the end result wouldn’t have been nearly as impactful," said Adapptor Managing Director Mark Loveridge. "The Kinect for Windows SDK doubled our development speed."

    The result of Barbie: the Dream Closet? Increased customer brand loyalty and media coverage yielding more than 25 million impressions, a new case study reports.

    "The impact of Kinect for Windows on the public and the Barbie brand is incredible," notes Mattel Marketing Director Amanda Allegos. "Kinect for Windows has given us a new way to reach existing Barbie fans and attract new ones in a way that's contemporary, interactive, and bridges both the digital and physical worlds."

    Kinect for Windows team

    Key Links
