• Kinect for Windows Product Blog

    Jintronix makes rehabilitation more convenient, fun, and affordable with Kinect for Windows


    A stroke can be a devastating experience, leaving the patient with serious physical impairments and beset by concerns for the future. Today, that future is much brighter, as stroke rehabilitation has made enormous strides. Now, Jintronix offers a significant advance to help stroke patients restore their physical functions: an affordable motion-capture system for physical rehabilitation that uses Microsoft Kinect for Windows.

    The folks at Montreal- and Seattle-based Jintronix are tackling three major issues related to rehabilitation. First, and most importantly, they are working to improve patients’ compliance with their rehabilitation regimen, since up to 65% of patients fail to adhere fully, or at all, to their programs.[1] In addition, they are addressing the lack of accessibility and the high cost associated with rehabilitation. If you have just had a stroke, even getting to the clinic is a challenge, and the cost of hiring a private physical therapist to come to your home is too high for most people.

    Consider Jane, a 57-year-old patient. After experiencing a stroke eight months ago, she now has difficulty moving the entire right side of her body. Like most stroke victims, Jane faces one to three weekly therapy sessions for up to two years. Unable to drive, she depends on her daughter to get her to these sessions; unable to work, she worries about the $100 fee per visit, as she has exhausted her insurance coverage. If that weren’t enough, Jane also must exercise for hours daily just to maintain her mobility. Unfortunately, these exercises are very repetitive, and Jane finds it difficult to motivate herself to do them. 

    Jintronix tackles all of these issues by providing patients with fun, “gamified” exercises that accelerate recovery and increase adherence. In addition, Jintronix gives patients immediate feedback, which ensures that they perform their movements correctly. This is critical when the patient is exercising at home.

    For clinicians and insurers, Jintronix monitors and collects data remotely to measure compliance and provides critical information on how to customize the patient’s regimen. Thus patients can conveniently and consistently get treatment between clinic visits, from the comfort of their own homes, with results transmitted directly to their therapist. This has been shown to be an effective method for delivering care, and for people living in remote areas, this type of tele-rehabilitation has the potential to be a real game changer.[2] Moreover, a growing shortage of trained clinicians—the shortfall in the United States was estimated to be 13,500 in 2013 and is expected to grow to 31,000 by 2016—means that more and more patients will be reliant on home rehab[3].

    Motion capture lies at the heart of Jintronix. The first-generation Kinect for Windows camera can track 20 points on the body with no need for the patient to wear physical sensors, enabling Jintronix to track the patient’s position in three-dimensional space at 30 frames per second. Behind the scenes, Jintronix uses the data captured by the sensor to track such metrics as the speed and fluidity of patients’ movement. It also records patients’ compensation patterns, such as leaning the trunk forward to reach an object instead of extending the arm normally.
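
    As a rough illustration of how metrics like these could be derived from tracked joints, here is a minimal sketch. It is not Jintronix's implementation. It assumes joint positions arrive as (x, y, z) tuples in meters at 30 frames per second, and the function names are hypothetical.

```python
import math

FRAME_RATE_HZ = 30.0  # frame rate cited above for the first-generation sensor


def mean_speed(positions, frame_rate_hz=FRAME_RATE_HZ):
    """Average speed (m/s) of one joint over consecutive (x, y, z) samples."""
    if len(positions) < 2:
        return 0.0
    # Sum the straight-line distance between each pair of consecutive frames.
    path_length = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    duration_s = (len(positions) - 1) / frame_rate_hz
    return path_length / duration_s


def trunk_lean_degrees(hip_center, shoulder_center):
    """Angle between the hip-to-shoulder vector and vertical (the y axis).

    A large angle while reaching suggests the trunk is compensating for the arm.
    """
    dx = shoulder_center[0] - hip_center[0]
    dy = shoulder_center[1] - hip_center[1]
    dz = shoulder_center[2] - hip_center[2]
    horizontal = math.hypot(dx, dz)
    return math.degrees(math.atan2(horizontal, dy))
```

    A hand that moves 1 cm per frame yields a mean speed of 0.3 m/s, and shoulders displaced as far forward as they are above the hips yield a 45-degree lean.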

    Jintronix then uses this data to place patients in an interactive game environment that’s built around rehabilitation exercises. For example, in the game Fish Frenzy, the patient's hand controls the movement of an on-screen fish, moving it to capture food objects that are placed around the screen in a specific therapeutic pattern, like a rectangle or a figure eight.
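
    To make the idea of a therapeutic pattern concrete, the sketch below places targets along a figure eight in screen coordinates. The parametric curve, the function name, and the pixel dimensions are illustrative assumptions, not details of Fish Frenzy itself.

```python
import math


def figure_eight_targets(center, width, height, count):
    """Return `count` (x, y) target positions spaced along a figure eight.

    Uses the curve x = sin(t), y = sin(2t), scaled to the given width and
    height and centered on `center`.
    """
    cx, cy = center
    targets = []
    for i in range(count):
        t = 2.0 * math.pi * i / count
        x = cx + (width / 2.0) * math.sin(t)
        y = cy + (height / 2.0) * math.sin(2.0 * t)
        targets.append((x, y))
    return targets
```

    A therapist-facing tool could vary `width` and `height` to widen or narrow the reach the patient must achieve.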

    There are other rehab systems out there that use motion capture, but they often require sensor gloves or other proprietary hardware that take a lot of training and supervision to use, or they depend on rigging an entire room with expensive cameras or placing lots of sensors on the body. “Thanks to Kinect for Windows, Jintronix doesn’t require any extra hardware, cameras, or body sensors, which keeps the price affordable,” says Shawn Errunza, CEO of Jintronix. “That low price point is extremely important,” notes Errunza, “as we want to see our system in the home of every patient who needs neurological and orthopedic rehab.”

    Jintronix developed the system by working closely with leaders in the field of physical rehabilitation, such as Dr. Mindy Levin, professor of physical and occupational therapy at McGill University. With strong support both on the research and clinical sides, the company designed a system that can serve a variety of patients in addition to post-stroke victims—good news for the nearly 36 million individuals suffering from physical disabilities in the United States[4].

    What’s more, Jintronix is a potential boon to the elderly, as it has been shown that seniors can reduce the risk of injury due to falls by 35% by following specific exercise programs. Unfortunately, most home rehab regimens fail to engage such patients. A recent study of elderly patients found that fewer than 10% reported doing their prescribed physical therapy exercises five days a week (which is considered full adherence), and more than a third reported zero days of compliance.

    Jintronix is currently in closed beta testing in five countries, involving more than 150 patients at 60 clinics and hospitals, including DaVinci Physical Therapy in the Seattle area and the Gingras Lindsay Rehabilitation Hospital in Montreal. According to Errunza, “preliminary results show that the fun factor of our activities has a tangible effect on patients’ motivation to stay engaged in their therapy.”

    Jintronix is working to remove all the major barriers to physical rehabilitation by making a system that is fun, simple to use, and affordable. Jintronix demonstrates the potential of natural user interfaces (NUI) to make technology simpler and more effective—and the ability of Kinect for Windows to help high tech meet essential human needs.

    The Kinect for Windows Team

    Key links


    [1] http://physiotherapy.org.nz/assets/Professional-dev/Journal/2003-July/July03commentary.pdf

    [2] http://www.ncbi.nlm.nih.gov/pubmed/23319181

    [3] http://www.apta.org/PTinMotion/NewsNow/2013/10/21/PTDemand/

    [4] http://ptjournal.apta.org/content/86/3/401.full

  • Kinect for Windows Product Blog

    What's Up with CodePlex?


    Last week we announced the release of the source code of 22 Kinect for Windows sample applications.  The developer response has been terrific and much larger than we expected.

    Some publications claimed we had open sourced “all of the code for Kinect” or the “core code of Kinect”.  Neither of these is true.  We released source code for most of our sample applications.  It’s important to understand that sample code is not the same as core code.  The purpose of the sample applications is to give developers examples of how to use particular APIs and/or to give a good starting point for a new application.  Samples do use the core APIs and Kinect for Windows platform but we have not changed anything about the licensing of those underlying components.

    The samples we’ve released show how to do things like get raw infrared data from the sensor, build an interactive kiosk that changes content when a person is detected, and track a person’s facial movements.  The samples are one of the many areas in which we are investing to make it easy for new and seasoned developers alike to build applications using Kinect for Windows.

    It was not our intention for our announcement to be misinterpreted.  It’s evident in the comments of many posts that readers understood the distinction.  It’s also been great to see the debate & discussions (I’m looking at you, Reddit :-).

    We are following up with some publications to clarify our announcement and to request they update their posts.



    @benlower | kinectninja@microsoft.com | mobile:  +1 (206) 659-NINJA (6465)

    Kinect for Windows:  @KinectWindows

  • Kinect for Windows Product Blog

    Kinect Accelerator Program Seeking Innovators


    In March, ten startups will converge on Seattle to start developing commercial and gaming applications that utilize Kinect's innovative natural user interface (NUI). As part of the Microsoft Kinect Accelerator program, they will have three months and a wealth of resources—including access to Microsoft and industry mentors—to develop, and then present their applications to angel investors, venture capitalists, Microsoft executives, media, and influential industry leaders.

    Since launching in late November, the Kinect Accelerator has received hundreds of applications from over forty countries, proposing transformative, creative innovations for healthcare, fitness, retail, training/simulation, automotive, scientific research, manufacturing, and much more.

    Applications are still being accepted, and the Kinect Accelerator team encourages you to apply. Learn more about the application process.

    The Kinect Accelerator program is powered by TechStars, one of the most respected technology accelerator programs in the world.  Microsoft is working with TechStars to leverage the absolute best startup accelerator methodologies, mentors, and visibility.  If you are considering building a business based on the capabilities of Kinect, this is a great opportunity for you.

    Dave Drach, Managing Director, Microsoft Emerging Business Team, explains that the Kinect Accelerator program is looking for creative startups that have a passion for driving the next generation of computing. “Starting in the spring of 2012, they will have three months to bring their ideas to life. What will emerge will be applications and business scenarios that we’ve not seen before,” comments Drach.

    Read more about the Kinect Accelerator program.

    Kinect for Windows team

  • Kinect for Windows Product Blog

    Microsoft’s Kinect Accelerator Begins Today


    I am pleased to announce that the finalists for our Kinect Accelerator have arrived in ever-sunny Seattle and today are launching into a three-month program to build new products and businesses using Kinect. I can’t wait to see what they come up with – using Kinect, these teams have the ability to reimagine the way products are used, and perhaps even revolutionize entire industries along the way.

    Kinect Accelerator is powered by TechStars, in close collaboration with the Microsoft BizSpark program; my team and I have been working closely with the BizSpark team and others in the Interactive Entertainment Business to help develop and bring this program to life. The response to the Kinect Accelerator has been phenomenal and we expect to see remarkable innovation coming out of the program.

    Craig Eisler and other executives from Microsoft and TechStars met in February to review program applications. We were hoping to receive 100 to 150 applications, with a goal of selecting the best ten. But the worldwide entrepreneurial community completely surprised us by submitting almost five hundred applications with concepts spanning nearly 20 different industries, including healthcare, education, retail, entertainment, and more.

    There were so many clever and innovative ideas and so many great teams that it was super challenging to narrow things down – we spent many, many hours in a rigorous and highly energetic review process. We finally landed on 11 finalists from five countries, chosen based on their experience, qualifications, and the potential benefit that could result from their participation in the Kinect Accelerator. The finalists are:

    • Freak'n Genius – Seattle, WA
    • GestSure Technologies – Toronto, Canada
    • IKKOS – Seattle, WA
    • Kimetric – Buenos Aires, Argentina
    • Jintronix Inc. – Montreal, Canada
    • Manctl – Lyon, France
    • NConnex – Hadley, MA
    • Styku – Los Angeles, CA
    • übi interactive – Munich, Germany
    • VOXON – New York, NY
    • Zebcare – Boston, MA

    The Kinect Accelerator will be held in Microsoft’s state-of-the-art facility in Seattle’s vibrant South Lake Union neighborhood. Each team will be mentored by entrepreneurs and venture capitalists as well as leaders from Kinect for Windows, Xbox, Microsoft Studios, Microsoft Research and other Microsoft organizations. The teams will spend the first several weeks ideating and refining their business concepts with input and advice from their mentors, followed by several weeks of design and development. They will present their results at an event at the end of June.

    We were so amazed by the quality, caliber, and uniqueness of the applications and teams that we decided to reward the top 100 applicants that didn’t make it into the program with a complimentary Kinect for Windows sensor. I believe we are going to see great things from many of the folks that applied to the program and we wish them all the best.

    We will share more information about the Kinect Accelerator teams and their applications on this blog in coming months. And for more information on the Kinect Accelerator program in general, go to KinectAccelerator.com.

    Craig Eisler
    General Manager, Kinect for Windows

  • Kinect for Windows Product Blog

    Kinect for Windows Helps Girls Everywhere Dress Like Barbie


    I grew up in the UK and my female cousins all had Barbie. In fact, Barbies – they had lots of Barbie dolls and tons of accessories that they were obsessed with. I was more of a BMX kind of kid and thought my days of Barbie education were long behind me, but with a young daughter I’m beginning to realize that I have plenty more Barbie ahead of me, littered around the house like landmines. This time around though, I’m genuinely interested thanks to a Kinect-enabled application. The outfits from Barbie The Dream Closet not only scale to fit users, but enable them to turn sideways to see how they look from various angles.

    This week, Barbie lovers in Sydney, Australia, are being given the chance to do more than fantasize about how they’d look in their favorite Barbie outfit. Thanks to Mattel, Gun Communications, Adapptor, and Kinect for Windows, Barbie The Dream Closet is here.

    The application invites users to take a walk down memory lane and select from 50 years of Barbie fashions. Standing in front of Barbie’s life-sized augmented reality “mirror,” fans can choose from several outfits in her digital wardrobe—virtually trying them on for size.

    The solution, built with the Kinect for Windows SDK and using the Kinect for Windows sensor, tracks users’ movements and gestures, enabling them to easily browse through the closet and select outfits that strike their fancy. Once an outfit is selected, the Kinect for Windows skeletal tracking determines the position and orientation of the user. The application then rescales Barbie’s clothes, rendering them over the user in real time for a custom fit.
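
    One simple way to picture the rescaling step is to derive a scale factor from the user's tracked shoulder width. The sketch below is an assumption about how such a fit could work, not Adapptor's code. It takes shoulder joints already projected to (x, y) pixel coordinates, and all names are hypothetical.

```python
import math


def garment_scale(left_shoulder, right_shoulder, garment_shoulder_px):
    """Scale factor that makes the garment image span the user's shoulders."""
    if garment_shoulder_px <= 0:
        raise ValueError("garment_shoulder_px must be positive")
    user_width_px = math.dist(left_shoulder, right_shoulder)
    return user_width_px / garment_shoulder_px


def place_garment(left_shoulder, right_shoulder, garment_size, garment_shoulder_px):
    """Return (left, top, width, height) of the scaled garment overlay.

    The overlay is centered between the shoulders and hangs down from the
    shoulder line.
    """
    scale = garment_scale(left_shoulder, right_shoulder, garment_shoulder_px)
    width = garment_size[0] * scale
    height = garment_size[1] * scale
    center_x = (left_shoulder[0] + right_shoulder[0]) / 2.0
    top_y = (left_shoulder[1] + right_shoulder[1]) / 2.0
    return (center_x - width / 2.0, top_y, width, height)
```

    Because the factor is recomputed from each new skeleton frame, the same outfit image would fit a child standing close to the sensor or an adult standing farther back.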

    One of the most interesting aspects of this solution is the technology’s ability to scale - with menus, navigation controls and clothing all dynamically adapting so that everyone from a little girl to a grown woman (and cough, yes, even a committed father) can enjoy the experience. To facilitate these advancements, each outfit was photographed on a Barbie doll, cut into multiple parts, and then built individually via the application. 

    Of course, the experience wouldn’t be complete without the ability to memorialize it. A photo is taken and, with approval/consent from those photographed, is uploaded and displayed in a gallery on the Barbie Australian Facebook page. (Grandparents can join in the fun from afar!)

    I spoke with Sarah Sproule, Director, Gun Communications, about the genesis of the idea. She told me, "We started working on Barbie The Dream Closet six months ago, working with our development partner Adapptor. Everyone has been impressed by the flexibility and innovation Microsoft has poured into Kinect for Windows. Kinect technology has provided Barbie with a rich and exciting initiative that's proving to delight fans of all ages. We're thrilled with the result, as is Mattel - our client."

    Barbie The Dream Closet opened to the public at the Westfield Parramatta in Sydney today and will be there through April 15. On its first day, it drew enthusiastic crowds, with around 100 people experiencing Barbie The Dream Closet. It's expected to draw even larger crowds over the holidays. It’s set to be in Melbourne and Brisbane later this year.

     Meantime, the Kinect for Windows team is just as excited about it as my daughter:

    “The first time I saw Barbie’s Dream Closet in action, I knew it would strike a chord,” notes Kinect for Windows Communications Manager, Heather Mitchell. “It’s such a playful, creative use of the technology. I remember fantasizing about wearing Barbie’s clothes when I was a little girl. Disco Ken was a huge hit in my household back then…Who didn’t want to match his dance moves with their own life-sized Barbie disco dress? I think tens of thousands of grown girls have been waiting for this experience for years…Feels like a natural.”

    That’s the beauty of Kinect – it enables amazingly natural interactions with technology and hundreds of companies are out there building amazing things; we can’t wait to see what they continue to invent.

    Steve Clayton
    Editor, Next at Microsoft

  • Kinect for Windows Product Blog

    Kinect for Windows Technologies: Boxing Robots to Limitless Possibilities


    Most developers, including myself, are natural tinkerers. We hear of a new technology and want to try it out, exploring what it can do, dreaming up interesting uses, and pushing the limits of what’s possible. Most recently, the Channel 9 team incorporated Kinect for Windows into two projects: BoxingBots and Project Detroit.

    The life-sized BoxingBots made their debut in early March at SXSW in Austin, Texas. Each robot is equipped with an on-board computer, which receives commands from two Kinect for Windows sensors and computers. The robots are controlled by two individuals whose movements – punching, rotating, stepping forward and backward – are interpreted by the sensors and relayed to the robots, which in turn slug it out until one is struck and its pneumatic-controlled head springs up.

    The use of Kinect for Windows for telepresence applications, like controlling a robot or other mechanical device, opens up a number of interesting possibilities. Imagine a police officer using gestures and word commands to remotely control a robot, exploring a building that may contain explosives. In the same vein, Kinect telepresence applications using robots could be used in the manufacturing, medical, and transportation industries.

    Project Detroit asked the question: what do you get when you combine the world’s most innovative technology with a classic American car? The answer is a 2012 Ford Mustang with a 1967 fastback replica body, and everything from Windows Phone integration to built-in Wi-Fi, Viper SmartStart security system, cloud services, augmented reality, Ford SYNC, Xbox-enabled entertainment system, Windows 8 Slate, and Kinect for Windows cameras built into the tail and headlights. The front and rear Kinect cameras transmit a live video feed of surrounding pedestrians and objects directly to the interior dashboard displays.

    One of the key features we built for Project Detroit was the ability to read Kinect data, including a video stream, depth data, skeletal joint data, and audio streams, over the network using sockets (available here as an open source project). These capabilities could make it possible to receive an alert on your phone when someone gets too close to your car. You could then switch to a live video/audio stream, via a network from the Kinect, to see what they were doing. Using your phone, you could talk to them, asking politely that they “look, but not touch.”
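
    For readers curious what streaming skeletal joint data over a socket might look like, here is a minimal sketch. The wire format (a 4-byte joint count followed by three little-endian 32-bit floats per joint) is an assumption made for illustration. It is not the format used by the Project Detroit open source project, and the function names are hypothetical.

```python
import socket
import struct

JOINT_SIZE = struct.calcsize("<3f")  # three 32-bit floats: x, y, z


def pack_skeleton(joints):
    """Serialize a list of (x, y, z) joints as: uint32 count, then floats."""
    payload = b"".join(struct.pack("<3f", x, y, z) for x, y, z in joints)
    return struct.pack("<I", len(joints)) + payload


def recv_exact(sock, size):
    """Read exactly `size` bytes; TCP may deliver a frame in several pieces."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_skeleton(sock):
    """Receive one frame written by pack_skeleton and return its joints."""
    (count,) = struct.unpack("<I", recv_exact(sock, 4))
    data = recv_exact(sock, count * JOINT_SIZE)
    return [struct.unpack_from("<3f", data, i * JOINT_SIZE) for i in range(count)]
```

    The sender would call `sock.sendall(pack_skeleton(joints))` once per frame. The explicit count prefix lets the receiver find frame boundaries in what is otherwise an undelimited byte stream.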

    While these technologies may not show up in production cars in the coming months (or years), Kinect for Windows technologies are suited for use in cars for seeing objects such as pedestrians and cyclists behind and in front of vehicles, making it easier to ease into tight parking spots, and enabling built-in electronic devices with the wave of a hand or voice commands.

    It’s an exciting time to be not only a developer but also a business, organization, or consumer who will have the opportunity to benefit from the evolving uses and limitless possibilities of the Kinect for Windows natural user interface.

    Dan Fernandez
    Senior Director, Microsoft Channel 9

  • Kinect for Windows Product Blog

    Partners Deliver Custom Solutions that Use Kinect for Windows


    Kinect for Windows demos at Microsoft Worldwide Partner Conference

    Kinect for Windows partners are finding new business opportunities by helping to develop new custom applications and ready-made solutions for various commercial customers, such as the Coca-Cola Company, and vertical markets, including the health care industry.

    Several of these solutions were on display at the Microsoft Worldwide Partner Conference (WPC) in Toronto, Canada, where Kinect for Windows took the stage with two amazing demos as well as strong booth showings at the Solutions Innovation Center.

    "Being part of the WPC 2012 event was a great opportunity to showcase our Kinect-based 3-D scanner, and the response was incredibly awesome, both on stage when the audience would spontaneously clap and cheer in the middle of the scan, and in the Kinect for Windows trade show area where people would stand in line to get scanned," said Nicolas Tisserand, co-founder of the France-based Manctl, one of the 11 companies in the Microsoft Accelerator for Kinect program.

    Manctl's Skanect scanner software uses the Kinect sensor to build high quality 3-D digital models of people and objects, which can be sent to a 3-D printer to create detailed plastic extruded sculptures. "Kinect for Windows is a fantastic device, capable of so much more than just game control. It's making depth sensing a commodity," Tisserand added.

    A demo from übi interactive in Germany uses the Kinect sensor to turn virtually any surface into a 3-D touchscreen that can control interfaces, apps, and games. "Kinect for Windows is a great piece of hardware and it works perfect[ly] with our software stack," reported übi co-founder David Hajizadeh. "As off-the-shelf hardware, it massively reduced our costs and we see lots of opportunities for business applications that offer huge value for our customers."

    Snibbe Interactive created its SocialMirror Coke Kiosk to deliver a Kinect-based game in which players aim a stream of soda into a glass and then share videos of the experience with their social networks. "We were extremely excited to show off our unique Coca-Cola branded interactive experience and its unique ability to create instant ROI [return on investment] through our viral marketing component," reported Alan Shimoide, director of engineering at Snibbe.

    InterKnowlogy developed KinectHealth to assist doctors with motion-controlled access to patient records and surgery planning tools. "A true game changer, Kinect for Windows allows our designers and developers to think differently about business cases across many verticals," noted Kevin Custer, the director of strategic marketing and partnerships at InterKnowlogy. "Kinect for Windows is not just how we interact with computers, but it offers unique ways to add gesture and voice to our natural user-interface designed software—the combination of which is changing lives of customers and users alike."

    "Avanade has already delivered several innovative solutions using Kinect, and we expect that demand to keep growing," said Ben Reierson, innovation manager at Avanade, whose Kinect for Virtual Healthcare includes video chat for connecting clinics to remote doctors for online appointments. "Customers and partners are clearly getting more serious about the possibilities of Kinect and natural user interfaces."

    Kinect for Windows Team

  • Kinect for Windows Product Blog

    Monsters Come to Life with Kinect for Windows


    It all started with a couple of kids and a remarkable idea, which eventually spawned two terrifying demon dogs and their master. This concept is transforming the haunt industry and could eventually change how theme parks and other entertainment businesses approach animated mechanical electronics (animatronics).
    Here's the behind-the-scenes story of how this all came to be:

    The boys, 6-year-old Mark and 10-year-old Jack, fell in love with Travel Channel's Making Monsters, a TV program that chronicles the creation of lifelike animatronic creatures. After seeing their dad's work with Kinect for Windows at the Minneapolis-based Microsoft Technology Center, they connected the dots and dreamed up the concept: wouldn't it be awesome if Dad could use his expertise with the Kinect for Windows motion sensor to make better and scarier monsters?

    So “Dad”—Microsoft developer and technical architect Todd Van Nurden—sent an email to Distortions Unlimited in Greeley, Colorado, offering praise of their work sculpting monsters out of clay and adjustable metal armatures. He also threw in his boys' suggestion on how they might take things to the next level with Kinect for Windows: Imagine how much cooler and more realistic these monsters could be if they had the ability to see you, hear you, anticipate your behavior, and respond to it. Imagine what it means to this industry now that monster makers can take advantage of the Kinect for Windows gesture and voice capabilities.

    Two months passed. Then one day, Todd received a voice mail message from Distortions CEO Ed Edmunds expressing interest. The result: nine months of off-and-on work, culminating with the debut of a Making Monsters episode detailing the project on Travel Channel earlier today, October 21 (check local listings for show times, including repeat airings). The full demonic installation can also be experienced firsthand at The 13th Floor haunted house in Denver, Colorado, now through November 10.

    To get things started, Distortions sent Van Nurden maquettes (scale models about one-quarter of the final size) to build prototypes of two demon dogs and their demon master. Van Nurden worked with Parker, a company that specializes in robotics, to develop movement based on random path manipulation, which is more fluid than typical robot motion and is reactive and only loosely scripted. The maquettes were wired to Kinect for Windows with skeletal tracking, audio tracking, and voice control functionality as a proof of concept to suggest a menu of possible options.

    Distortions was impressed. "Ed saw everything it could do and said, 'I want all of them. We need to blow this out,'" recalled Van Nurden.

    Todd Van Nurden prepares to install the Kinect for Windows sensor in the demon's belt 

    The full-sized dogs are four feet high, while the demon master stands nearly 14 feet. A Kinect for Windows sensor connected to a ruggedized Lenovo M92 workstation is embedded in the demon's belt and, after interpreting tracking data, sends commands to control itself and the dogs via wired Ethernet. Custom software, built by using the Kinect for Windows SDK, provides the operators with a drag-and-drop interface for laying out character placement and other configurable settings. It also provides a top-down view for the attraction's operator, displaying where the guests are and how the creatures are tracking them.

    "We used a less common approach to processing the data as we leveraged the Reactive Extensions for .NET to basically set up push-based LINQ subscriptions," Van Nurden revealed. "The drag-and-drop features enable the operator to control the place-space configuration, as well as when certain behaviors begin. We used most of the Kinect for Windows SDK managed API with the exception of raw depth data."
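
    The push-based style Van Nurden describes can be illustrated without the Reactive Extensions library itself. The sketch below is a tiny stand-in written for this post, not the Rx API and not the attraction's code. A `Stream` pushes each value to its subscribers, and `where` and `select` build filtered and transformed streams in the spirit of LINQ operators. The frame fields and the two-meter threshold are hypothetical.

```python
class Stream:
    """Minimal observable: values are pushed to subscribers as they arrive."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def push(self, value):
        for callback in list(self._subscribers):
            callback(value)

    def where(self, predicate):
        """Return a stream that only forwards values matching `predicate`."""
        out = Stream()
        self.subscribe(lambda value: out.push(value) if predicate(value) else None)
        return out

    def select(self, transform):
        """Return a stream that forwards `transform(value)` for each value."""
        out = Stream()
        self.subscribe(lambda value: out.push(transform(value)))
        return out


# Usage: react only to guests closer than two meters (hypothetical threshold).
frames = Stream()
commands = []
(frames
    .where(lambda frame: frame["distance_m"] < 2.0)
    .select(lambda frame: "growl at guest %d" % frame["guest_id"])
    .subscribe(commands.append))

frames.push({"guest_id": 1, "distance_m": 3.5})  # too far away, filtered out
frames.push({"guest_id": 2, "distance_m": 1.2})  # close enough, triggers a command
```

    The appeal of this style is that the sensor loop only pushes frames. Each behavior is a separate subscription, so behaviors can be added or removed without touching the loop.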

    The dogs are programmed to react very differently if approached by an adult (which might elicit a bark or growl) versus a child (which could prompt a fast pant or soft whimper). Scratching behind a hound's ears provokes a "happy dog" response—assuming you can overcome your fear and get close enough to actually touch one! Each action or mood includes its own set of kinesthetic actions and vocal cues. The sensor quietly tracks groups of people, alternating between a loose tracking algorithm that can calculate relative height quickly when figures are further away and full skeletal tracking when someone approaches a dog or demon, requiring more detailed data to drive the beasts' reactions.
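
    The decision logic in this paragraph can be sketched as two small functions. The distance and height thresholds below are invented for illustration, since the post does not give the values used in the installation, and the function names are hypothetical.

```python
FULL_TRACKING_RANGE_M = 2.0  # hypothetical: switch to skeletal tracking inside this range
CHILD_MAX_HEIGHT_M = 1.4     # hypothetical: estimated heights below this count as a child


def tracking_mode(distance_m):
    """Use cheap, loose tracking far away and full skeletal tracking up close."""
    return "skeletal" if distance_m <= FULL_TRACKING_RANGE_M else "loose"


def dog_reactions(height_m, touching_ears=False):
    """Return the candidate responses described above for a given guest."""
    if touching_ears:
        return ("happy dog",)
    if height_m < CHILD_MAX_HEIGHT_M:
        return ("fast pant", "soft whimper")
    return ("bark", "growl")
```

    Keeping the thresholds as named constants matches the operator-configurable spirit of the installation, where such settings would be tuned on site.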

    The end product was so delightfully scary that Van Nurden had to reassure his own sons when they were faced with a life-sized working model of one of the dogs. "I programmed him, he's not going to hurt you," he comforted them.

    Fortunately, it is possible to become the demons' master. If you perform a secret voice and movement sequence, they will actually bow to you.

    Lisa Tanzer, executive producer for Making Monsters, has been following creature creation for two years while shooting the show at Distortions Unlimited. She was impressed by how much more effective the Kinect for Windows interactivity is than the traditional looped audio and fully scripted movements of regular animatronics: "Making the monsters themselves is the same process: you take clay, sculpt it over an armature, mold it, paint it, all the same steps," she said. "The thing that made this project Distortions did for 13th Floor so incredible and fascinating was the Kinect for Windows technology."

    "It can be really scary," Tanzer reported. "The dogs and demon creature key into people and actually track them around the room. The dog turns, looks at you and whimpers; you go 'Oh, wow, is this thing going to get me?' It's just like a human actor latching on to somebody in a haunted house but there's no human, only this incredible technology."

    "Incorporating Kinect for Windows into monster making is very new to the haunt industry," she added. "In terms of the entertainment industry, it's a huge deal. I think it's a really cool illustration of where things are going."

    Kinect for Windows team

  • Kinect for Windows Product Blog

    Mysteries of Kinect for Windows Face Tracking output explained


    Since the release of Kinect for Windows version 1.5, developers have been able to use the Face Tracking software development kit (SDK) to create applications that can track human faces in real time. Figure 1, an illustration from the Face Tracking documentation, displays 87 of the points used to track the face. Thirteen points are not illustrated here—more on those points later.

    Figure 1: Tracked Points

    You have questions...

    Based on feedback we received via comments and forum posts, it is clear there is some confusion regarding the face tracking points and the data values found when using the SDK sample code. The managed sample, FaceTrackingBasics-WPF, demonstrates how to visualize mesh data by displaying a 3D model representation on top of the color camera image.

    Figure 2: Screenshot from FaceTrackingBasics-WPF

    By exploring the sample source code, you will find a set of helper functions in the Microsoft.Kinect.Toolkit.FaceTracking project, in particular GetProjected3DShape(). Many developers have found that this function returns an array of 121 values. Some have also found an enum called “FeaturePoint” that includes 70 items.

    We have answers...

    As you can see, we have two main sets of numbers that don't seem to add up. That is because the SDK provides two separate sets of values:

    1. 3D Shape Points (mesh representation of the face): 121
    2. Tracked Points: 87 + 13

    The 3D Shape Points (121 of them) are the mesh vertices that make a 3D face model based on the Candide-3 wireframe.

    Figure 3: Wireframe of the Candide-3 model http://www.icg.isy.liu.se/candide/img/candide3_rot128.gif

    These vertices are morphed by the FaceTracking APIs to align with the face. The GetProjected3DShape method returns the vertices as a Vector3DF[] array. Individual vertices can be looked up by name using the "FeaturePoint" enum, for example TopSkull, LeftCornerMouth, or OuterTopRightPupil. Figure 4 shows these values superimposed on top of the color frame.

    Figure 4: Feature Point index mapped on mesh model

    To get the 100 tracked points mentioned above, we need to dive more deeply into the APIs. The managed APIs provide an FtInterop.cs file that defines an interface, IFTResult, which contains a Get2DShapePoints function. FtInterop is a wrapper for the native library that exposes its functionality to managed languages. Users of the unmanaged C++ API may have already seen this and figured it out. Get2DShapePoints is the function that provides the 100 tracked points.

    If we have a look at the function, it doesn’t seem to be useful to a managed code developer:

    // STDMETHOD(Get2DShapePoints)(THIS_ FT_VECTOR2D** ppPoints, UINT* pPointCount) PURE;
    void Get2DShapePoints(out IntPtr pointsPtr, out uint pointCount);

    To get a better idea of how you can get a collection of points from IntPtr, we need to dive into the unmanaged function:

    /// <summary>
    /// Returns 2D (X,Y) coordinates of the key points on the aligned face in video frame coordinates.
    /// </summary>
    /// <param name="ppPoints">Array of 2D points (as FT_VECTOR2D).</param>
    /// <param name="pPointCount">Number of elements in ppPoints.</param>
    /// <returns>If the method succeeds, the return value is S_OK. If the method fails, the return value can be E_POINTER.</returns>
    STDMETHOD(Get2DShapePoints)(THIS_ FT_VECTOR2D** ppPoints, UINT* pPointCount) PURE; 

    The function will give us a pointer to the FT_VECTOR2D array. To consume the data from the pointer, we have to create a new function for use with managed code.

    The managed code

    First, you need to create an array to hold the data that is copied into managed memory. FT_VECTOR2D is an unmanaged structure, so to marshal the data to the managed wrapper we need an equivalent managed data type. The managed version of this structure is PointF (a structure that uses floats for x and y).

    Now that we have a data type, we need to convert IntPtr to PointF[]. Searching the code, we see that the FaceTrackFrame class wraps the IFTResult object. It also contains the GetProjected3DShape() function we used before, so it is a good place to add a new function, GetShapePoints. It will look something like this:

    // populates an array for the ShapePoints
    public void GetShapePoints(ref Vector2DF[] vector2DF)
    {
        // get the 2D tracked shapes
        IntPtr ptBuffer = IntPtr.Zero;
        uint ptCount = 0;
        this.faceTrackingResultPtr.Get2DShapePoints(out ptBuffer, out ptCount);
        if (ptCount == 0)
        {
            // no tracked points: clear the output and stop before reading the buffer
            vector2DF = null;
            return;
        }

        // create a managed array to hold the values
        if (vector2DF == null || vector2DF.Length != ptCount)
        {
            vector2DF = new Vector2DF[ptCount];
        }

        // copy each unmanaged structure into the managed array, one element at a time
        ulong sizeInBytes = (ulong)Marshal.SizeOf(typeof(Vector2DF));
        for (ulong i = 0; i < ptCount; i++)
        {
            vector2DF[i] = (Vector2DF)Marshal.PtrToStructure((IntPtr)((ulong)ptBuffer + (i * sizeInBytes)), typeof(Vector2DF));
        }
    }
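
    The pointer walk in GetShapePoints (base address plus index times structure size) is not specific to C#. As a rough illustration only, here is a minimal JavaScript sketch of the same stride-based read over a raw byte buffer. It assumes each point is two consecutive 32-bit little-endian floats (8 bytes); readPoints and the sample buffer are hypothetical names made up for this example and are not part of the SDK.

```javascript
// Sketch only: walk a raw byte buffer holding packed (x, y) float pairs,
// the way GetShapePoints walks the unmanaged FT_VECTOR2D array.
// Assumption: each point is two 32-bit little-endian floats (8 bytes).
const BYTES_PER_POINT = 8;

function readPoints(rawBuffer, pointCount) {
    if (pointCount === 0) {
        return null; // mirror the early exit for an empty result
    }
    const view = new DataView(rawBuffer);
    const points = new Array(pointCount);
    for (let i = 0; i < pointCount; i++) {
        const offset = i * BYTES_PER_POINT; // start of buffer + i * size of one point
        points[i] = {
            x: view.getFloat32(offset, true),
            y: view.getFloat32(offset + 4, true)
        };
    }
    return points;
}

// Build a hypothetical 3-point buffer to exercise the reader.
const sampleBuffer = new ArrayBuffer(3 * BYTES_PER_POINT);
const sampleWriter = new DataView(sampleBuffer);
[[320.5, 240.25], [100, 50], [0.5, 479]].forEach(([x, y], i) => {
    sampleWriter.setFloat32(i * BYTES_PER_POINT, x, true);
    sampleWriter.setFloat32(i * BYTES_PER_POINT + 4, y, true);
});

const decodedPoints = readPoints(sampleBuffer, 3);
console.log(decodedPoints[0]); // first point read back from the buffer
```

    The important detail is that the reader never trusts the buffer length on its own; it relies on the point count reported alongside the pointer, just as the managed wrapper relies on pointCount.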

    To ensure we are using the data correctly, we refer to the documentation on Get2DShapePoints:

    IFTResult::Get2DShapePoints Method gets the (x,y) coordinates of the key points on the aligned face in video frame coordinates.

    The PointF values represent the mapped values on the color image. Since we know they match the color frame, there is no need to apply any mapping. You can call the function to get the data, which should align with the color image coordinates.

    The Sample Code

    The modified version of FaceTrackingBasics-WPF is available in the sample code that can be downloaded from CodePlex. It has been modified to allow you to display the feature points (by name or by index value) and toggle the mesh drawing. Because of the way WPF renders, performance can suffer on machines with lower-end graphics cards, so I recommend that you enable only one of these options at a time. If your UI becomes unresponsive, you can block the sensor with your hand to stop face tracking data from being captured. Because the application will not detect any face tracking data, it will not render any points, which gives you the opportunity to reset the features you enabled by using the UI controls.

    Figure 5: ShapePoints mapped around the face

    As you can see in Figure 5, the additional 13 points are the center of the eyes, the tip of the nose, and the areas above the eyebrows on the forehead. Once you enable a feature and tracking begins, you can zoom into the center and see the values more clearly.

    A summary of the changes:

    • UI changes to enable the slider and draw selections
    • Added a Grid control, used for the UI elements
    • Modified the constructor to initialize the grid
    • Modified the OnAllFramesReady event
      • For any tracked skeletons, create a canvas and add it to the grid. Use that canvas as the parent for the label controls

    public partial class FaceTrackingViewer : UserControl, IDisposable
    {
        private Grid grid;

        public FaceTrackingViewer()
        {
            // ...

            // add grid to the layout
            this.grid = new Grid();
            this.grid.Background = Brushes.Transparent;
            this.Content = this.grid;
        }

        private void OnAllFramesReady(object sender, AllFramesReadyEventArgs allFramesReadyEventArgs)
        {
            // ...

            // We want to keep a record of any skeleton, tracked or untracked.
            if (!this.trackedSkeletons.ContainsKey(skeleton.TrackingId))
            {
                // create a new canvas for each tracker
                Canvas canvas = new Canvas();
                canvas.Background = Brushes.Transparent;
                this.grid.Children.Add(canvas);
                this.trackedSkeletons.Add(skeleton.TrackingId, new SkeletonFaceTracker(canvas));
            }

            // ...
        }
    }

    SkeletonFaceTracker class changes:

    • New properties: DrawFaceMesh, DrawShapePoints, DrawFeaturePoints, featurePoints, lastDrawFeaturePoints, shapePoints, labelControls, Canvas
    • New functions: FindTextControl, UpdateTextControls, RemoveAllFromCanvas, SetShapePointsLocations, SetFeaturePointsLocations
    • Added the constructor to keep track of the parent control
    • Changed the DrawFaceModel function to draw based on what data was selected
    • Updated the OnFrameReady event to recalculate the positions of the drawn elements
      • If DrawShapePoints is selected, then we call our new function

    private class SkeletonFaceTracker : IDisposable
    {
        // properties to toggle rendering 3D mesh, shape points and feature points
        public bool DrawFaceMesh { get; set; }
        public bool DrawShapePoints { get; set; }
        public DrawFeaturePoint DrawFeaturePoints { get; set; }

        // defined array for the feature points
        private Array featurePoints;
        private DrawFeaturePoint lastDrawFeaturePoints;

        // array for Points to be used in shape points rendering
        private PointF[] shapePoints;

        // map to hold the label controls for the overlay
        private Dictionary<string, Label> labelControls;

        // canvas control for new text rendering
        private Canvas Canvas;

        // canvas is passed in for every instance
        public SkeletonFaceTracker(Canvas canvas)
        {
            this.Canvas = canvas;
        }

        public void DrawFaceModel(DrawingContext drawingContext)
        {
            // only draw if selected
            if (this.DrawFaceMesh && this.facePoints != null)
            {
                // ...
            }
        }

        internal void OnFrameReady(KinectSensor kinectSensor, ColorImageFormat colorImageFormat, byte[] colorImage, DepthImageFormat depthImageFormat, short[] depthImage, Skeleton skeletonOfInterest)
        {
            // ...
            if (this.lastFaceTrackSucceeded)
            {
                if (this.DrawFaceMesh || this.DrawFeaturePoints != DrawFeaturePoint.None)
                {
                    this.facePoints = frame.GetProjected3DShape();
                }

                // get the shape points array
                if (this.DrawShapePoints)
                {
                    this.shapePoints = frame.GetShapePoints();
                }
            }

            // draw/remove the components
            // ...
        }
    }


    Pulling it all together...

    As we have seen, there are two types of data points that are available from the Face Tracking SDK:

    • Shape Points: the 100 tracked points (87 + 13) used to track the face
    • Mesh Data: the 121 vertices of the 3D model from the GetProjected3DShape() function
      • FeaturePoints: named vertices on the 3D model that play a significant role in face tracking

    To get the shape point data, we have to extend the current managed wrapper with a new function that will handle the interop with the native API.

    Carmine Sirignano
    Developer Support Escalation Engineer
    Kinect for Windows


    Using Kinect Webserver to Expose Speech Events to Web Clients


    In our 1.8 release, we made it easy to create Kinect-enabled HTML5 web applications. This is possible because we added an extensible webserver for Kinect data along with a JavaScript API that gives developers some great functionality right out of the box:

    • Interactions: hand pointer movements, press and grip events useful for controlling a cursor, buttons and other UI
    • User Viewer: visual representation of the users currently visible to the Kinect sensor, with different colors indicating different user states
    • Background Removal: “green screen” image stream for a single person at a time
    • Skeleton: standard skeleton data such as tracking state, joint positions, joint orientations, etc.
    • Sensor Status: events corresponding to sensor connection/disconnection

    This is enough functionality to write a compelling application, but it doesn’t represent the whole range of Kinect sensor capabilities. In this article I will show you step by step how to extend the WebserverBasics-WPF sample (see the C# code on CodePlex or the documentation on MSDN), available from the Kinect Toolkit Browser, so that web applications can respond to speech commands, with the active speech grammar configurable by the web client.

    A solution containing the full, final sample code is available on CodePlex. To compile this sample you will also need Microsoft.Samples.Kinect.Webserver (available via CodePlex and Toolkit Browser) and Microsoft.Kinect.Toolkit components (available via Toolkit Browser).

    Getting Started

    To follow along step-by-step:

    1. If you haven’t done so already, install the Kinect for Windows v1.8 SDK and Toolkit
    2. Launch the Kinect Toolkit Browser
    3. Install WebserverBasics-WPF sample in a local directory
    4. Open the WebserverBasics-WPF.sln solution in Visual Studio
    5. Go to line 136 in MainWindow.xaml.cs file

    You should see the following TODO comment which describes exactly how we’re going to expose speech recognition functionality:

    //// TODO: Optionally add factories here for custom handlers:
    ////       this.webserver.SensorStreamHandlerFactories.Add(new MyCustomSensorStreamHandlerFactory());
    //// Your custom factory would implement ISensorStreamHandlerFactory, in which the
    //// CreateHandler method would return a class derived from SensorStreamHandlerBase
    //// which overrides one or more of its virtual methods.

    We will replace this comment with the functionality described below.

    So, What Functionality Are We Implementing?


    More specifically, on the server side we will:

    1. Create a speech recognition engine
    2. Bind the engine to a Kinect sensor’s audio stream whenever a sensor gets connected/disconnected
    3. Allow a web client to specify the speech grammar to be recognized
    4. Forward speech recognition events generated by the engine to the web client
    5. Register a factory for the speech stream handler with the Kinect webserver

    This will be accomplished by creating a class called SpeechStreamHandler, derived from Microsoft.Samples.Kinect.Webserver.Sensor.SensorStreamHandlerBase. SensorStreamHandlerBase is an implementation of ISensorStreamHandler that frees us from writing boilerplate code. ISensorStreamHandler is an abstraction that gets notified whenever a Kinect sensor gets connected/disconnected, when color, depth and skeleton frames become available, and when web clients request to view or update configuration values. In response, our speech stream handler will send event messages to web clients.

    On the web client side we will:

    1. Configure speech recognition stream (enable and specify the speech grammar to be recognized)
    2. Modify the web UI in response to recognized speech events

    All new client-side code is in SamplePage.html.

    Creating a Speech Recognition Engine

    In the constructor for SpeechStreamHandler you’ll see the following code:

    RecognizerInfo ri = GetKinectRecognizer();
    if (ri != null)
    {
        this.speechEngine = new SpeechRecognitionEngine(ri.Id);

        if (this.speechEngine != null)
        {
            // disable speech engine adaptation feature
            this.speechEngine.UpdateRecognizerSetting("AdaptationOn", 0);
            this.speechEngine.UpdateRecognizerSetting("PersistedBackgroundAdaptation", 0);
            this.speechEngine.AudioStateChanged += this.AudioStateChanged;
            this.speechEngine.SpeechRecognitionRejected += this.SpeechRecognitionRejected;
            this.speechEngine.SpeechRecognized += this.SpeechRecognized;
        }
    }

    This code snippet will be familiar if you’ve looked at some of our other speech samples, such as SpeechBasics-WPF. Basically, we get the metadata corresponding to the Kinect acoustic model and use it to create a speech engine. (GetKinectRecognizer is hardcoded to use the English-language acoustic model in this sample, but this can be changed by installing additional language packs and modifying GetKinectRecognizer to look for the desired culture name.) We then turn off the settings related to the audio adaptation feature, which makes the speech engine better suited for long-running scenarios, and register to receive events when speech is recognized or rejected, or when the audio state (e.g., silence vs. someone speaking) changes.

    Binding the Speech Recognition Engine to a Kinect Sensor’s Audio Stream

    In order to do this, we override SensorStreamHandlerBase’s implementation of OnSensorChanged, so we can find out about sensors connecting and disconnecting.

    public override void OnSensorChanged(KinectSensor newSensor)
    {
        if (this.sensor != null)
        {
            if (this.speechEngine != null) { /* ... */ }
        }
        this.sensor = newSensor;
        if (newSensor != null)
        {
            if (this.speechEngine != null)
            {
                this.speechEngine.SetInputToAudioStream(
                    newSensor.AudioSource.Start(), new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
            }
        }
    }

    The main thing we need to do here is start the AudioSource of the newly connected Kinect sensor to get an audio stream that we can hook up as the input to the speech engine. We also need to specify the format of the audio stream: a single-channel Pulse Code Modulation (PCM) stream with 16 bits per sample, sampled at 16 kHz.
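
    The two remaining numbers in that format description, 32000 and 2, are not independent settings; they follow from the sample rate, sample size and channel count. The sketch below works through the arithmetic in JavaScript, assuming the numeric arguments are, in order, samples per second, bits per sample, channel count, average bytes per second and block alignment. The pcmFormat helper is a hypothetical name used only for this illustration.

```javascript
// Sketch: derive average bytes per second and block alignment for PCM audio
// from the sample rate, bits per sample and channel count.
// pcmFormat is an illustrative helper, not an SDK function.
function pcmFormat(samplesPerSecond, bitsPerSample, channelCount) {
    const bytesPerSample = bitsPerSample / 8;
    const blockAlign = bytesPerSample * channelCount;            // bytes per sample frame
    const averageBytesPerSecond = samplesPerSecond * blockAlign; // bytes streamed each second
    return { samplesPerSecond, bitsPerSample, channelCount, averageBytesPerSecond, blockAlign };
}

const kinectAudioFormat = pcmFormat(16000, 16, 1);
console.log(kinectAudioFormat.averageBytesPerSecond, kinectAudioFormat.blockAlign); // 32000 2
```

    If you ever change one of the first three values, the last two need to change with it, otherwise the engine would be told to interpret the byte stream incorrectly.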

    Allow Web Clients to Specify Speech Grammar

    We will let clients send us the whole speech grammar that they want recognized, as XML that conforms to the W3C Speech Recognition Grammar Specification format version 1.0. To do this, we will expose a configuration property called “grammarXml”.

    Let’s backtrack a little bit because earlier we glossed over the bit of code in the SpeechStreamHandler constructor where we register the handlers for getting and setting stream configuration properties:

    this.AddStreamConfiguration(SpeechEventCategory, new StreamConfiguration(this.GetProperties, this.SetProperty));

    Now, in the SetProperty method we call LoadGrammarXml method whenever a client sets the “grammarXml” property:

    case GrammarXmlPropertyName:

    And in the LoadGrammarXml method we do the real work of updating the speech grammar:

    private void LoadGrammarXml(string grammarXml)
    {
        if (!string.IsNullOrEmpty(grammarXml))
        {
            using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(grammarXml)))
            {
                Grammar newGrammar;
                try { newGrammar = new Grammar(memoryStream); }
                catch (ArgumentException e)
                {
                    throw new InvalidOperationException("Requested grammar might not contain a root rule", e);
                }
                catch (FormatException e)
                {
                    throw new InvalidOperationException("Requested grammar was specified with an invalid format", e);
                }
                // ...
            }
        }
    }


    We first stop speech recognition because we don’t yet know whether the specified grammar is going to be valid. Then we try to create a new Microsoft.Speech.Recognition.Grammar object from the specified property value. If the property value does not represent a valid grammar, the newGrammar variable will remain null. Finally, we call the StartRecognition method, which loads the grammar into the speech engine (if the grammar is valid) and tells the speech engine to start recognizing, and keep recognizing, speech phrases until we explicitly tell it to stop.

    private void StartRecognition(Grammar g)
    {
        if ((this.sensor != null) && (g != null))
        {
            // ... load the grammar and start continuous recognition
        }
        this.grammar = g;
    }

    Send Speech Recognition Events to Web Client

    When we created the speech recognition engine, we registered for three events: AudioStateChanged, SpeechRecognized and SpeechRecognitionRejected. Whenever any of these events happens, we just want to forward it to the web client. Since the code ends up being very similar for all three, we will focus on the SpeechRecognized event handler:

    private async void SpeechRecognized(object sender, SpeechRecognizedEventArgs args)
    {
        var message = new RecognizedSpeechMessage(args);
        await this.ownerContext.SendEventMessageAsync(message);
    }

    To send messages to web clients we use functionality exposed by ownerContext, an instance of the SensorStreamHandlerContext class that was passed to us in the constructor. The messages are sent to clients using a web socket channel, and fall into one of two categories:

    • Stream messages: Messages that are generated continuously, at a predictable rate (e.g., 30 skeleton stream frames are generated every second), where the data from each message replaces the data from the previous message. If we drop one of these messages every so often there is no major consequence, because another will arrive shortly thereafter with more up-to-date data. The framework might therefore decide to drop one of these messages if it detects a bottleneck in the web socket channel.
    • Event messages: Messages that are generated sporadically, at an unpredictable rate, where each event represents an isolated incident. As such, it is not desirable to drop any of these messages.

    Given the nature of speech recognition, we chose to communicate with clients using event messages. Specifically, we created the RecognizedSpeechMessage class, a subclass of EventMessage that represents SpeechRecognizedEventArgs in a form that is easily serialized as JSON and follows JavaScript naming conventions.
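
    To make the difference between the two message categories concrete, here is a small JavaScript sketch of an outbound buffer that may overwrite stale stream messages but never discards event messages. This is only an illustration of the policy described above; it is not how the Kinect webserver is implemented, and OutboundBuffer and its methods are hypothetical names.

```javascript
// Sketch: an outbound buffer that keeps only the newest message per stream
// but preserves every event message. All names are hypothetical.
class OutboundBuffer {
    constructor() {
        this.latestStream = new Map(); // stream name -> most recent unsent message
        this.events = [];              // every unsent event, in arrival order
    }

    pushStream(name, message) {
        // A newer frame replaces the unsent older one, which is effectively dropped.
        this.latestStream.set(name, message);
    }

    pushEvent(message) {
        this.events.push(message);
    }

    // Called when the channel is ready to send again.
    drain() {
        const out = [...this.events, ...this.latestStream.values()];
        this.events = [];
        this.latestStream.clear();
        return out;
    }
}

const outbound = new OutboundBuffer();
outbound.pushStream("skeleton", { frame: 1 });
outbound.pushEvent({ category: "speech", eventType: "recognized", value: "SHOW" });
outbound.pushStream("skeleton", { frame: 2 }); // replaces frame 1
outbound.pushEvent({ category: "speech", eventType: "recognized", value: "HIDE" });

const sentMessages = outbound.drain();
console.log(sentMessages.length); // 3: both events plus only the newest skeleton frame
```

    Losing skeleton frame 1 is harmless because frame 2 supersedes it, whereas losing either speech event would mean a spoken command silently did nothing.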

    You might have noticed the usage of the “async” and “await” keywords in this snippet. They are described in much more detail in MSDN but, in summary, they enable an asynchronous programming model so that long-running operations don’t block thread execution while not necessarily using more than one thread. The Kinect webserver uses a single thread to schedule tasks so the consequence for you is that ISensorStreamHandler implementations don’t need to be thread-safe, but should be aware of potential re-entrancy due to asynchronous behavior.
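
    Re-entrancy on a single thread can be surprising, so here is a small sketch of it. JavaScript is used because it is also single-threaded with the same async/await keywords; the C# handler above can be re-entered across an await in a comparable way. Everything here (sendAsync, naiveHandler, serializedHandler) is a hypothetical stand-in, and chaining calls is only one possible way to cope.

```javascript
// Sketch: single-threaded code can still be re-entered across an await.
// sendAsync stands in for any asynchronous send; all names are hypothetical.
const startLog = [];
const sendAsync = (msg) => new Promise((resolve) => setTimeout(resolve, 10));

let pending = 0;
async function naiveHandler(msg) {
    pending++;
    startLog.push(`start ${msg} (in flight: ${pending})`);
    await sendAsync(msg); // control returns to the caller here
    pending--;
}

// One way to cope: chain each call onto the previous one so they run in order.
let tail = Promise.resolve();
const order = [];
function serializedHandler(msg) {
    tail = tail.then(async () => {
        order.push(`begin ${msg}`);
        await sendAsync(msg);
        order.push(`end ${msg}`);
    });
    return tail;
}

const demoFinished = (async () => {
    await Promise.all([naiveHandler("A"), naiveHandler("B")]);
    await Promise.all([serializedHandler("A"), serializedHandler("B")]);
    console.log(startLog); // B starts while A is still in flight
    console.log(order);    // begin A, end A, begin B, end B
})();
```

    No locks are needed in either version because only one piece of code runs at a time; the thing to watch for is shared state that is assumed to be unchanged after an await.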

    Registering a Speech Stream Handler Factory with the Kinect Webserver

    The Kinect webserver can be started, stopped and restarted. Each time it is started, it creates ISensorStreamHandler instances in a thread dedicated to Kinect data handling, which is the only thread that ever calls these objects. To facilitate this behavior, the server doesn’t allow direct registration of ISensorStreamHandler instances and instead expects ISensorStreamHandlerFactory instances to be registered in the KinectWebserver.SensorStreamHandlerFactories property.

    For the purposes of this sample, we declared a private factory class that is exposed as a static singleton instance directly from the SpeechStreamHandler class:

    public class SpeechStreamHandler : SensorStreamHandlerBase, IDisposable
    {
        // ...
        static SpeechStreamHandler()
        {
            Factory = new SpeechStreamHandlerFactory();
        }

        public static ISensorStreamHandlerFactory Factory { get; private set; }

        private class SpeechStreamHandlerFactory : ISensorStreamHandlerFactory
        {
            public ISensorStreamHandler CreateHandler(SensorStreamHandlerContext context)
            {
                return new SpeechStreamHandler(context);
            }
        }
        // ...
    }
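
    The reason for registering factories rather than handler instances can be seen in a small sketch. The JavaScript below only mirrors the shape of the C# code; Server, SpeechHandler and createHandler are hypothetical names and make no claim about how the Kinect webserver is built internally.

```javascript
// Sketch: a server that is given factories and builds fresh handlers on each start.
// Every name below is hypothetical and only mirrors the shape of the C# code.
class SpeechHandler {
    constructor(context) {
        this.context = context;
    }
}

class Server {
    constructor() {
        this.handlerFactories = []; // registered once, reused on every start
        this.handlers = [];
        this.startCount = 0;
    }

    start() {
        this.startCount++;
        const context = { startNumber: this.startCount };
        // Fresh handlers are created on each start, at a time the server chooses.
        this.handlers = this.handlerFactories.map((f) => f.createHandler(context));
    }

    stop() {
        this.handlers = [];
    }
}

const speechFactory = { createHandler: (context) => new SpeechHandler(context) };

const demoServer = new Server();
demoServer.handlerFactories.push(speechFactory);
demoServer.start();
const firstHandler = demoServer.handlers[0];
demoServer.stop();
demoServer.start();
console.log(firstHandler !== demoServer.handlers[0]); // true: a new instance per start
```

    Because the server decides when createHandler runs, it can also decide which thread it runs on, which is what lets handler implementations skip thread-safety concerns.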

    Finally, back at line 136 of MainWindow.xaml.cs, we replace the TODO comment mentioned above with:

    // Add speech stream handler to the list of available handlers, so web client
    // can configure speech grammar and receive speech events
    this.webserver.SensorStreamHandlerFactories.Add(SpeechStreamHandler.Factory);

    Configure Speech Recognition Stream in Web Client

    The sample web client distributed with WebserverBasics-WPF is already configuring a couple of other streams in the function called updateUserState in SamplePage.html, so we will add the following code to this function:

    var speechGrammar = '\
    <grammar version="1.0" xml:lang="en-US" tag-format="semantics/1.0-literals" root="DefaultRule" xmlns="http://www.w3.org/2001/06/grammar">\
        <rule id="DefaultRule" scope="public">\
            <one-of>\
                <item><tag>SHOW</tag>\
                    <one-of><item>Show Panel</item><item>Show</item></one-of></item>\
                <item><tag>HIDE</tag>\
                    <one-of><item>Hide Panel</item><item>Hide</item></one-of></item>\
            </one-of>\
        </rule>\
    </grammar>';

    immediateConfig["speech"] = { "enabled": true, "grammarXml": speechGrammar };

    This code enables the speech stream and specifies a grammar that:

    • triggers a recognition event with “SHOW” as semantic value whenever a user utters the phrases “Show” or “Show Panel”
    • triggers a recognition event with “HIDE” as semantic value whenever a user utters the phrases “Hide” or “Hide Panel”
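
    If you plan to add more commands, writing the grammar XML by hand gets tedious. The sketch below generates a grammar of the same shape from a table of semantic values and phrases. buildGrammar is a hypothetical helper, not part of the sample, and it assumes that the semantic value is attached with a tag element inside each item and that phrases contain no characters that need XML escaping.

```javascript
// Sketch: build an SRGS grammar string from a table mapping each semantic value
// to the phrases that should trigger it. buildGrammar is a hypothetical helper.
// Assumption: phrases contain no XML special characters (no escaping is done).
function buildGrammar(commands) {
    const items = Object.entries(commands).map(([semanticValue, phrases]) => {
        const alternatives = phrases.map((p) => `<item>${p}</item>`).join("");
        return `<item><tag>${semanticValue}</tag><one-of>${alternatives}</one-of></item>`;
    }).join("");

    return '<grammar version="1.0" xml:lang="en-US" tag-format="semantics/1.0-literals" ' +
        'root="DefaultRule" xmlns="http://www.w3.org/2001/06/grammar">' +
        `<rule id="DefaultRule" scope="public"><one-of>${items}</one-of></rule>` +
        '</grammar>';
}

const generatedGrammar = buildGrammar({
    SHOW: ["Show Panel", "Show"],
    HIDE: ["Hide Panel", "Hide"]
});
console.log(generatedGrammar);
```

    The resulting string could then be assigned to the grammarXml configuration property in the same way as a hand-written grammar.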

    Modify the web UI in response to recognized speech events

    The sample web client already registers an event handler function, so we just need to update it to respond to speech events in addition to user state events:

    function onSpeechRecognized(recognizedArgs) {
        if (recognizedArgs.confidence > 0.7) {
            switch (recognizedArgs.semantics.value) {
                case "HIDE": // ... hide the panel
                    break;
                case "SHOW": // ... show the panel
                    break;
            }
        }
    }

    sensor.addEventHandler(function (event) {
        switch (event.category) {
            case "speech":
                switch (event.eventType) {
                    case "recognized": // ... handle the recognized speech event
                        break;
                }
                break;
        }
    });
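
    To try the dispatch logic without a sensor attached, a self-contained sketch like the following can help. The panel object, the handleEvent function and the exact shape of the event payload (event.recognized) are assumptions made for illustration; only the 0.7 confidence threshold and the SHOW/HIDE semantic values come from the code above.

```javascript
// Sketch: confidence-gated dispatch of speech events, runnable without a sensor.
// The panel object and the event payload shape are assumptions for illustration.
const panel = { visible: false };
const CONFIDENCE_THRESHOLD = 0.7;

function onSpeechRecognized(recognizedArgs) {
    if (recognizedArgs.confidence > CONFIDENCE_THRESHOLD) {
        switch (recognizedArgs.semantics.value) {
            case "HIDE":
                panel.visible = false;
                break;
            case "SHOW":
                panel.visible = true;
                break;
        }
    }
}

function handleEvent(event) {
    if (event.category === "speech" && event.eventType === "recognized") {
        onSpeechRecognized(event.recognized);
    }
}

// Helper that fabricates a recognition event for testing.
const makeSpeechEvent = (value, confidence) => ({
    category: "speech",
    eventType: "recognized",
    recognized: { confidence, semantics: { value } }
});

handleEvent(makeSpeechEvent("SHOW", 0.9));
console.log(panel.visible); // true
handleEvent(makeSpeechEvent("HIDE", 0.4)); // below the threshold, so ignored
console.log(panel.visible); // true
```

    The threshold guards against acting on low-confidence matches; raising it makes the UI harder to trigger by accident, at the cost of ignoring more genuine commands.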

    Party Time!

    At this point you can rebuild the updated solution and run it to see the server UI. From this UI you can click on the link that reads “Open sample page in default browser” and play with the sample UI. It will look the same as before the code changes, but will respond to the speech phrases “Show”, “Show Panel”, “Hide” and “Hide Panel”. Now try changing the grammar to include more phrases and update the UI in different ways in response to speech events.

    Happy coding!
