The HoloDesk from Microsoft Research lets people interact with virtual objects in a uniquely real way – almost as if they were on Star Trek’s Holodeck. We talk to lead researcher Otmar Hilliges to find out how they did it.

We’ve written about Microsoft Research’s HoloDesk before. It’s an amazing piece of technology that uses optics, Kinect and other ‘sufficiently advanced technology’ to create a 3D world which you can manipulate with your bare hands. It’s like something out of Star Trek. Now we’re talking to Otmar Hilliges, a postdoctoral researcher at Microsoft Research, Cambridge, who worked on the HoloDesk project. Needless to say, the views in this article are his own personal opinions and not official Microsoft statements.

Tell us about your role.

We started the project a year and a half ago. I did most of the ideation and a lot of the technical implementation along with my colleagues here in the sensors and devices group. In my PhD I did a lot of work on enabling 3D interactions on 2D touchscreens and we wanted to extend this into 3D. There are four people on the team, and it took 18 months from inception to submission of the paper, though we weren’t working on it full-time.

Is it as intuitive as it looks?

The video is live. It’s a camera right next to the user. It’s what you would see if you were using the system. There’s no TV trickery!

What were the objectives of the project?

The high level goal was to manipulate virtual objects in the exact same manner as we would manipulate real objects. Modelling natural human grasping was the hardest bit – the interaction between hands and virtual objects. This is the main research contribution and we created an algorithm that takes Kinect data and uses it as input into a 3D physics simulation.

How does Kinect fit into the project?

It would have been much harder without Kinect. We’ve used 2D input from ordinary cameras but that’s not as rich as the depth data from Kinect. It’s still non-trivial, though, because Kinect only gives you per-pixel information about distance from the camera. With grasping, many parts of the hand aren’t visible to the camera, so we have to do a lot of inference and tricks to calculate what the hand is doing.
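To make "per-pixel information about distance from the camera" concrete, here is a minimal sketch of how a depth image can be turned into 3D points with a standard pinhole camera model. It is an illustration only, not the HoloDesk code: the function name `back_project` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions chosen for the example, and real values would come from the sensor's calibration.

```python
def back_project(depth, fx, fy, cx, cy):
    """Convert a depth image into a list of 3D points (x, y, z).

    depth is a list of rows; each value is the distance from the camera
    along the optical axis. Values <= 0 are treated as missing readings.
    fx, fy, cx, cy are hypothetical pinhole intrinsics (focal lengths and
    principal point, in pixels).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no depth reading for this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image; one pixel has no reading.
depth = [[1.0, 0.0],
         [2.0, 2.0]]
points = back_project(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Note that this only recovers the surface facing the camera. Anything hidden behind the palm or another finger produces no points at all, which is why the inference Hilliges describes is needed.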

The human hand is very flexible. One of the difficulties is if you have two overlapping hands. Many grasping strategies cause problems. For example, if you approach an object from the top, your palm occludes the fingers.

We’re not fully there but we’re tracking motion over time using an optical flow technique and approximating the posture of the hand using 3D particles. But it is not a fully articulating 3D model. That is the ultimate goal.
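One way to picture the particle idea is sketched below, assuming a 2D flow vector is already known for each pixel. A particle sits on the hand surface at a pixel, the flow says where that pixel moved in the next frame, and the two depth readings give a 3D velocity that a physics simulation could use. This is a simplified illustration and not the published HoloDesk algorithm; `pixel_to_point`, `particle_velocity` and the intrinsics tuple are hypothetical names.

```python
def pixel_to_point(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z using pinhole intrinsics."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def particle_velocity(u, v, flow, depth_prev, depth_next, intrinsics, dt):
    """Estimate the 3D velocity of a particle seen at pixel (u, v).

    flow[v][u] is an integer pixel displacement (du, dv) between frames.
    Returns None when the particle leaves the image or depth is missing.
    """
    du, dv = flow[v][u]
    u2, v2 = u + du, v + dv
    if not (0 <= v2 < len(depth_next) and 0 <= u2 < len(depth_next[0])):
        return None  # flow points outside the next frame
    z0 = depth_prev[v][u]
    z1 = depth_next[v2][u2]
    if z0 <= 0 or z1 <= 0:
        return None  # missing depth in one of the frames
    p0 = pixel_to_point(u, v, z0, *intrinsics)
    p1 = pixel_to_point(u2, v2, z1, *intrinsics)
    return tuple((b - a) / dt for a, b in zip(p0, p1))


# A flat surface 1 unit away; the pixel at (0, 0) moves one pixel right.
depth_prev = [[1.0, 1.0], [1.0, 1.0]]
depth_next = [[1.0, 1.0], [1.0, 1.0]]
flow = [[(1, 0), (0, 0)], [(0, 0), (0, 0)]]
intrinsics = (2.0, 2.0, 0.0, 0.0)  # hypothetical fx, fy, cx, cy
velocity = particle_velocity(0, 0, flow, depth_prev, depth_next, intrinsics, dt=0.5)
```

Because each particle is tracked independently, a cloud of them can approximate a deforming hand without a skeleton, at the cost of knowing nothing about joints or fingers.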

What technology did you use to write the software?

It is mostly based on Direct3D and DirectCompute for the image processing and graphics with C# and HLSL. There’s a little bit of CUDA involved for processing the optical flow – the point to point correspondence between frames.
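The "point to point correspondence between frames" can be illustrated with brute-force block matching: for a pixel in one frame, search nearby positions in the next frame for the patch that looks most similar. The sketch below runs on the CPU in plain Python purely for readability. It is an assumption-laden stand-in, not the CUDA implementation mentioned above, and `best_match` is a hypothetical helper.

```python
def best_match(prev, nxt, u, v, radius=1, search=2):
    """Find the displacement (du, dv) of the patch centred on (u, v).

    Compares the patch in prev against shifted patches in nxt and returns
    the shift with the lowest sum of absolute differences. Shifts that
    would push the patch outside the image are ignored.
    """
    h, w = len(prev), len(prev[0])

    def patch_cost(du, dv):
        cost = 0
        for j in range(-radius, radius + 1):
            for i in range(-radius, radius + 1):
                y0, x0 = v + j, u + i
                y1, x1 = y0 + dv, x0 + du
                if not (0 <= y0 < h and 0 <= x0 < w and 0 <= y1 < h and 0 <= x1 < w):
                    return None  # patch leaves the image
                cost += abs(prev[y0][x0] - nxt[y1][x1])
        return cost

    best = None
    for dv in range(-search, search + 1):
        for du in range(-search, search + 1):
            cost = patch_cost(du, dv)
            if cost is None:
                continue
            if best is None or cost < best[0]:
                best = (cost, du, dv)
    return None if best is None else (best[1], best[2])


# A single bright pixel moves one step to the right between frames.
prev = [[0] * 5 for _ in range(5)]
nxt = [[0] * 5 for _ in range(5)]
prev[2][1] = 9
nxt[2][2] = 9
displacement = best_match(prev, nxt, 1, 2, radius=1, search=1)
```

Every pixel's search is independent of every other pixel's, which is the kind of workload that maps naturally onto a GPU.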

What was the most surprising thing about the project?

We demonstrated this at open lab days and TechFest and people do react very, very strongly to it. It’s somewhere between excitement and being completely weirded out that they’re holding something without really holding it.

Bystanders can’t really see what’s going on. They just see someone semi-insane waving their hands in the air, and then they look through the mirror and see the full 3D scene.

People said that ‘it feels almost as if the object was there’ even though a lot of sensory feedback is missing. There’s no weight, no haptic feedback, but the visual illusion is strong enough that people suspend their disbelief.

Where do you go from here with HoloDesk?

For us, the input sensing and modelling input from depth cameras is still an interesting and open research question. The display setup we’re using in HoloDesk was more a way to prototype the user experience and we’re looking at other ways to superimpose 3D graphics over the real world that are less clunky. It’s not the kind of hardware setup that a user will have in five years’ time. So we have two strands of research to follow: novel display technologies that would enable this in a more lightweight way, and then more work around sensing and interpretation.

If this technology becomes more widespread, how would people use it?

I think there are two possibilities. One is gaming. Imagine game characters running around in your living room. Your environment becomes part of the game play. The other one is much less spectacular but just as promising. The current incarnation of mobile phones has pretty big hand-held screens but that might not be the ultimate evolution of phones. They might become a computational unit that you keep in your pocket and you interact with it by manipulating your environment. You could just draw a screen on your desk or render a screen onto your hand when you’re outdoors. It suggests that physical screens might be replaced by virtual screens. But not yet. Perhaps ten years out.

While HoloDesk isn’t going to turn into a commercial product any time soon, you can get your hands on Kinect for the PC now and you can download the SDK.