After upsetting a few people with my previous post regarding the amount of post-processing required to use gesture recognition with the Kinect – I thought it was worth posting a follow-up, now that I’ve had time to delve into the device a bit more deeply.
First off – I wanted to address the dispute over the comment I made about the 3×3 checkerboard pattern of the projected dot pattern. Here’s some footage I shot with an IR camera – actually a modified EyeToy camera with the internal filter removed and a visible light blocking filter on the front.
You can clearly see the dot pattern in this movie, and it is in fact arranged in a 3×3 grid of light and dark regions, each one with a central bright spot.
Now that I’ve had a chance to look at the pattern, it could well be that the Kinect is doing a form of ‘average light returned’ for each sample point: as the reflecting surface moves further away, the dot density decreases. I think the jury is still out on that one. Parallax detection with such a dense and possibly random pattern is quite impressive.
Via a comment posted to the YouTube video I was directed to this patent, which shows that the pattern of dots is actually significant to the depth-mapping algorithm.
Now on to deciphering the sensor data…
I compiled the C++ client application that plots a depth map using OpenGL – this really smokes my laptop (roughly 2.5 GHz dual core) and it drops frames like crazy. However, if you restrict your attention to just the depth map and run only the blob detection routines, things do become more manageable.
Also, it is critical to compile the image-processing library in Release mode – OpenCV does much better in this build mode, and the frame processing is much slicker. It is possible to define a region of interest within the depth data and reduce CPU processing time to well within the 1/30th-second window. This still leaves your machine somewhat “busy”, with a reduced amount of CPU available for other tasks.
However, I’ve not been able to get a data transfer rate any faster than 30 fps. For most applications this will be acceptable, but for a real-time music controller it’s only really good enough for triggering continuous data – not discrete note-on / note-off events. Triggering rhythmic samples is right out – if you miss the beat, it’ll be off for the whole duration. For this kind of thing you *need* at least 60 fps, and higher if possible.
I still think that the best approach is to use a separate machine for extracting the control data, and then pass this along to a rendering machine to handle the output.
I had a chance to talk to the makers of the Kinect Piano shown in the last post. They’ve been able to optimize the performance of their system (they use Python) to get the response time much tighter, and I think it shows in this video.
They persuaded me to give up my C++ code in favo[u]r of a Python interpreter – I’ll be giving this a shot and publishing the results as soon as I can. Additionally, I’ll be trying things out on a Mac to see if it’s any smoother. All the cool kids seem to be using Macs nowadays.
Finally I’ve tried to use the Kinect as a 3D scanner – it seems to do quite well on most simple forms, although it’s not so good at recognizing a Dalek…
UPDATE – OK, so I fired up my Mac and “gitted” the latest OpenKinect driver. When I go to build, it tells me that std::map m_devices has no access method called at(), as used in createDevices() defined in libfreenect.hpp. I’m on OS X 10.5.8 – anyone Mac-savvy know how to get around this?
UPDATE UPDATE: Fixed it m’self by replacing .at(_index) with [_index] – which was a guess, but it seems to work.