Robotic assistants designed to coexist and communicate with humans
in the real world should be able to interact with them in an
intuitive way. This requires that the robots are able to recognize
typical gestures performed by humans, such as head shaking or
nodding, hand waving, or pointing. We present a
system that is able to spot and recognize complex, parameterized
gestures from monocular image sequences. To represent people, we
locate their faces and hands using trained classifiers and track
them over time. We use a small set of expressive features extracted
from this compact representation as input to hidden Markov models (HMMs).
First, we segment gestures into distinct phases and train HMMs for
each phase separately. Then, we construct composed HMMs, which
consist of the individual phase-HMMs. Once a specific phase is
recognized, we estimate the parameter of the current gesture, e.g.,
the target of a pointing gesture. As we demonstrate in the
experiments, our method is able to robustly locate and track hands,
despite the fact that they can take on a large number of
substantially different shapes. Based on this, our system is able
to reliably spot and recognize a variety of complex, parameterized gestures.
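The phase-based recognition idea above can be sketched in code. The following is an illustrative toy example, not the actual system: the two-symbol alphabet, the "wave"/"point" model parameters, and the `hop` probability for chaining phase-HMMs are our own assumptions. It shows the two core mechanisms, computing the likelihood of an observation sequence under an HMM with the scaled forward algorithm, and chaining separately trained phase-HMMs into one composed HMM whose last phase state can hop into the next phase.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete
    observation sequence obs, initial distribution pi, transition
    matrix A, and emission matrix B (states x symbols)."""
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        log_lik += np.log(s)
        alpha = alpha / s
    return log_lik

def compose_phases(phases, hop=0.2):
    """Chain phase-HMMs (pi, A, B) into one composed HMM: the last
    state of each phase gets probability `hop` of jumping to the
    first state of the next phase (row sums stay 1). The `hop`
    value is an illustrative assumption."""
    pis, As, Bs = zip(*phases)
    sizes = [a.shape[0] for a in As]
    N = sum(sizes)
    A = np.zeros((N, N))
    off = 0
    for i, a in enumerate(As):
        A[off:off + sizes[i], off:off + sizes[i]] = a
        if i < len(As) - 1:
            last = off + sizes[i] - 1
            A[last] *= 1.0 - hop           # discount staying in this phase
            A[last, off + sizes[i]] = hop  # hop into the next phase
        off += sizes[i]
    pi = np.zeros(N)
    pi[:sizes[0]] = pis[0]                 # start in the first phase
    B = np.vstack(Bs)
    return pi, A, B

# Toy 2-symbol models: a "wave"-like HMM prefers alternating
# observations, a "point"-like HMM prefers constant ones.
wave = (np.array([0.5, 0.5]),
        np.array([[0.1, 0.9], [0.9, 0.1]]),
        np.array([[0.9, 0.1], [0.1, 0.9]]))
point = (np.array([0.5, 0.5]),
         np.array([[0.9, 0.1], [0.1, 0.9]]),
         np.array([[0.9, 0.1], [0.1, 0.9]]))

alternating = [0, 1, 0, 1, 0, 1]
constant = [0, 0, 0, 0, 0, 0]

# Spotting = pick the model with the highest log-likelihood.
for name, m in [("wave", wave), ("point", point)]:
    print(name,
          forward_log_likelihood(alternating, *m),
          forward_log_likelihood(constant, *m))
```

In the real system the observations are continuous features of the tracked face and hands rather than discrete symbols, and the phase-HMMs are trained on segmented example gestures; the composed model nevertheless works the same way, with phase boundaries marking where gesture parameters (e.g., a pointing target) are estimated.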
- Recognizing Complex, Parameterized Gestures from Monocular Image Sequences. T. Axenbeck, M. Bennewitz, S. Behnke, and W. Burgard. In: Proceedings of the IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2008.
- Robust Recognition of Complex Gestures for Natural Human-Robot Interaction. M. Bennewitz, T. Axenbeck, S. Behnke, and W. Burgard. In: Proceedings of the Workshop on Interactive Robot Learning at Robotics: Science and Systems Conference (RSS), 2008.
This video (XVID-MPEG4, AVI) shows that faces, facial features, and hands can be robustly tracked even under difficult and changing lighting conditions and in front of cluttered backgrounds. Our system reliably recognizes complex gestures; the video shows only the most likely recognized gesture. (Click here for the video using an alternative codec.)
We performed further experiments in a different environment. In this video (XVID-MPEG4, AVI), we show the most likely gesture individually for the left and right hand, and for bi-manual gestures.