Humanoid service robots performing complex object manipulation tasks
need to plan whole-body motions that satisfy a variety of
constraints: the robot must keep its balance, it must avoid
self-collisions as well as collisions with obstacles in the environment, and, if
applicable, the trajectory of the end-effector must follow the
constrained motion of a manipulated object in Cartesian space.
We present an approach to whole-body motion planning
that is based on rapidly-exploring random
trees in combination with inverse kinematics. Using our system, humanoids
are able to plan motions to open drawers and doors, and to pick up objects.
The code will be made available open source in ROS / MoveIt!
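As a rough sketch of the underlying idea (not the actual MoveIt! implementation; `sample_fn`, `project_fn`, `is_valid_fn`, and `goal_fn` are hypothetical placeholders), the following Python code grows a tree toward random configuration-space samples and projects every new node onto the constraint manifold before checking balance and collisions:

```python
import numpy as np

def plan_constrained_rrt(q_start, sample_fn, project_fn, is_valid_fn,
                         step_size=0.1, max_iters=5000, goal_fn=None):
    """Minimal constrained-RRT sketch: extend toward random samples and
    project each new configuration onto the constraint manifold
    (balance, end-effector pose) before validity checking."""
    nodes = [q_start]
    parents = {0: None}
    for _ in range(max_iters):
        q_rand = sample_fn()
        # nearest neighbor in configuration space
        i_near = int(np.argmin([np.linalg.norm(q - q_rand) for q in nodes]))
        q_near = nodes[i_near]
        direction = q_rand - q_near
        q_new = q_near + step_size * direction / (np.linalg.norm(direction) + 1e-9)
        q_new = project_fn(q_new)          # IK-style projection onto the constraints
        if q_new is None or not is_valid_fn(q_new):
            continue                       # collision / balance check failed
        nodes.append(q_new)
        parents[len(nodes) - 1] = i_near
        if goal_fn is not None and goal_fn(q_new):
            # reconstruct the path by walking back the parent pointers
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parents[i]
            return list(reversed(path))
    return None
```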
We developed a system that enables a humanoid robot to imitate complex
whole-body motions of humans in real time. To avoid falls of the
robot that might occur when directly imitating the movements
due to the different weight distribution, we developed an approach
that actively balances the center of mass over the support polygon
of the robot's feet.
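To illustrate the balancing principle (a minimal sketch only, not the real whole-body controller; the gain is a made-up value), one can test whether the ground projection of the center of mass lies inside the support polygon and, if not, compute a corrective shift:

```python
import numpy as np

def com_inside_support(com_xy, polygon):
    """Ray-casting point-in-polygon test for the ground-projected CoM.
    polygon: (N, 2) array of foot-contour vertices in the ground plane."""
    x, y = com_xy
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def balance_correction(com_xy, polygon, gain=0.5):
    """If the CoM projection leaves the support polygon, command a shift
    toward the polygon centroid (a stand-in for the real correction)."""
    if com_inside_support(com_xy, polygon):
        return np.zeros(2)
    centroid = np.mean(polygon, axis=0)
    return gain * (centroid - np.asarray(com_xy))
```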
Autonomous Biped Navigation Through Cluttered 3D Environments
We developed a navigation system that allows humanoids
to autonomously navigate in previously unknown, cluttered environments.
Our approach relies on data from consumer-grade depth cameras such
as an ASUS Xtion or Microsoft Kinect. From the depth data, our
system estimates the robot's pose and maintains a
heightmap representation of the environment. Based on this model, our
technique iteratively computes sequences of safe actions including
footsteps and whole-body motions, leading the robot to target
locations. In doing so, the planner chooses from a set of actions that
consists of planar footsteps, step-over actions, as well as parameterized step-onto and
step-down actions. To efficiently check for
collisions during planning, we developed a new approach that takes
into account the shape of the robot and the obstacles.
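A minimal sketch of such a footstep feasibility test on a heightmap (all thresholds are illustrative example values, not the Nao's real limits) might look as follows:

```python
import numpy as np

def step_feasible(heightmap, current_height, target_cell, foot_halfsize=2,
                  max_step_up=0.07, max_step_down=0.10, max_roughness=0.01):
    """Illustrative footstep feasibility check on a 2D heightmap (meters).
    The foothold must be nearly planar, and the height change must be
    within the robot's stepping capability."""
    r, c = target_cell
    patch = heightmap[r - foot_halfsize:r + foot_halfsize + 1,
                      c - foot_halfsize:c + foot_halfsize + 1]
    if patch.max() - patch.min() > max_roughness:
        return False                      # foothold surface too uneven
    dh = patch.mean() - current_height
    # parameterized step-onto vs. step-down actions have different limits
    return -max_step_down <= dh <= max_step_up
```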
The video below shows our Nao humanoid equipped with an ASUS Xtion Pro Live on top of its head
navigating in a cluttered environment. The robot is able to
traverse highly challenging passages by building an accurate
heightmap from the data of the onboard depth camera and choosing
appropriate actions such as stepping over or onto obstacles.
Autonomous Navigation in 3D Environments Based on Depth Camera Data
We developed an integrated approach for robot localization, obstacle mapping,
and path planning in 3D environments based on data of an onboard consumer-level
depth camera. Our system relies on state-of-the-art techniques for
environment modeling and localization, which we extended for depth camera data.
Our approach runs in real time, maintains a 3D environment representation,
and estimates the robot's pose in 6D. As our experiments show, the depth camera is
well-suited for robust localization and reliable obstacle avoidance in complex indoor environments.
The video below shows our Nao humanoid equipped with an ASUS Xtion Pro Live on top of its head.
The robot estimates its 6D pose in a static 3D model of the environment based on depth data. At the same time, it constructs a 3D obstacle map from the depth data for obstacle avoidance. To allow for real-time performance, the robot updates the map from sensor data at 6 Hz. The learned octree-based representation (OctoMap) is then used for real-time planning of collision-free paths.
Autonomously Climbing Complex Staircases and Traversing Ramps
We developed an approach to enable a humanoid robot to autonomously
climb up complex staircases. We first reconstruct a 3D model of the
staircase based on laser range data acquired with a humanoid. The
robot then globally estimates its pose in the 3D model, which is
subsequently refined by integrating visual observations. We use the
3D staircase model and the estimated pose to project edges
corresponding to stair contours into monocular camera images. By
detecting edges in the images and associating them to projected
model edges, the robot is able to accurately localize itself with respect to
the stairs and to climb them reliably.
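The projection of model edges into the camera image follows the standard pinhole model. As a sketch (assuming a known 4x4 world-to-camera transform and a 3x3 intrinsic matrix, both hypothetical inputs here):

```python
import numpy as np

def project_edge(p1_world, p2_world, T_cam_world, K):
    """Project a 3D model edge (two endpoints in world coordinates) into
    a monocular image with a pinhole camera model. T_cam_world is the
    4x4 world-to-camera transform, K the 3x3 intrinsic matrix."""
    pts = np.stack([p1_world, p2_world])                     # (2, 3)
    pts_h = np.hstack([pts, np.ones((2, 1))])                # homogeneous
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]               # camera frame
    if np.any(pts_cam[:, 2] <= 0):
        return None                                          # behind camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                              # pixel coords
    return uv   # detected image edges are then associated to these lines
```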
Furthermore, we developed methods that enable a humanoid robot to
traverse ramps using only vision and inertial data for sensing.
The robot locates the beginning of the ramp using visual observations, walks down with regular corrections based on the inertial data, and finally determines the end of the ramp by detecting the ending edge before exiting the ramp.
NAO Walking Down a Ramp
C. Lutz, F. Atmanspacher, A. Hornung, and M. Bennewitz.
In: Video Abstract Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012.
The first two videos below show how the Nao acquires an accurate 3D
model of the staircase and then autonomously climbs it to the
top. As can be seen, the robot moves its head to observe stair edges
in camera images and accurately positions itself for climbing the
next step. The last video shows the humanoid traversing a ramp using only vision and inertial data for sensing.
Vision-based Obstacle Avoidance
We have developed efficient approaches to obstacle avoidance for
humanoid robots based on monocular images. Our approach relies on
ground-plane estimation and trains visual classifiers using color
and texture information in a self-supervised way. During
navigation, the classifiers are automatically updated and applied to
the image stream to decide which areas are traversable. From this
information, the robot can compute a two-dimensional occupancy grid
map of the environment and use it for planning collision-free paths.
As we illustrate in thorough experiments with a real humanoid, the
classification results are highly accurate and the resulting
occupancy map enables the robot to reliably avoid obstacles during navigation.
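The following toy sketch illustrates the self-supervised scheme (heavily simplified; the actual system uses richer color and texture features): patches labeled as floor by the ground-plane estimate train one class model, everything else trains the other, and new patches are classified by nearest centroid:

```python
import numpy as np

class TraversabilityClassifier:
    """Toy self-supervised floor/obstacle classifier over color features.
    Labels come 'for free': patches on the estimated ground plane are
    floor examples, all others are obstacle examples."""

    def __init__(self):
        self.floor_hists, self.obstacle_hists = [], []

    @staticmethod
    def color_histogram(patch, bins=8):
        # patch: (H, W, 3) RGB array; normalized joint color histogram
        h, _ = np.histogramdd(patch.reshape(-1, 3).astype(float),
                              bins=(bins,) * 3, range=[(0, 256)] * 3)
        return (h / (h.sum() + 1e-12)).ravel()

    def add_example(self, patch, is_floor):
        (self.floor_hists if is_floor else self.obstacle_hists).append(
            self.color_histogram(patch))

    def is_traversable(self, patch):
        # nearest-centroid decision between the two running class models
        h = self.color_histogram(patch)
        d_floor = np.linalg.norm(h - np.mean(self.floor_hists, axis=0))
        d_obst = np.linalg.norm(h - np.mean(self.obstacle_hists, axis=0))
        return d_floor < d_obst
```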
The videos below show how our Nao humanoid trains the visual
classifiers in a self-supervised fashion during navigation. The
learned classifiers are applied to the stream of camera images to
discriminate obstacles from the floor. Based on the traversable
area, the robot builds an occupancy map for path planning.
In the first video, the robot uses data
from its 2D laser scanner to guide the training; in the second video,
the robot needs only its RGB camera data and odometry information.
Efficient Path Planning for Humanoids
Humanoid robots possess the capability of stepping over or onto
objects. When planning paths for humanoids, one therefore should
consider an intelligent placement of footsteps instead of choosing
detours around obstacles. We propose to combine grid-based 2D
planning with footstep planning in an efficient manner. In this way,
we exploit the advantages of both frameworks, namely fast planning on
grids and the ability to find solutions in situations where grid-based
planning fails. Our method computes a global solution by adaptively
switching between fast grid-based planning in open spaces and footstep
planning in the vicinity of obstacles. To decide which planning
framework to use, our approach classifies the environment into regions
of different complexity with respect to traversability.
Experiments carried out in a simulated office environment and with a
Nao humanoid show that (i) our approach significantly reduces the
planning time compared to pure footstep planning and (ii) the
resulting plans are almost as good as globally computed optimal footstep plans.
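The adaptive switching can be sketched as follows (illustrative skeleton only; `plan_grid`, `plan_footsteps`, and `region_complexity` are hypothetical stand-ins for the actual planners and the traversability classifier):

```python
def plan_hybrid(start, goal, region_complexity, plan_grid, plan_footsteps):
    """Illustrative skeleton of the adaptive switching: segment the route
    returned by a fast 2D grid planner and plan each segment with the
    planner matching its region class. region_complexity(p) returns
    'open' or 'cluttered' for a waypoint p."""
    coarse_path = plan_grid(start, goal)
    if coarse_path is None:
        return None
    plan = []
    segment = [coarse_path[0]]
    mode = region_complexity(coarse_path[0])
    for p in coarse_path[1:]:
        m = region_complexity(p)
        if m != mode:                       # region boundary: flush segment
            planner = plan_grid if mode == 'open' else plan_footsteps
            plan += planner(segment[0], segment[-1])
            segment, mode = [segment[-1]], m
        segment.append(p)
    planner = plan_grid if mode == 'open' else plan_footsteps
    plan += planner(segment[0], segment[-1])
    return plan
```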
Recently, we released an update of our footstep planner
in the ROS humanoid_navigation stack. The planner is now much faster
and builds on SBPL. In addition to the previous
D* Lite implementation, this now enables anytime (re-)planning with suboptimality bounds, e.g., with ARA* or AD*.
This first part of the video below shows our Nao humanoid robot
executing a footstep plan, carefully avoiding obstacles. Note that
in this scenario, a conventional 2D path using the robot's
circumcircle would lead to suboptimal results or even collisions
because there is no direct control of the footstep locations.
In the second part of the video, the robot is traversing its initial
path when the location of an obstacle changes after a
few steps. Reusing previous information, a new footstep plan is then quickly computed.
Humanoid Robot Localization
in Complex Indoor Environments
We developed a probabilistic localization method for humanoid robots
navigating in arbitrary complex indoor environments using only onboard
sensing, which is a challenging task. Inaccurate motion execution of
biped robots leads to an uncertain estimate of odometry, and their
limited payload constrains perception to observations from lightweight
and typically noisy sensors. Additionally, humanoids do not walk on
flat ground only and perform a swaying motion while walking, which
requires estimating a full 6D torso pose. We apply Monte Carlo
localization to globally determine and track a humanoid’s 6D pose in a
given 3D world model, which may contain multiple levels and
staircases. We present an observation model to integrate range
measurements from a laser scanner or a depth camera as well as
attitude data and information from the joint encoders. To increase the
localization accuracy, e.g., while climbing stairs, we propose a
further observation model and additionally use monocular vision data
in an improved proposal distribution. We demonstrate the effectiveness
of our methods in extensive real-world experiments with a Nao
humanoid. As the experiments illustrate, the robot is able to globally
localize itself and accurately track its 6D pose while walking and climbing stairs.
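At its core, Monte Carlo localization iterates a motion update, a measurement update, and resampling over a set of pose hypotheses. A minimal sketch (with `apply_motion` and `obs_likelihood` as hypothetical stand-ins for the motion and observation models described above):

```python
import numpy as np

def mcl_update(particles, weights, odom_delta, apply_motion, obs_likelihood):
    """One Monte Carlo localization step over 6D poses (sketch). Each
    particle is a 6D pose; apply_motion adds sampled odometry noise, and
    obs_likelihood scores range/attitude measurements against the map."""
    # motion update with sampled noise
    particles = np.array([apply_motion(p, odom_delta) for p in particles])
    # measurement update
    weights = weights * np.array([obs_likelihood(p) for p in particles])
    weights /= weights.sum()
    # low-variance resampling keeps the particle set focused
    n = len(particles)
    positions = (np.arange(n) + np.random.uniform()) / n
    indices = np.searchsorted(np.cumsum(weights), positions)
    return particles[indices], np.full(n, 1.0 / n)
```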
The video below shows our humanoid robot navigating in a complex
indoor environment while localizing itself using our technique. We
present experiments carried out in the Webots 6 robot simulator as
well as using our Nao humanoid equipped with a laser head. This
robot was developed by Aldebaran
Robotics in cooperation with our lab.
Learning Reliable and Efficient Navigation with a Humanoid
Reliable and efficient navigation with a humanoid robot is a difficult
task. First, the motion commands are executed rather inaccurately due
to backlash in the joints or foot slippage. Second, the observations
are typically highly affected by noise due to the shaking behavior of
the robot. Thus, the localization performance is typically reduced
while the robot moves and the uncertainty about its pose increases.
As a result, the reliable and efficient execution of a navigation task
cannot be ensured anymore since the robot's pose estimate might not
correspond to the true location. We developed a reinforcement
learning approach to select appropriate navigation actions for a
humanoid robot equipped with a camera for localization. The robot
learns to reach the destination reliably and as fast as possible,
thereby choosing actions to account for motion drift and trading off
velocity, i.e., fast walking movements, against accuracy in
localization. Extensive simulated and practical experiments with a
humanoid robot demonstrate that our learned policy significantly
outperforms a hand-optimized navigation strategy.
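The action selection can be learned, for instance, with tabular Q-learning. The sketch below is a generic illustration, not our exact formulation; `env` is a hypothetical interface, and the state would encode the discretized pose estimate and its uncertainty:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning sketch for navigation action selection.
    env is assumed to provide reset() -> state and
    step(action) -> (next_state, reward, done)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly exploit, sometimes explore
            a = (np.random.randint(n_actions) if np.random.rand() < eps
                 else int(np.argmax(Q[s])))
            s_next, reward, done = env.step(a)
            # with a negative per-step reward, the policy learns to be fast
            Q[s, a] += alpha * (reward + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```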
The video below shows our Nao humanoid robot navigating in our corridor
environment. The policy learned with reinforcement learning is
executed in order to reach the goal fast and reliably. In addition
to the external view, the robot's estimated state with the
corresponding uncertainty ellipse and the camera view with detected
and integrated features is shown.
Metric Localization with Scale-Invariant Visual Features using a Single Perspective Camera
The Scale Invariant Feature Transform (SIFT) has
become a popular feature extractor for vision-based applications. It
has been successfully applied to metric localization and mapping using
stereo vision and omnivision. We present an approach
to Monte-Carlo localization using SIFT features for mobile robots
equipped with a single perspective camera. First, we acquire a
2D grid map of the environment that contains the visual features. To
come up with a compact environmental model, we appropriately
down-sample the number of features in the final map. During
localization, we cluster close-by particles and estimate for each
cluster the set of potentially visible features in the map using
ray-casting. These relevant map features are then compared to the
features extracted from the current image. The observation model used
to evaluate the individual particles considers the difference between
the measured and the expected angle of similar features. In
real-world experiments, we demonstrate that our technique is able to
accurately track the position of a mobile robot. Moreover, we present
experiments illustrating that a robot equipped with a different type
of camera can use the same map of SIFT features for localization.
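A sketch of the angle-based observation model (simplified; `sigma` is an assumed noise parameter and the feature matching step is omitted): each particle is weighted by how well the bearings measured in the image agree with the bearings expected from the particle's pose:

```python
import numpy as np

def particle_weight(particle_pose, matched_features, measured_angles,
                    sigma=np.deg2rad(5.0)):
    """Sketch of the angle-based observation model. particle_pose is
    (x, y, theta); matched_features are (x, y) map positions of features
    matched in the current image; measured_angles are their bearings
    measured in the image."""
    w = 1.0
    for feat_xy, measured_angle in zip(matched_features, measured_angles):
        dx = feat_xy[0] - particle_pose[0]
        dy = feat_xy[1] - particle_pose[1]
        expected_angle = np.arctan2(dy, dx) - particle_pose[2]
        # wrap the angular difference to [-pi, pi]
        diff = (measured_angle - expected_angle + np.pi) % (2 * np.pi) - np.pi
        w *= np.exp(-0.5 * (diff / sigma) ** 2)
    return w
```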
This animated gif (2 MB) shows the evolution of the particle clouds during a localization experiment. The blue dot corresponds to the true pose of the robot and the green dot indicates the pose resulting from odometry information.
This video (wmv, 19 MB) shows the humanoid robot Max collecting data in an office environment. Since the robot was designed for playing soccer, its camera looks downwards. Thus, in the experiment shown here, Max has to bend backwards in order to observe the features used for localization in the environment.
Learning Efficient Policies for Vision-based Navigation
Cameras provide valuable information about the environment and are
often used as a sensor for localization to accomplish navigation
tasks. However, fast movements of a mobile robot typically reduce
the performance of vision-based localization systems due to motion
blur. We developed a reinforcement learning approach to select
appropriate velocity values for vision-based navigation. The
learned policy minimizes the time to reach the destination and
implicitly takes the impact of motion blur on observations into
account. To reduce the size of the resulting policies, which is
desirable in the context of memory-constrained systems, we compress
the learned policy via a clustering approach. Extensive simulated
and real-world experiments demonstrate that our learned policy
significantly outperforms any policy that uses a constant velocity
and more advanced heuristics.
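The compression step can be sketched with a plain k-means clustering of state vectors, storing one representative action per cluster (an illustration of the idea, not our exact algorithm):

```python
import numpy as np

def compress_policy(states, actions, k=20, iters=50):
    """Sketch of policy compression by clustering: group state vectors
    (N, d) with k-means and store the majority action per cluster, so
    the stored policy shrinks from N entries to k.
    actions: (N,) array of non-negative action indices."""
    centers = states[np.random.choice(len(states), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each state to its nearest cluster center
        d = np.linalg.norm(states[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = states[assign == j].mean(axis=0)
    # majority action within each cluster becomes the compressed policy
    cluster_actions = np.array([
        np.bincount(actions[assign == j]).argmax() if np.any(assign == j) else 0
        for j in range(k)])
    return centers, cluster_actions
```

At query time, the robot would look up the nearest cluster center for its current state and execute the stored action, trading a small loss in policy quality for a much smaller memory footprint.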
In our experiments, we used a Pioneer 2 robot with a top-mounted
down-looking camera to observe the floor in front of the robot. In
the experiment shown in the video below, the robot drives to its goal position using the
learned policy. The current camera image with detected and matched
landmarks as well as the robot's true pose (red) and the pose
estimated via a UKF (green) with the corresponding uncertainty are
displayed. Depending on the distance and the angle to the goal as
well as on the uncertainty in the belief about the robot's pose, the
robot chooses appropriate values for the translational velocity to
reach the destination as fast as possible.
We furthermore performed experiments with an outdoor robot and used
a scenario in which the robot had to traverse several waypoints.
The second part of the video below
shows the robot driving to its goal using the learned
policy. As can be seen, the robot reaches the goal fast and reliably.
Robust Recognition of Complex Gestures for Natural Human-Robot Interaction
Robotic assistants designed to coexist and communicate with humans
in the real world should be able to interact with them in an
intuitive way. This requires that the robots are able to recognize
typical gestures performed by humans such as head shaking/nodding,
hand waving, or pointing gestures. We present a
system that is able to spot and recognize complex, parameterized
gestures from monocular image sequences. To represent people, we
locate their faces and hands using trained classifiers and track
them over time. We use a few expressive features extracted from
this compact representation as input to hidden Markov models (HMMs).
First, we segment gestures into distinct phases and train HMMs for
each phase separately. Then, we construct composed HMMs, which
consist of the individual phase-HMMs. Once a specific phase is
recognized, we estimate the parameter of the current gesture, e.g.,
the target of a pointing gesture. As we demonstrate in the
experiments, our method is able to robustly locate and track hands,
despite the fact that they can take a large number of
substantially different shapes. Based on this, our system is able
to reliably spot and recognize a variety of complex, parameterized gestures.
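Recognition with the phase-HMMs boils down to evaluating the observation sequence under each composed model and picking the most likely one. A minimal sketch of the underlying forward algorithm in log space (generic HMM code, not our exact feature pipeline):

```python
import numpy as np

def forward_loglik(obs, pi, A, emission_logprob):
    """Log-likelihood of an observation sequence under one HMM via the
    forward algorithm. pi: initial state distribution, A: transition
    matrix, emission_logprob(state, o): log p(o | state). Gesture
    spotting then picks the composed HMM with the highest likelihood."""
    n_states = len(pi)
    log_alpha = np.log(pi) + np.array(
        [emission_logprob(s, obs[0]) for s in range(n_states)])
    for o in obs[1:]:
        # log-sum-exp over predecessors keeps the recursion stable
        m = log_alpha.max()
        trans = np.log(np.exp(log_alpha - m) @ A) + m
        log_alpha = trans + np.array(
            [emission_logprob(s, o) for s in range(n_states)])
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())
```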
Robust Recognition of Complex Gestures for Natural Human-Robot
Interaction. M. Bennewitz, T. Axenbeck, S. Behnke, and W. Burgard.
In: Proceedings of the Workshop on Interactive Robot Learning at Robotics: Science and Systems Conference (RSS), 2008.
This video (XVID-MPEG4, AVI) shows that faces, facial features, and
hands can be robustly tracked
even under difficult and changing lighting conditions
and given a cluttered background. Our system reliably recognizes complex gestures.
We only show the most likely recognized gesture in the video.
We performed further experiments in a different environment.
In this video (XVID-MPEG4, AVI),
we show the most likely gesture individually for the left and right
hand, and for bi-manual gestures.
Multimodal Interaction between a Humanoid Robot and Humans
The purpose of our research is to develop a humanoid museum tour
guide robot that performs intuitive, multimodal interaction with
multiple people. Our robots Robotinho and Fritz use speech, facial
expressions, eye-gaze, and gestures to interact with people.
Depending on the audio-visual input, the robots shift their
attention between different people in order to involve them in the
conversation. Robotinho and Fritz perform human-like arm gestures
during an interaction and also use pointing gestures generated with
eyes, head, and arms to direct the attention of their communication
partners towards objects of interest. To express their emotional
state, the robots generate facial expressions and adapt the speech synthesis accordingly.
In contrast to its predecessor Fritz, Robotinho is not restricted to
a static scenario and possesses several new features, such as a
movable trunk, a more expressive head with movable eyelids, and an
additional arm joint.
The Humanoid Museum Tour Guide Robotinho.
F. Faber, M. Bennewitz, C. Eppner, A. Goeroeg, C. Gonsior, D. Joho, M. Schreiber, and S. Behnke.
In: Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2009.
Intuitive Multimodal Interaction with Communication Robot Fritz.
M. Bennewitz, F. Faber, D. Joho, and S. Behnke. In:
M. Hackel, editor, Humanoid Robots, Human-like Machines,
pp. 613-624, I-Tech Education and Publishing, 2007.
Please check this webpage for further
related publications in the year 2005.
This video of Robotinho acting as a tour guide in the corridor of our university
(wmv, 29 MB) shows how the robot guides visitors to exhibits and
presents them. Robotinho uses multiple modalities
to interact with people in a natural way.
Previously, Robotinho was used in a static scenario in which it explained some
smaller robots placed on a table in front of it. This video
(wmv, 19 MB) shows the multimodal interaction between Robotinho and the visitors.
Our communication robot Fritz explained some smaller robots to the
visitors at the Science Days in the Europa-Park Rust, October 2006. This video (wmv, 28 MB) shows the interaction between Fritz and the visitors.
Utilizing Learned Motion Patterns to Predict Positions of People
Whenever people move through their environments they do not move
randomly. Instead, they usually follow specific trajectories or
motion patterns corresponding to their intentions. Knowledge about
such patterns enables a mobile robot to robustly keep track of
persons in its environment and to improve its behavior. We
propose a technique for learning collections of trajectories that
characterize typical motion patterns of persons. Data recorded with
laser-range finders is clustered using the expectation maximization
algorithm. Based on the result of the clustering process we derive
a Hidden Markov Model (HMM) that is applied to estimate the current
and future positions of persons based on sensory input.
We present several experiments carried out in different environments
with a mobile robot equipped with a laser range scanner and a camera
system. The results demonstrate that our approach can reliably
learn motion patterns of persons, can robustly estimate and predict
the positions of multiple persons, and can be used to improve the navigation
behavior of a mobile robot.
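Prediction with the learned HMM amounts to propagating the belief over discrete place-states through the transition matrix, and integrating an observation is a Bayesian correction. A minimal sketch:

```python
import numpy as np

def predict_positions(belief, A, n_steps):
    """Propagate the belief over discrete place-states n_steps into the
    future by repeated multiplication with the transition matrix A."""
    b = np.asarray(belief, dtype=float)
    for _ in range(n_steps):
        b = b @ A          # one prediction step: b'[j] = sum_i b[i] * A[i, j]
    return b / b.sum()

def integrate_observation(belief, likelihood):
    """Bayesian correction with an observation likelihood per state,
    e.g. from the laser-based people tracker."""
    b = np.asarray(belief, dtype=float) * np.asarray(likelihood)
    return b / b.sum()
```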
This mpeg-video (4.7 MB) shows an experiment with a single person. The
video shows a scene overview (left hand side), the results from the people tracking
system which is based on laser-range data (right hand side), as well
as the HMM (bottom), which
is used to maintain the robot's belief about
the position of the person. In this case we do not
use vision information because we assume only one person is moving
in the environment. In the HMM the red dot
corresponds to the position of the person provided by the laser
tracking system. The size of each square of the HMM states
represents the probability that the person is currently in the corresponding state.
This mpeg-video (12.8 MB) shows an experiment with multiple persons. The
video shows the camera images (left hand side) with the areas corresponding to a
person detected by the laser tracking system, as well as one HMM (right hand side). The
HMM shows the belief of the robot over the position of the
person who enters the corridor second (black trousers, blue shirt).
This animated gif (5.9 MB) shows an experiment with two persons. Whereas
the upper image depicts the belief about the position of person
1, the lower image shows the belief about the position of
person 2. The circles are detected
features. The grey value of each circle represents the
similarity to the person corresponding to the HMM (the darker the
more likely). In the beginning the robot
was quite certain that persons 1 and 2 were in the room
containing resting place 3.
This animated gif (3.4 MB) shows an experiment with a moving robot.
Here the robot traveled along the
corridor and looked into one of the offices where it detected person
A. Whereas the robot was initially rather uncertain as to where person A was, the
probability of resting place 3 increased considerably after this observation.
Adapting Navigation Strategies Using Learned Motion Patterns of People
We propose a method for adapting the behavior of a mobile robot
according to the activities of the people in its surroundings. Our
approach uses learned motion patterns of persons. Whenever the robot
detects a person it computes a probabilistic estimate about which
motion pattern the person might be engaged in. During path planning
it then uses this belief to improve its navigation behavior. In
different practical experiments carried out on a real robot we
demonstrate that our approach allows a robot to quickly adapt its
navigation plans according to the activities of the persons in its surroundings.
Our mobile robot Albert moves into a doorway to let a person pass (mpg-video).
Albert moves forward and waits until the likelihood of
interfering with the person is low enough (mpg-video).
Albert moves away from a doorway to let a person enter the
corresponding room (mpg-video).
Learning Motion Patterns of People
We propose a method to learn typical motion behaviors of persons.
As people move through their environments, they usually do not move
randomly. Instead, they often engage in typical motion patterns,
related to specific locations they might be interested in
approaching and specific trajectories they might follow in doing so.
Knowledge about such patterns may enable a mobile robot to develop
improved people following and obstacle avoidance skills. We present
an algorithm that learns collections of typical trajectories that
characterize a person's motion patterns. Data, recorded by mobile
robots equipped with laser-range finders, is clustered into
different types of motion using the popular expectation maximization
algorithm while simultaneously learning multiple motion patterns.
Experimental results, obtained using data collected in a domestic
residence and in an office building, illustrate that highly
predictive models of human motion patterns can be learned.
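A heavily simplified sketch of the EM idea (trajectories assumed resampled to a common length; model selection over the number of patterns is omitted):

```python
import numpy as np

def em_motion_patterns(trajectories, k, iters=30, sigma=0.5):
    """Simplified EM sketch for clustering fixed-length trajectories
    (each a (T, 2) array of positions) into k motion patterns, each
    represented by a mean trajectory."""
    data = np.stack(trajectories).astype(float)      # (N, T, 2)
    n = len(data)
    means = data[np.random.choice(n, k, replace=False)].copy()
    for _ in range(iters):
        # E-step: responsibilities from Gaussian distances to each pattern
        d2 = ((data[:, None] - means[None]) ** 2).sum(axis=(2, 3))  # (N, k)
        log_r = -0.5 * d2 / sigma ** 2
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate each mean trajectory from weighted data
        for j in range(k):
            means[j] = ((r[:, j, None, None] * data).sum(axis=0)
                        / (r[:, j].sum() + 1e-12))
    return means, r
```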
Coordinating the motion of multiple mobile robots is one of the
fundamental problems in robotics. The predominant algorithms for
coordinating teams of robots are decoupled and prioritized, thereby
avoiding combinatorially hard planning problems typically faced by
centralized approaches. While these methods are very efficient,
they have two major drawbacks. First, they are incomplete,
i.e., they sometimes fail to find a solution even if one exists, and
second, the resulting solutions are often not optimal. We developed
a method for finding and optimizing priority schemes for such
prioritized and decoupled planning techniques. Existing approaches
apply a single priority scheme which makes them overly prone to
failure in cases where valid solutions exist. By searching in the
space of priorization schemes, our approach overcomes this
limitation. It performs a randomized search with hill-climbing to
find solutions and to minimize the overall path length. To focus the
search, our algorithm is guided by constraints generated from the task specification.
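A sketch of the randomized search (illustrative only; `plan_with_priorities` is a hypothetical stand-in for the decoupled prioritized planner, returning the total path length or None on failure):

```python
import numpy as np

def optimize_priorities(robots, plan_with_priorities, restarts=10, iters=100):
    """Randomized hill-climbing over priority schemes (sketch): start
    from random orders and repeatedly swap two priorities, keeping a
    swap when it yields a (shorter) valid solution."""
    best_order, best_cost = None, np.inf
    for _ in range(restarts):             # random restarts escape local minima
        order = list(np.random.permutation(robots))
        cost = plan_with_priorities(order)
        for _ in range(iters):
            i, j = np.random.choice(len(order), 2, replace=False)
            candidate = order[:]
            candidate[i], candidate[j] = candidate[j], candidate[i]
            c = plan_with_priorities(candidate)
            # accept the swap if it yields a better (or any first) solution
            if c is not None and (cost is None or c < cost):
                order, cost = candidate, c
        if cost is not None and cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```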