Learning Efficient Policies for Vision-based Navigation

Cameras provide valuable information about the environment and are often used as sensors for localization in navigation tasks. However, fast movements of a mobile robot typically reduce the performance of vision-based localization systems due to motion blur. We used a reinforcement learning approach to select appropriate velocity values for vision-based navigation. The learned policy minimizes the time to reach the destination and implicitly takes the impact of motion blur on observations into account. To reduce the size of the resulting policies, which is desirable for memory-constrained systems, we compress the learned policy via a clustering approach. Extensive simulated and real-world experiments demonstrate that our learned policy significantly outperforms any policy that uses a constant velocity.
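
The trade-off between speed and motion blur can be illustrated with a toy example. The following is a minimal sketch of tabular Q-learning, not the setup used in our experiments: the one-dimensional corridor, the three velocity levels, the blur probabilities, and all function names are assumptions made for illustration. The reward is the negative elapsed time, so maximizing the return minimizes the time to reach the goal.

```python
import random

# All constants below are illustrative assumptions, not values from the experiments.
NUM_CELLS = 10                    # the goal is reached at cell index NUM_CELLS
CELLS_PER_STEP = [1, 2, 3]        # slow, medium, and fast velocity
BLUR_PROB = [0.0, 0.1, 0.6]       # chance that motion blur breaks localization


def step(pos, action, rng):
    """Simulate one time step and return (next_pos, reward)."""
    if rng.random() < BLUR_PROB[action]:
        return pos, -3.0          # blurred image: time is lost, no progress
    return min(pos + CELLS_PER_STEP[action], NUM_CELLS), -1.0


def q_learning(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular, undiscounted Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    actions = range(len(CELLS_PER_STEP))
    q = [[0.0 for _ in actions] for _ in range(NUM_CELLS + 1)]
    for _ in range(episodes):
        pos = 0
        while pos < NUM_CELLS:
            if rng.random() < epsilon:
                action = rng.randrange(len(CELLS_PER_STEP))
            else:
                action = max(actions, key=lambda a: q[pos][a])
            nxt, reward = step(pos, action, rng)
            # The goal row of q is never updated, so its value stays 0.
            q[pos][action] += alpha * (reward + max(q[nxt]) - q[pos][action])
            pos = nxt
    return q


def greedy_policy(q):
    """Velocity index chosen in each non-goal cell."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q[:-1]]
```

Calling `greedy_policy(q_learning())` yields one velocity index per cell. In this toy model the state is only the position; the policy described above additionally depends on the angle to the goal and on the pose uncertainty.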

In the future, we plan to apply our approach to the humanoid robot Nao, which has just arrived.


Related publication:

Videos:

  In our experiments, we used a Pioneer 2 robot with a top-mounted, downward-looking camera that observes the floor in front of the robot. In the experiment shown in this video (MPEG), the robot drives to its goal position using the learned policy. The video displays the current camera image with detected and matched landmarks, the robot's true pose (red), and the pose estimated by the UKF (green) together with the corresponding uncertainty. Depending on the distance and the angle to the goal as well as on the uncertainty in the belief about the robot's pose, the robot chooses appropriate values for the translational velocity to reach the destination as fast as possible.
  We furthermore performed experiments with an outdoor robot in a scenario in which the robot had to traverse several waypoints. This video (MPEG) shows the robot driving to its goal using the learned policy. As can be seen, the robot reaches the goal quickly and reliably.

Robust Recognition of Complex Gestures for Natural Human-Robot Interaction

Robotic assistants designed to coexist and communicate with humans in the real world should be able to interact with them in an intuitive way. This requires that the robots are able to recognize typical gestures performed by humans such as head shaking/nodding, hand waving, or pointing gestures. We present a system that is able to spot and recognize complex, parameterized gestures from monocular image sequences. To represent people, we locate their faces and hands using trained classifiers and track them over time. We use a few expressive features extracted from this compact representation as input to hidden Markov models (HMMs). First, we segment gestures into distinct phases and train HMMs for each phase separately. Then, we construct composed HMMs, which consist of the individual phase-HMMs. Once a specific phase is recognized, we estimate the parameter of the current gesture, e.g., the target of a pointing gesture. As we demonstrate in the experiments, our method is able to robustly locate and track hands, despite the fact that they can take a large number of substantially different shapes. Based on this, our system is able to reliably spot and recognize a variety of complex, parameterized gestures.
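
The idea of chaining phase-HMMs into a composed HMM can be sketched as follows. This is a simplified illustration under stated assumptions, not our implementation: observations are discrete symbols, each phase is given as row-stochastic transition and emission tables, and the names `compose`, `forward_likelihood`, and `link_prob` are hypothetical.

```python
def compose(phases, link_prob=0.5):
    """Chain phase HMMs into one HMM.

    Each phase is a pair (trans, emit) of row-stochastic lists of lists.
    The last state of every non-final phase jumps to the first state of
    the next phase with probability link_prob (an assumed parameter).
    """
    sizes = [len(trans) for trans, _ in phases]
    total = sum(sizes)
    trans_all = [[0.0] * total for _ in range(total)]
    emit_all = []
    offset = 0
    for idx, (trans, emit) in enumerate(phases):
        n = sizes[idx]
        is_last_phase = idx == len(phases) - 1
        for i in range(n):
            # Only the final state of a non-final phase hands over to the next phase.
            scale = 1.0 if (is_last_phase or i < n - 1) else 1.0 - link_prob
            for j in range(n):
                trans_all[offset + i][offset + j] = scale * trans[i][j]
        if not is_last_phase:
            trans_all[offset + n - 1][offset + n] = link_prob
        emit_all.extend(list(row) for row in emit)
        offset += n
    start = [1.0] + [0.0] * (total - 1)   # a gesture starts in its first phase
    return start, trans_all, emit_all


def forward_likelihood(start, trans, emit, observations):
    """Probability of the observation sequence under the HMM (forward algorithm)."""
    n = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs]
                 for s in range(n)]
    return sum(alpha)
```

A gesture could then be recognized by composing one HMM per gesture type and selecting the model with the highest `forward_likelihood` for the observed feature sequence.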

Related publications:

Videos:

  This video (XVID-MPEG4, AVI) shows that faces, facial features, and hands can be robustly tracked even under difficult and changing lighting conditions and against cluttered backgrounds. Our system reliably recognizes complex gestures. We only show the most likely recognized gesture in the video. (Click here for the video using an alternative codec.)
  We performed further experiments in a different environment. In this video (XVID-MPEG4, AVI), we show the most likely gesture individually for the left and right hand, and for bi-manual gestures.

Multimodal Interaction between a Humanoid Robot and Humans

The purpose of our research is to develop a humanoid museum tour guide robot that performs intuitive, multimodal interaction with multiple people. Our robots Robotinho and Fritz use speech, facial expressions, eye-gaze, and gestures to interact with people. Depending on the audio-visual input, the robots shift their attention between different people in order to involve them in the conversation. Robotinho and Fritz perform human-like arm gestures during an interaction and also use pointing gestures generated with eyes, head, and arms to direct the attention of their communication partners towards objects of interest. To express their emotional state, the robots generate facial expressions and adapt the speech synthesis.

In contrast to its predecessor Fritz, Robotinho is not restricted to a static scenario and has several new features, such as a movable trunk, a more expressive head with movable eyelids, and an additional arm joint.

Related publications:

  • The Humanoid Museum Tour Guide Robotinho. F. Faber, M. Bennewitz, C. Eppner, A. Goeroeg, C. Gonsior, D. Joho, M. Schreiber, and S. Behnke. In: Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2009.
  • Intuitive Multimodal Interaction with Communication Robot Fritz. M. Bennewitz, F. Faber, D. Joho, and S. Behnke. In: M. Hackel, editor, Humanoid Robots, Human-like Machines, pp. 613-624, I-Tech Education and Publishing, 2007.
  • Fritz - A Humanoid Communication Robot. M. Bennewitz, F. Faber, D. Joho, and S. Behnke. In: Proceedings of the 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2007.
  • Towards a Humanoid Museum Guide Robot that Interacts with Multiple Persons. M. Bennewitz, F. Faber, D. Joho, M. Schreiber, and S. Behnke. In: Proceedings of the IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2005.

Please check this webpage for further related publications in the year 2005.

Videos:

  Robotinho acting as a tour guide in the corridor of our university building. This video (wmv, 29 MB) shows how the robot guides visitors to exhibits and presents them. Robotinho uses multiple modalities to interact with people in a natural way.
  Previously, Robotinho was used in a static scenario, where it explained some smaller robots placed on a table in front of it. This video (wmv, 19 MB) shows the multimodal interaction between Robotinho and humans.
  Our communication robot Fritz explained some smaller robots to the visitors at the Science Days in the Europa-Park Rust, October 2006. This video (wmv, 28 MB) shows the interaction between Fritz and the visitors.

Metric Localization with Scale-Invariant Visual Features using a Single Perspective Camera

The Scale Invariant Feature Transform (SIFT) has become a popular feature extractor for vision-based applications. It has been successfully applied to metric localization and mapping using stereo vision and omnivision. We present an approach to Monte-Carlo localization using SIFT features for mobile robots equipped with a single perspective camera. First, we acquire a 2D grid map of the environment that contains the visual features. To come up with a compact environmental model, we appropriately down-sample the number of features in the final map. During localization, we cluster close-by particles and estimate for each cluster the set of potentially visible features in the map using ray-casting. These relevant map features are then compared to the features extracted from the current image. The observation model used to evaluate the individual particles considers the difference between the measured and the expected angle of similar features. In real-world experiments, we demonstrate that our technique is able to accurately track the position of a mobile robot. Moreover, we present experiments illustrating that a robot equipped with a different type of camera can use the same map of SIFT features for localization.
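
The angle-based observation model can be illustrated with a short sketch. It is a simplified illustration, not our implementation: it assumes that feature matching has already produced pairs of a measured bearing and a 2D map position, that the angular error is Gaussian with an assumed standard deviation `sigma`, and it omits the particle clustering and ray-casting steps. All function names are hypothetical.

```python
import math


def angle_diff(a, b):
    """Signed difference a - b, wrapped to the interval [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi


def expected_bearing(pose, landmark):
    """Bearing of a map feature relative to the heading of a particle."""
    x, y, theta = pose
    lx, ly = landmark
    return angle_diff(math.atan2(ly - y, lx - x), theta)


def particle_weight(pose, matches, sigma=0.1):
    """Unnormalized likelihood of a particle.

    matches is a list of (measured_bearing, (lx, ly)) pairs, i.e. image
    features that were matched to map features. sigma is an assumed
    standard deviation of the angular error in radians.
    """
    weight = 1.0
    for measured, landmark in matches:
        err = angle_diff(measured, expected_bearing(pose, landmark))
        weight *= math.exp(-0.5 * (err / sigma) ** 2)
    return weight


def normalize(weights):
    """Scale the weights so that they sum to one (uniform if all are zero)."""
    total = sum(weights)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]
```

Because only angles enter the weight, a single perspective camera without depth information is sufficient to evaluate the particles.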

Related publication:

Animations:
  The animated gif (2 MB) shows the evolution of the particle clouds during a localization experiment. The blue dot corresponds to the true pose of the robot and the green dot indicates the pose resulting from odometry information.
  This video (wmv, 19 MB) shows the humanoid robot Max collecting data in an office environment. Since the robot was designed for playing soccer, its camera looks downwards. Thus, in the experiment shown here, Max has to bend backwards in order to observe the features used for localization in the environment.

Utilizing Learned Motion Patterns to Predict Positions of People

Whenever people move through their environments they do not move randomly. Instead, they usually follow specific trajectories or motion patterns corresponding to their intentions. Knowledge about such patterns enables a mobile robot to robustly keep track of persons in its environment and to improve its behavior. We propose a technique for learning collections of trajectories that characterize typical motion patterns of persons. Data recorded with laser-range finders is clustered using the expectation maximization algorithm. Based on the result of the clustering process we derive a Hidden Markov Model (HMM) that is applied to estimate the current and future positions of persons based on sensory input. We present several experiments carried out in different environments with a mobile robot equipped with a laser range scanner and a camera system. The results demonstrate that our approach can reliably learn motion patterns of persons, can robustly estimate and predict the positions of multiple persons, and can be used to improve the navigation behavior of a mobile robot.
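
How an HMM maintains and predicts a belief over a person's position can be sketched as follows. This is a minimal illustration with a discrete state space; the transition table, the observation likelihoods, and the function names are assumptions for the example and are not taken from our system.

```python
def predict(belief, trans):
    """Propagate the belief one time step through the transition model."""
    n = len(belief)
    return [sum(belief[p] * trans[p][s] for p in range(n)) for s in range(n)]


def correct(belief, likelihood):
    """Weight the belief by the observation likelihood of each state and renormalize."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        return list(belief)       # observation contradicts every state; keep the prior
    return [u / total for u in unnormalized]


def predict_ahead(belief, trans, steps):
    """Belief about the position of the person several time steps into the future."""
    for _ in range(steps):
        belief = predict(belief, trans)
    return belief
```

For a person who is known to be in the first state of a three-state chain with `trans = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]`, `predict_ahead([1, 0, 0], trans, 2)` places all probability mass on the last state.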

Related publications:

Animations:

  See mpeg-video (4.7 MB) for an experiment with a single person. The video shows a scene overview (left-hand side), the results from the people tracking system, which is based on laser-range data (right-hand side), as well as the HMM (bottom), which is used to maintain a belief of the robot over the positions of the person. In this case we do not use vision information because we assume only one person is moving in the environment. In the HMM, the red dot corresponds to the position of the person provided by the laser tracking system. The size of the square of each state of the HMM represents the probability that the person is currently in the corresponding state.

  See mpeg-video (12.8 MB) for an experiment with multiple persons. The video shows the camera images (left-hand side) with the areas corresponding to a person detected by the laser tracking system, as well as one HMM (right-hand side). The HMM shows the belief of the robot over the position of the person who enters the corridor second (black trousers, blue shirt).

  See animated gif (5.9 MB) for an experiment with two persons. The upper image depicts the belief about the position of person 1, whereas the lower image shows the belief about the position of person 2. The circles are detected features. The grey value of each circle represents the similarity to the person corresponding to the HMM (the darker, the more likely). In the beginning, the robot was quite certain that persons 1 and 2 were in the room containing resting place 3.

  See animated gif (3.4 MB) for an experiment with a moving robot. Here the robot traveled along the corridor and looked into one of the offices, where it detected person A. Whereas the robot was initially rather uncertain as to where person A was, the probability of resting place 3 increased considerably after the detection.

Adapting Navigation Strategies Using Learned Motion Patterns of People

We propose a method for adapting the behavior of a mobile robot according to the activities of the people in its surroundings. Our approach uses learned motion patterns of persons. Whenever the robot detects a person, it computes a probabilistic estimate of which motion pattern the person might be engaged in. During path planning, it then uses this belief to improve its navigation behavior. In different practical experiments carried out on a real robot, we demonstrate that our approach allows a robot to quickly adapt its navigation plans according to the activities of the persons in its surroundings.

Related publication:

Animations:
 
  • Our mobile robot Albert moves into a doorway to let a person pass by (mpg-video).
  • Albert moves forward and waits until the likelihood of interfering with the person is low enough (mpg-video).
  • Albert moves away from a doorway to let a person enter the corresponding room (mpg-video).

Learning Motion Patterns of People

We propose a method to learn typical motion behaviors of persons. As people move through their environments, they usually do not move randomly. Instead, they often engage in typical motion patterns, related to specific locations they might be interested in approaching and specific trajectories they might follow in doing so. Knowledge about such patterns may enable a mobile robot to develop improved people following and obstacle avoidance skills. We present an algorithm that learns collections of typical trajectories that characterize a person's motion patterns. Data, recorded by mobile robots equipped with laser-range finders, is clustered into different types of motion using the popular expectation maximization algorithm while simultaneously learning multiple motion patterns. Experimental results, obtained using data collected in a domestic residence and in an office building, illustrate that highly predictive models of human motion patterns can be learned.
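
The clustering step can be illustrated with a small EM sketch. It makes several simplifying assumptions that are not part of the description above: all trajectories have the same number of points, each motion pattern is a sequence of mean positions with a fixed, shared standard deviation `sigma`, and the patterns are initialized from trajectories chosen by the caller. The function name is hypothetical.

```python
import math


def em_trajectories(trajs, init_indices, sigma=1.0, iterations=10):
    """Cluster equal-length 2D trajectories into motion patterns with EM.

    trajs is a list of trajectories, each a list of (x, y) points.
    Returns (patterns, responsibilities).
    """
    patterns = [list(trajs[i]) for i in init_indices]
    resp = []
    for _ in range(iterations):
        # E-step: probability that each trajectory belongs to each pattern.
        resp = []
        for traj in trajs:
            logs = [-sum((x - mx) ** 2 + (y - my) ** 2
                         for (x, y), (mx, my) in zip(traj, pattern))
                    / (2.0 * sigma ** 2)
                    for pattern in patterns]
            top = max(logs)       # subtract the maximum to avoid underflow
            weights = [math.exp(value - top) for value in logs]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: each pattern becomes the weighted mean of all trajectories.
        for k in range(len(patterns)):
            mass = sum(r[k] for r in resp)
            if mass == 0.0:
                continue          # no trajectory supports this pattern
            patterns[k] = [
                (sum(r[k] * traj[t][0] for r, traj in zip(resp, trajs)) / mass,
                 sum(r[k] * traj[t][1] for r, traj in zip(resp, trajs)) / mass)
                for t in range(len(patterns[k]))
            ]
    return patterns, resp
```

The responsibilities returned by the function indicate, for every recorded trajectory, how likely it is to belong to each learned motion pattern.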

Related publications:

Please check this webpage for further related publications in the years 2002-2004.

Animation:

  Video (mpg) showing the individual learning steps.

Prioritized Multi-robot Path Planning

Coordinating the motion of multiple mobile robots is one of the fundamental problems in robotics. The predominant algorithms for coordinating teams of robots are decoupled and prioritized, thereby avoiding combinatorially hard planning problems typically faced by centralized approaches. While these methods are very efficient, they have two major drawbacks. First, they are incomplete, i.e., they sometimes fail to find a solution even if one exists, and second, the resulting solutions are often not optimal. We developed a method for finding and optimizing priority schemes for such prioritized and decoupled planning techniques. Existing approaches apply a single priority scheme, which makes them overly prone to failure in cases where valid solutions exist. By searching in the space of prioritization schemes, our approach overcomes this limitation. It performs a randomized search with hill-climbing to find solutions and to minimize the overall path length. To focus the search, our algorithm is guided by constraints generated from the task specification.
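
The search over priority schemes can be sketched as randomized hill-climbing with restarts. This is a minimal illustration under stated assumptions: the prioritized planner is abstracted as a hypothetical callback `plan_cost(order)` that returns the overall path length for a priority order, or `None` if planning fails, and the constraints derived from the task specification are not modeled. At least two robots are assumed.

```python
import random


def optimize_priorities(num_robots, plan_cost, restarts=5, max_flips=20, seed=0):
    """Search for a priority order with a short overall path length.

    plan_cost(order) is a user-supplied stand-in for the prioritized
    planner: it returns the overall path length, or None if no
    collision-free paths are found for that order.
    """
    rng = random.Random(seed)
    best_order, best_cost = None, None
    for _ in range(restarts):
        # Start each run from a random priority scheme.
        order = list(range(num_robots))
        rng.shuffle(order)
        cost = plan_cost(order)
        for _ in range(max_flips):
            # Swap the priorities of two randomly chosen robots.
            i, j = rng.sample(range(num_robots), 2)
            candidate = order[:]
            candidate[i], candidate[j] = candidate[j], candidate[i]
            candidate_cost = plan_cost(candidate)
            # Accept the swap if it yields a first or a shorter solution.
            if candidate_cost is not None and (cost is None or candidate_cost < cost):
                order, cost = candidate, candidate_cost
        if cost is not None and (best_cost is None or cost < best_cost):
            best_order, best_cost = order, cost
    return best_order, best_cost
```

The function returns `(None, None)` if none of the examined priority schemes leads to a solution.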

Related publications:

Please check this webpage for further related publications in the years 2000-2001.

Animations:

  Experiment with the robots of the CS Freiburg
  • Uncoordinated motions (mpg-video)
  • Executing the computed collision-free paths (mpg-video)
  A team of 10 robots in a corridor environment (simulation)
  A team of 30 robots in a cluttered environment (simulation)