Goal-directed Imitation Learning from Humans (M. Muehlig)
One problem in learning the representation of movements is to map the observed data to a meaningful description. This matters less for movements of a purely qualitative character, such as gesture imitation. But when it comes to imitating goal-directed movements that involve the handling or manipulation of objects, it is important to extract a suitable description of the task, so that task knowledge can be transferred to novel situations. Task representations should generalize over the variant parameters, and ideally comprise only the invariant parameters of the task. If the learned movement representation comprises, for instance, hand-object relations rather than joint movements, it can be applied to objects at different locations and orientations in the workspace. This is a significant difference from joint-level imitation methods, which offer no flexibility to adapt the acquired movement to different situations.
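As an illustration of why hand-object relations generalize, the following minimal sketch stores a demonstrated hand path in the object's own frame and replays it for an object at a new pose. It is not the project's implementation: the planar (x, y, heading) pose, the function names, and the example numbers are all assumptions chosen for brevity.

```python
import math

def world_to_object(point, obj_pose):
    """Express a world-frame point in the object's frame (hypothetical helper)."""
    ox, oy, th = obj_pose
    dx, dy = point[0] - ox, point[1] - oy
    c, s = math.cos(th), math.sin(th)
    # Apply the inverse rotation (by -theta) to the offset from the object.
    return (c * dx + s * dy, -s * dx + c * dy)

def object_to_world(point, obj_pose):
    """Map an object-frame point back into the world frame (hypothetical helper)."""
    ox, oy, th = obj_pose
    c, s = math.cos(th), math.sin(th)
    return (ox + c * point[0] - s * point[1], oy + s * point[0] + c * point[1])

def learn_relative_trajectory(hand_path, obj_pose):
    """Keep only the hand-object relation, discarding the absolute object pose."""
    return [world_to_object(p, obj_pose) for p in hand_path]

def reproduce(relative_path, new_obj_pose):
    """Replay the stored relation for an object at a different pose."""
    return [object_to_world(p, new_obj_pose) for p in relative_path]

# Assumed demonstration: the hand approaches an object at (1, 0) with heading 0.
demo_obj = (1.0, 0.0, 0.0)
demo_hand = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.0)]
relative = learn_relative_trajectory(demo_hand, demo_obj)

# The same task with the object moved and rotated by 90 degrees.
new_obj = (2.0, 1.0, math.pi / 2)
new_hand = reproduce(relative, new_obj)
```

A joint-level recording of the same demonstration would reproduce the original hand path regardless of where the object now stands. The object-relative path ends at the same offset from the object in both situations.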
Further, it should be possible to use acquired movements concurrently, or to superpose learnt movements with ones that are generated by other methods. An example is performing a bi-manual task with a tool and an object while a tracking controller directs the robot's gaze at the object.
Another important aspect of a suitable representation is its dimensionality. Robot motion control schemes tend to become less robust with higher task dimensions, since the given constraints limit the movement capabilities. Fewer constraints increase the influence of criteria that can make the movement more robust. Therefore, we strive for an efficient representation of the task, so that the robustness and achievability of the movement are high.
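The effect of task dimensionality can be made concrete with a small sketch. The text does not name a particular control scheme, so this example assumes the common redundancy-resolution approach in which secondary criteria act through the null-space projector N = I - J^T (J J^T)^-1 J of the task Jacobian J. The 3-joint Jacobians below are made-up numbers, and all helper names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product for lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse_small(m):
    """Invert a 1x1 or 2x2 matrix (enough for this sketch)."""
    if len(m) == 1:
        return [[1.0 / m[0][0]]]
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def null_space_projector(jac):
    """N = I - J^T (J J^T)^-1 J, assuming J has full row rank."""
    n = len(jac[0])
    jt = transpose(jac)
    pinv = matmul(jt, inverse_small(matmul(jac, jt)))
    pj = matmul(pinv, jac)
    return [[(1.0 if i == j else 0.0) - pj[i][j] for j in range(n)]
            for i in range(n)]

def free_dofs(jac):
    """Trace of the projector: joint-space directions left to secondary criteria."""
    proj = null_space_projector(jac)
    return sum(proj[i][i] for i in range(len(proj)))

# Assumed Jacobians for a 3-joint arm: a 2-D task and a reduced 1-D task.
task_2d = [[1.0, 0.5, 0.2],
           [0.0, 1.0, 0.4]]
task_1d = [[1.0, 0.5, 0.2]]

dofs_2d = free_dofs(task_2d)  # one direction remains for secondary criteria
dofs_1d = free_dofs(task_1d)  # two directions remain

# A secondary joint velocity, projected so that it cannot disturb the 1-D task.
secondary = [[0.3], [-0.2], [0.1]]
projected = matmul(null_space_projector(task_1d), secondary)
task_disturbance = matmul(task_1d, projected)
```

Under these assumptions, dropping one task dimension frees one more joint-space direction for criteria such as joint-limit avoidance, while the projected secondary motion leaves the task velocity unchanged up to rounding.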
The topic of imitation learning can be subdivided into two distinct areas. There is the field of symbolic imitation learning, where action sequences are represented by symbols, often in graph structures. The similarity of such behaviors is determined on this rather high symbolic level, and the reproduction of the behavior is largely independent of the underlying embodiment of the system. On the other hand, there are the statistically driven, sensor-based approaches, which determine similarities more on the embodiment level, and reproduce the movements in a way very similar to how they were acquired. In these approaches, the problem is to extract the “meaning” of the observed behavior, and to retarget it to the observing system.
For imitation learning, it is favorable to exploit the statistics of multi-modal sensor data, such as visual information (object tracking, body tracking with motion capture …), guided movements (kinesthetic teaching), and auditory and tactile information. The open research question is how to integrate such multi-modal data into the movement.
When looking at the development of humans, it becomes apparent that movements are acquired in a rather complex teaching-imitation procedure. Here, the teacher or demonstrator plays an important role. Involving the demonstrator in the imitation learning process offers the possibility to significantly improve learning, and even to teach movements that cannot be captured by data-driven approaches. The project aims to employ the feedback of the demonstrator. Unlike in purely data-driven approaches, the imitation learning process can be organized as a tutor / system interaction cycle, in which the feedback of the tutor may be incorporated to enforce distinct elements of the movement. The tutor's feedback can effectively be used to specify what to learn, and how it is represented. A goal of this project is to develop and organize the imitation learning process by incorporating such elements. This poses the challenging question of how to incorporate reward signals interactively, and how to structure the interactive learning process to allow for incremental knowledge acquisition.