How can we use constraints on adaptivity when applying learning methods to large-scale systems? How can we utilise multimodality, redundancy, and active perception in complex task scenarios for an acting system?
Future cognitive systems, especially service robots, will act, develop, and learn in environments which typically are only roughly known in advance and -- even more importantly -- vary over time.
The capabilities of such systems must become increasingly sophisticated as their tasks grow more complex. Hence, cognitive systems need the ability to adapt their skills to their environments, to acquire new world and task knowledge, and to learn from their tutors to carry out new tasks.
In recent years, various learning paradigms have been developed, including supervised learning, reinforcement learning, and unsupervised methods. While all of these paradigms achieve convincing results in certain areas of cognitive systems, they typically do not scale to complex real-world problems, because they require large amounts of training data and learning time. Cognitive systems need to develop under various resource constraints. One of the most prominent is the need to act and learn interactively with humans, which imposes challenging constraints on computation time.
Other constraints are given by the morphology of the robot: which actuators and sensors are available, and what are their limits? Finally, varying environmental conditions demand high robustness of the overall system and of its components. Considering current robotic systems with dozens of motor degrees of freedom, sensors, and cameras, it is important to develop innovative methods which structure the learning process in order to achieve these ambitious goals.
With respect to these limited resources, we distinguish four main approaches:
Attentive mechanisms can be used to focus processing on relevant aspects of the high-dimensional, multimodal input a system receives. Popular visual attention systems have to be extended to include attention to actions and to consider the current context. They have to be supplemented by further attentive mechanisms that incorporate other modalities, such as speech and tactile information. In order to fuse cues from several modalities, we need to develop methods to extract and represent the meaning of processes, so that learning can focus on these semantically relevant aspects, e.g. of actions.
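As a toy illustration of such focusing, an attentive front-end might rank the incoming channels by a saliency score and pass only the most relevant ones on for further processing; the modality names, scores, and the fixed budget below are invented for illustration:

```python
def attend(channels, budget):
    """Keep only the `budget` most salient input channels.

    channels: dict mapping modality name -> (saliency score, payload).
    Channels below the cut-off are simply not processed further.
    """
    ranked = sorted(channels.items(), key=lambda kv: kv[1][0], reverse=True)
    return {name: payload for name, (saliency, payload) in ranked[:budget]}

# hypothetical multimodal input: vision, speech, and touch
inputs = {
    "vision": (0.9, "red cup on table"),
    "speech": (0.7, "pick up the cup"),
    "touch":  (0.1, "no contact"),
}
focused = attend(inputs, budget=2)  # the low-saliency touch channel is dropped
```

In a real system the saliency scores would themselves be context-dependent, which is exactly where the extension to action- and context-aware attention comes in.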
Attentive mechanisms can also be used to adapt the plasticity, i.e. the learning rate, for example based on the confidence of the current perception and the quality already achieved for a skill.
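A minimal sketch of such confidence-gated plasticity might look as follows; the multiplicative combination is only one plausible choice, not a fixed recipe:

```python
def adaptive_rate(base_rate, perception_confidence, skill_quality):
    """Scale the learning rate by how much the current sample can be trusted
    (perception confidence in [0, 1]) and how much plasticity is still needed
    (skill quality in [0, 1]; a well-mastered skill should change slowly)."""
    return base_rate * perception_confidence * (1.0 - skill_quality)

# early learning with a reliable percept: nearly full plasticity
lr_early = adaptive_rate(0.1, perception_confidence=0.9, skill_quality=0.1)
# mature skill under an uncertain percept: almost frozen
lr_late = adaptive_rate(0.1, perception_confidence=0.3, skill_quality=0.9)
```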
Relying on a single sensory cue severely limits the robustness of perception. A promising approach to overcome this limitation is to fuse several sensory modalities. However, the generation of actions and behaviour from multimodal inputs has not yet been investigated thoroughly. Interesting research questions in this area are:
The concept of parallel processing and subsequent fusion for higher robustness can also be extended to single components: by applying several conceptually different algorithms in parallel, e.g. for object recognition or motor command generation, and fusing their results, a more robust overall system behaviour may be obtained. Here, methods need to be developed to weight the different results, combine them into a final output, and, in case of extreme variability, reject an outcome completely. In such a case the system could ask its human instructor for help.
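One simple way to realise this weighting, fusion, and rejection is a weighted sum of label scores with a confidence threshold; the recognisers, labels, and weights below are assumptions for illustration only:

```python
def fuse(predictions, weights, reject_threshold=0.5):
    """Fuse label scores from several parallel algorithms.

    predictions: one dict per algorithm, mapping label -> score in [0, 1].
    weights: per-algorithm reliability weights.
    Returns the winning label, or None if the fused confidence is too low,
    in which case the system could ask its human instructor for help.
    """
    combined = {}
    for scores, w in zip(predictions, weights):
        for label, score in scores.items():
            combined[label] = combined.get(label, 0.0) + w * score
    best = max(combined, key=combined.get)
    confidence = combined[best] / sum(weights)
    return best if confidence >= reject_threshold else None

# two recognisers agree -> confident decision
label = fuse([{"cup": 0.9, "ball": 0.1}, {"cup": 0.8, "ball": 0.2}], [1.0, 1.0])
# strong disagreement -> the outcome is rejected
unsure = fuse([{"cup": 0.9, "ball": 0.1}, {"cup": 0.1, "ball": 0.9}], [1.0, 1.0],
              reject_threshold=0.6)
```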
Simulation is a powerful means of incorporating knowledge into technical systems, either through computational models or through connectionist models (neural networks, machine learning techniques ...). Once established, system models may be exploited to predict future situations. Such prediction and anticipation mechanisms play an important role in increasing the performance of technical systems. Anticipating actions and behaviour on the basis of predicted changes in the environment is a key aspect of highly efficient systems. This is closely related to the generation of models, as well as to the representation of the system and its environment with respect to the aspects of interest.
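As a minimal example of such a predictive system model, consider a scalar linear forward model that learns online to anticipate the next state from the current state and motor command; the "true" dynamics used for training are invented for illustration:

```python
class ForwardModel:
    """Scalar linear forward model  s' ~ a*s + b*u,  trained by online
    gradient descent (LMS) on observed state transitions."""

    def __init__(self):
        self.a = 0.0  # influence of the current state
        self.b = 0.0  # influence of the motor command

    def predict(self, state, action):
        return self.a * state + self.b * action

    def update(self, state, action, next_state, lr=0.05):
        error = next_state - self.predict(state, action)
        self.a += lr * error * state
        self.b += lr * error * action

model = ForwardModel()
# assumed environment dynamics: s' = 0.9*s + 0.5*u
for _ in range(200):
    for s in (-1.0, -0.5, 0.5, 1.0):
        for u in (-1.0, 0.0, 1.0):
            model.update(s, u, 0.9 * s + 0.5 * u)
# the model can now anticipate the outcome of an action before executing it
```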
Key questions are:
Purely developmental systems often require intensive exploration to achieve good system performance. While biological organisms explore their bodies throughout their whole life (for instance, babies perform reaching movements several million times before the age of three), technical systems are limited in the number of trials. Hence, we need to provide a system with initial perception and action primitives from which to generate basic behaviour. But obviously we cannot prepare a robot for every possible situation it might face in the future, so we also need learning mechanisms which gradually extend these basic skills to more and more complex ones, forming a hierarchical web of skills.
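A toy sketch of such a hierarchical web of skills, in which hand-given primitives are composed into increasingly complex skills; primitives are plain strings here, whereas in a real system they would be perception and action routines:

```python
class Skill:
    """A skill composed of sub-skills and/or primitives (plain strings here)."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def run(self, trace):
        for step in self.steps:
            if isinstance(step, Skill):
                step.run(trace)       # recurse into the sub-skill
            else:
                trace.append(step)    # "execute" a primitive

# basic skill built from given primitives
grasp = Skill("grasp", ["open_gripper", "reach", "close_gripper"])
# more complex skill reusing the basic one
pick_and_place = Skill("pick_and_place", [grasp, "move_to_target", "open_gripper"])

trace = []
pick_and_place.run(trace)
# trace now lists the flat sequence of executed primitives
```

New skills extend the hierarchy simply by composing existing ones, which is the gradual extension from basic to complex skills described above.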
The exploration of new skills can, in a first step, be done in simulation, as mentioned above. However, simulations always simplify the real world, so acquired skills subsequently need to be adapted before they are applicable to real-world problems.
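This adaptation step can be illustrated with a toy parameter: a gain is first fitted to (slightly wrong) simulated dynamics and then fine-tuned on a handful of real measurements; all numbers are invented for illustration:

```python
def fit_gain(g, data, lr=0.05, epochs=50):
    """Fit y ~ g*x by stochastic gradient descent, starting from gain g."""
    for _ in range(epochs):
        for x, y in data:
            g += lr * (y - g * x) * x
    return g

# the simulation assumes a gain of 1.0 ...
sim_data = [(x / 10.0, 1.0 * x / 10.0) for x in range(-10, 11)]
# ... but the real robot behaves with a gain of 1.2
real_data = [(0.5, 0.6), (1.0, 1.2), (-0.8, -0.96)]

g_sim = fit_gain(0.0, sim_data)      # skill acquired in simulation
g_real = fit_gain(g_sim, real_data)  # adapted with only a few real trials
```

The point of the sketch is that the simulated solution provides a good starting point, so only a few real-world trials are needed for the final adaptation.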
Arising questions are: