
Unsupervised Learning of Interactions from Real Data

Lay summary

Subject and objective
The goal of this project is to build agents that can operate in the real world (in other words, agents that can coordinate actions such as navigating around obstacles or manipulating objects) without direct supervision. In our context, an agent is a machine that can observe data through various sensors (e.g., cameras and microphones) and that can perform actions such as changing its position or interacting with objects. We are interested in building agents that develop these abilities by learning from the indirect, passive observation of other agents' interactions in real (i.e., non-simulated) data.

Scientific and social context
These agents are expected to autonomously develop the ability to recognize and localize objects, to recognize and predict their dynamics, and to recognize their interactions.
This will make it possible to automate tasks that currently require costly manual annotation, or that cannot be annotated at all because the data are confidential (banking, insurance, military) or because experts are lacking (medical field).

Abstract

Our ultimate goal is to build an agent that learns to operate in the real world (i.e., to plan actions such as navigating around obstacles or grabbing objects) without direct supervision. In our definition, an agent is a machine that can observe data through several sensors (e.g., cameras and microphones) and that can perform actions such as changing its position or interacting with objects. An agent can learn either by direct interaction or through passive observation of the interactions between other agents. In this proposal we focus on the latter learning strategy. We investigate methods for an agent to build a representation of the observed data, i.e., an internal model of the environment, of the agents in it and of their interactions. In contrast to related work, we do not learn from simulated data, where the agent is aware of its own actions, but directly from real data. The agent learns this representation by using it to make predictions about the observations and then validating those predictions. In particular, the learned representation aims to predict the consequences of actions and to determine which actions are needed to produce changes in other agents.
To tackle this goal, the first aspect to consider is the choice of data to learn from. Unfortunately, collecting labeled examples, which supervised learning methods require for training, does not seem to be a viable solution. Manual annotation of objects and actions is costly, error-prone and time-consuming; moreover, it may be ill-defined and may introduce undesired bias into training. A second aspect is whether the proposed "passive" learning is possible at all, and thus whether one should instead focus on learning through direct interaction, i.e., with a physical agent (a robot) or a simulated one (e.g., in a video game). Using a robot to learn through direct interaction with the real world is challenging, because the interaction process requires either a very long time (physics and technology limit the speed at which a robot can operate) or several robots working in parallel, which is costly. Working with simulations, instead, faces limitations due to the gap between simulated and real environments. Moreover, current research in self-supervised learning, representation learning and disentangling of factors of variation shows that passive learning is possible and that its full potential is still largely untapped.
We propose to make an agent learn about objects and other agents through passive observation of their interactions. Our approach is to use existing datasets of real images and video sequences to learn representations of the environment (e.g., global attributes such as illumination and the agent's point of view), of the objects (attributes such as their location, pose, category, 3D surface and appearance, i.e., texture and materials) and of the actions (e.g., actions can be associated with changes in the object attributes). We expect that achieving these objectives will have a strong impact on both science and industry. Object representations that enable the detection and prediction of object interactions, and learning from them, without human annotation have the potential to solve machine learning problems at a large scale without data privacy concerns (e.g., in the medical and military fields and in the banking and insurance industries).

Last updated: 17.07.2023

SNSF
Project funding (Div. I-III)
Original data source: 188690

Information Technology
Mathematics, Natural- and Engineering Sciences; Engineering Sciences

People (1)

Paolo Favaro
