
Visual Question Answering and Visual Turing Tests for Medical Imaging

Lay summary

Recent developments in deep learning (DL) have been particularly exciting in the medical field, as these methods have brought impressive new capabilities. At the same time, the nature of these methods has raised many concerns among scientists and physicians about what these methods actually do, what their biases are, and how they fail. This is why understanding the inner workings of DL methods has become an essential research focus for medical image processing, as well as for other research areas in artificial intelligence.

In this context, visual question answering (VQA) methods, which query the content of an image by means of an explicit text question, offer an exciting new way to discern the inner workings of DL. Indeed, they make it possible to probe the model directly with respect to its input data, much like a Turing test. This makes it possible to verify and identify whether DL methods reason correctly about images, an essential objective in medical imaging. To date, however, VQA methods in medical imaging are poorly suited to this role, because (1) current VQA research has been driven largely by the computer vision community, leaving unexplored the question of how to exploit clinical and domain-specific knowledge, and (2) VQA evaluation methodology relies on simple count statistics or natural language processing metrics that are unrelated to the clinical tasks that VQAs in medicine would be expected to address.

Accordingly, the present project has two goals. The first is to develop new VQA methods that can reason about the content of medical images in a manner similar to domain experts, by taking into account domain-specific knowledge from medical applications. The second is to design methods that can validate and verify, in an unbiased way, that VQA methods are evaluated with respect to their clinical objectives.

We hope the project will lead to new developments in a research area that is under pressure to give more information to clinicians, its end users. It therefore has major potential impact for the research community (technical and clinical) and for industries seeking to commercialize medical image computing methods. To bridge our results to these communities, we will disseminate them through publications, websites and conferences, but also through community-led algorithmic challenges.

Abstract

Recent Deep Learning (DL) developments have been particularly exciting in the medical field, as large neural network methods have proliferated and brought impressive new capabilities. At the same time, the black-box nature of these methods has raised many concerns among scientists and doctors as to what these methods are doing, what their biases are, and when and where they ultimately fail. For this reason, understanding the inner workings of DL methods has become a vital research focus for medical image computing, as well as for other research fields in Artificial Intelligence.

In this context, Visual Question Answering (VQA) methods, which query the content of an image by means of an explicit text question, offer an exciting new pathway to discern the inner workings of DL. In effect, they allow direct model probing with respect to the inputs, in much the same way as a Turing Test. This offers a way to verify and identify whether DL methods are reasoning about images appropriately, a critical objective in medical image computing. To date, however, VQA methods in medical image computing are ill-suited to this role, because (1) current research on VQAs has largely been driven by the computer vision community, so the question of how to leverage clinical and domain-specific knowledge remains unexplored, and (2) the methodology by which VQAs are evaluated relies on simple count statistics or natural language processing metrics that do not relate to the clinical tasks VQAs in medicine would be tasked to answer.
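To make "simple count statistics" concrete, the sketch below shows a simplified form of the consensus-based accuracy commonly used in general-purpose VQA benchmarks: a predicted answer scores in proportion to how many human annotators gave the same string, capped at 1 once three annotators agree. This is an illustrative assumption, not a method from this project; the function name `vqa_accuracy` and the toy annotator answers are hypothetical, and the official benchmark evaluation additionally averages over annotator subsets and normalizes answers more aggressively.

```python
from collections import Counter


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified consensus accuracy (hypothetical helper for illustration).

    An answer is fully correct (1.0) if at least 3 annotators gave it;
    otherwise it earns partial credit of (matching annotators / 3).
    """
    # Light normalization only: trim whitespace and lowercase.
    normalized = [a.strip().lower() for a in human_answers]
    # Counter returns 0 for answers no annotator gave.
    matches = Counter(normalized)[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


# Hypothetical answers from 10 annotators to a yes/no question.
answers = ["yes", "yes", "no", "yes", "yes", "no", "yes", "yes", "yes", "yes"]
print(vqa_accuracy("yes", answers))
print(vqa_accuracy("no", answers))
```

Note that the score depends only on string agreement with annotators; nothing in it reflects whether the answer is clinically correct or useful, which is the gap the abstract points to.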

As such, the current project has two goals. The first is to develop new VQA methods that can reason about the content of medical images in similar ways to domain experts, by taking into account domain-specific knowledge from medical applications. The second is to design methods that can validate and verify, in an unbiased way, that VQA methods are evaluated with respect to their clinical objectives. To do so, our project proposes, on the one hand, to develop new computational VQA methods tailored to medical applications. These will include Question Answering modules that can be added to existing DL methods, methods to integrate domain-specific knowledge into VQAs, and VQAs for sequential reasoning. On the other hand, to validate VQAs, we will explore the recent Visual Turing Test (VTT) framework and refocus its use on concept validation in medical domains. Here, we will design concept-centric questioning policies, VTTs for sequential reasoning, and VTTs that can cope with a broader spectrum of VQA answers.

By the project's end, we will have designed novel methods that provide an in-depth understanding of how DL functions in VQA methods, and ensured representative evaluations of these methods for concept-oriented objectives. Ultimately, the project will yield insightful new developments in a research area that is under pressure to give more information to clinicians, its ultimate users. Hence, it has major impact potential for the research community (technical and clinical) and for industries looking to commercialize medical image computing methods. To bridge our results to these communities, we will disseminate them through publications, websites and conferences, but also through community-led algorithmic challenges.

Last updated: 18.07.2023

SNSF
Project funding (Div. I-III)
Original data source 191983

Information Technology
Mathematics, Natural- and Engineering Sciences; Engineering Sciences

People

Prof. Raphael Sznitman
