InforTech Seminar: "Multimodal learning"
Title: Multimodal learning: fusing knowledge from different modalities in an explainable way
Presenter: Otmane AMEL, PhD student in the ILIA Department of the Faculté Polytechnique, UMONS
Location: Ho 25 – Houdain, 7000 Mons, Belgium
Abstract: This presentation gives an overview of multimodal learning algorithms for fusion, inspired by the human brain’s capacity to combine multiple sensory inputs. Our research spotlights the importance of robust fusion methods and effective modality encoders, while highlighting key challenges in the field, including the integration of diverse data sources (image, text, RGB-D frames, etc.) and the need for model transparency. In this study, we introduce a novel multimodal framework that can be used in research and industry to leverage multiple data modalities and improve performance over unimodal solutions. The framework not only addresses key questions, such as which encoders to use and how and when to fuse the modalities, but also identifies the best-suited fusion method. Additionally, it quantifies the contribution of each modality, helping to identify the most valuable data inputs and to debug the model. We propose a fusion method called MultConcat, which delivers superior performance on two use cases chosen to demonstrate the practical impact and applicability of multimodal learning: customs goods classification (image and text modalities) and dangerous action recognition (RGB-D modalities). The first use case concerns customs fraud detection, in collaboration with the startup e-origin; the second, "InfraSecure," is a railway construction safety project in collaboration with Infrabel. We also aim to apply the proposed framework to a third use case in the medical field, the diagnosis of Alzheimer’s disease, where multimodal data are available (AI4Brain FEDER project). This work in progress is carried out in collaboration with multiple institutions and research labs, including ISIA, ILIA, the Neuroscience Department of UMONS, and CHU Ambroise Paré.
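The abstract does not detail how MultConcat or the framework's fusion step works; purely as an illustrative sketch of the kind of fusion discussed (assuming a PyTorch setting, a hypothetical image+text classification task, and made-up encoder output dimensions), concatenation-based fusion of two modality embeddings might look like this:

```python
# Illustrative sketch only: a generic concatenation-based fusion head for two
# modalities (image + text). This is NOT the MultConcat method from the talk,
# whose details are not given in the abstract; the encoder dimensions, class
# count, and task are hypothetical assumptions.
import torch
import torch.nn as nn

class ConcatFusionClassifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, hidden_dim=256, num_classes=10):
        super().__init__()
        # Project each modality embedding to a shared size before fusing.
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.txt_proj = nn.Linear(txt_dim, hidden_dim)
        # Fuse by concatenating the projected embeddings, then classify.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden_dim, num_classes),
        )

    def forward(self, img_emb, txt_emb):
        fused = torch.cat([self.img_proj(img_emb), self.txt_proj(txt_emb)], dim=-1)
        return self.classifier(fused)

# Example usage with random stand-ins for encoder outputs (e.g. a vision model
# for product images and a text model for customs goods descriptions).
model = ConcatFusionClassifier()
img_emb = torch.randn(4, 512)   # batch of 4 image embeddings
txt_emb = torch.randn(4, 768)   # batch of 4 text embeddings
logits = model(img_emb, txt_emb)
print(logits.shape)  # torch.Size([4, 10])
```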