Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

ToothSeg: Robust Tooth Instance Segmentation and Numbering in CBCT using Deep Learning and Self-Correction

Authors: Van Nistelrooij, Niels; Krämer, Lars; Kempers, Steven; Beyer, Michel; Bolelli, Federico; Xi, Tong; Bergé, Stefaan; Heil, Max; Maier-Hein, Klaus H.; Vinayahalingam, Shankeeth; Isensee, Fabian

Published in: IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS

2025 Journal article

Towards on-device continual learning with Binary Neural Networks in industrial scenarios

Authors: Vorabbi, L.; Carraggi, A.; Maltoni, D.; Borghi, G.; Santi, S.

Published in: IMAGE AND VISION COMPUTING

This paper addresses the challenges of deploying deep learning models, specifically Binary Neural Networks (BNNs), on resource-constrained embedded devices within the Internet of Things context. As deep learning continues to gain traction in IoT applications, the need for efficient models that can learn continuously from incremental data streams without requiring extensive computational resources has become more pressing. We propose a solution that integrates Continual Learning with BNNs, utilizing replay memory to prevent catastrophic forgetting. Our method focuses on quantized neural networks, introducing quantization for the backpropagation step as well, significantly reducing memory and computational requirements. Furthermore, we enhance the replay memory mechanism by storing intermediate feature maps (i.e., latent replay) with 1-bit precision instead of raw data, enabling efficient memory usage. In addition to well-known benchmarks, we introduce the DL-Hazmat dataset, which consists of over 140k high-resolution grayscale images of 64 hazardous material symbols. Experimental results show a significant improvement in model accuracy and a substantial reduction in memory requirements, demonstrating the effectiveness of our method in enabling deep learning applications on embedded devices in real-world scenarios. Our work expands the application of Continual Learning and BNNs for efficient on-device training, offering a promising solution for IoT and other resource-constrained environments.
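
A minimal sketch of the latent-replay idea described above, assuming a sign-based binarization and a simple ring buffer; names and shapes are illustrative, not the paper's implementation:

```python
import numpy as np

class LatentReplayBuffer:
    """Ring buffer holding 1-bit latent feature maps (illustrative sketch)."""

    def __init__(self, capacity, feat_shape, seed=0):
        self.capacity = capacity
        self.feats = np.zeros((capacity, *feat_shape), dtype=np.uint8)  # stored as 0/1
        self.labels = np.zeros(capacity, dtype=np.int64)
        self.size, self.ptr = 0, 0
        self.rng = np.random.default_rng(seed)

    def add(self, feat, label):
        # Binarize the intermediate activation to 1-bit before storing.
        self.feats[self.ptr] = (feat > 0).astype(np.uint8)
        self.labels[self.ptr] = label
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, n):
        idx = self.rng.integers(0, self.size, size=n)
        # Map stored bits back to the {-1, +1} values a binary layer expects.
        return self.feats[idx].astype(np.float32) * 2.0 - 1.0, self.labels[idx]
```

In an incremental step, a batch of replayed latents would be concatenated with freshly computed ones before the layers that keep training, so no raw images need to be retained.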

2025 Journal article

Towards Unbiased Continual Learning: Avoiding Forgetting in the Presence of Spurious Correlations

Authors: Capitani, Giacomo; Bonicelli, Lorenzo; Porrello, Angelo; Bolelli, Federico; Calderara, Simone; Ficarra, Elisa

Published in: IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION

2025 Conference proceedings paper

TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes

Authors: D'Amelio, Alessandro; Cartella, Giuseppe; Cuculo, Vittorio; Lucchi, Manuele; Cornia, Marcella; Cucchiara, Rita; Boccignone, Giuseppe

Attention guides our gaze to fixate the proper location of the scene and holds it there for the appropriate amount of time given current processing demands, before shifting to the next location. As such, gaze deployment is crucially a temporal process. Existing computational models have made significant strides in predicting the spatial aspects of observers' visual scanpaths (where to look), while often relegating the temporal facet of attention dynamics (when) to the background. In this paper we present TPP-Gaze, a novel and principled approach to model scanpath dynamics based on Neural Temporal Point Processes (TPPs), which jointly learns the temporal dynamics of fixation positions and durations, integrating deep learning methodologies with point process theory. We conduct extensive experiments across five publicly available datasets. Our results show the overall superior performance of the proposed model compared to state-of-the-art approaches.
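
To make the point-process formulation concrete, here is a minimal sketch of a neural TPP over fixation events, each carrying an inter-fixation interval and a 2D position; the GRU encoder and the exponential-time/Gaussian-location decoders are simplifying assumptions, not the TPP-Gaze parameterization:

```python
import torch
import torch.nn as nn

class FixationTPP(nn.Module):
    """Minimal neural temporal point process over fixations (illustrative)."""

    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
        self.rate_head = nn.Linear(hidden, 1)  # log-rate of the waiting time
        self.loc_head = nn.Linear(hidden, 4)   # mean (2) + log-std (2) of next (x, y)

    def forward(self, events):                    # events: (B, T, 3) = (dt, x, y)
        h, _ = self.rnn(events)
        log_rate = self.rate_head(h).squeeze(-1)  # (B, T)
        mu, log_std = self.loc_head(h).chunk(2, dim=-1)
        return log_rate, mu, log_std

def nll(log_rate, mu, log_std, next_events):
    """Joint negative log-likelihood of the next waiting time and location."""
    dt, xy = next_events[..., 0], next_events[..., 1:]
    time_ll = log_rate - log_rate.exp() * dt  # exponential inter-event density
    # Diagonal Gaussian over position (additive constant dropped).
    space_ll = (-0.5 * ((xy - mu) / log_std.exp()) ** 2 - log_std).sum(-1)
    return -(time_ll + space_ll).mean()
```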

2025 Conference proceedings paper

Tracing Information Flow in LLaMA Vision: A Step Toward Multimodal Understanding

Authors: Saporita, Alessia; Pipoli, Vittorio; Bolelli, Federico; Baraldi, Lorenzo; Acquaviva, Andrea; Ficarra, Elisa

Multimodal Large Language Models (MLLMs) have recently emerged as a powerful framework for extending the capabilities of Large Language Models (LLMs) to reason over non-textual modalities. However, despite their success, understanding how they integrate visual and textual information remains an open challenge. Among them, LLaMA 3.2-Vision represents a significant milestone in the development of open-source MLLMs, offering a reproducible and efficient architecture that competes with leading proprietary models, such as Claude 3 Haiku and GPT-4o mini. Motivated by these characteristics, we conduct the first systematic analysis of the information flow between vision and language in LLaMA 3.2-Vision. We analyze three visual question answering (VQA) benchmarks, covering the tasks of VQA on natural images (using both open-ended and multiple-choice question formats), as well as document VQA. These tasks require diverse reasoning capabilities, making them well-suited to reveal distinct patterns in multimodal reasoning. Our analysis unveils a four-stage reasoning strategy: an initial semantic interpretation of the question, an early-to-mid-layer multimodal fusion, a task-specific reasoning stage guided by the resulting multimodal embedding, and a final answer prediction stage. Furthermore, we reveal that multimodal fusion is task-dependent: in complex settings such as document VQA, the model postpones cross-modal integration until semantic reasoning over the question has been established. Overall, our findings offer new insights into the internal dynamics of MLLMs and contribute to advancing the interpretability of vision-language architectures. Our source code is available at https://github.com/AImageLab/MLLMs-FlowTracker.
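
Layer-wise analyses of this kind typically begin by recording hidden states at every decoder block. Below is a minimal, model-agnostic sketch using PyTorch forward hooks; `model`, `layers`, and `inputs` are assumed names, and the paper's actual probing pipeline lives in the linked repository:

```python
import torch

def trace_hidden_states(model, layers, inputs):
    """Record each block's output hidden states via forward hooks (sketch)."""
    records, handles = [], []

    def make_hook(idx):
        def hook(module, args, output):
            hidden = output[0] if isinstance(output, tuple) else output
            records.append((idx, hidden.detach().cpu()))
        return hook

    for i, block in enumerate(layers):
        handles.append(block.register_forward_hook(make_hook(i)))
    try:
        with torch.no_grad():
            model(**inputs)
    finally:
        for h in handles:
            h.remove()
    return records
```

Comparing, per layer, how much the hidden state at the answer position shifts when image tokens are masked versus present then gives a layer-resolved view of where multimodal fusion occurs.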

2025 Conference proceedings paper

Trajectory Forecasting Through Low-Rank Adaptation of Discrete Latent Codes

Authors: Benaglia, R.; Porrello, A.; Buzzega, P.; Calderara, S.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Trajectory forecasting is crucial for video surveillance analytics, as it enables the anticipation of future movements for a set of agents, e.g., basketball players engaged in intricate interactions with long-term intentions. Deep generative models offer a natural learning approach for trajectory forecasting, yet they encounter difficulties in achieving an optimal balance between sampling fidelity and diversity. We address this challenge by leveraging Vector Quantized Variational Autoencoders (VQ-VAEs), which utilize a discrete latent space to tackle the issue of posterior collapse. Specifically, we introduce an instance-based codebook that allows tailored latent representations for each example. In a nutshell, the rows of the codebook are dynamically adjusted to reflect contextual information (i.e., past motion patterns extracted from the observed trajectories). In this way, the discretization process gains flexibility, leading to improved reconstructions. Notably, instance-level dynamics are injected into the codebook through low-rank updates, which restrict the customization of the codebook to a lower-dimensional space. The resulting discrete space serves as the basis of the subsequent step, which concerns the training of a diffusion-based predictive model. We show that such a two-fold framework, augmented with instance-level discretization, leads to accurate and diverse forecasts, yielding state-of-the-art performance on three established benchmarks.
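
A minimal sketch of the instance-level, low-rank codebook adaptation described above: a context vector (e.g., an embedding of past motion) predicts rank-r factors whose product is added to a shared codebook before nearest-code quantization. All dimensions, the linear context heads, and the straight-through trick here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class InstanceAdaptedCodebook(nn.Module):
    """VQ codebook with per-example low-rank updates (illustrative sketch)."""

    def __init__(self, num_codes=128, dim=64, ctx_dim=32, rank=4):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim) * 0.02)
        self.num_codes, self.rank = num_codes, rank
        self.to_u = nn.Linear(ctx_dim, num_codes * rank)
        self.to_v = nn.Linear(ctx_dim, rank * dim)

    def forward(self, z, ctx):
        # z: (B, dim) latents to quantize; ctx: (B, ctx_dim) past-motion context.
        B, dim = z.shape
        U = self.to_u(ctx).view(B, self.num_codes, self.rank)
        V = self.to_v(ctx).view(B, self.rank, dim)
        books = self.codebook.unsqueeze(0) + U @ V           # (B, num_codes, dim)
        dist = torch.cdist(z.unsqueeze(1), books).squeeze(1) # (B, num_codes)
        idx = dist.argmin(-1)
        q = books[torch.arange(B), idx]                      # nearest adapted code
        # Straight-through estimator so gradients reach the encoder.
        return z + (q - z).detach(), idx
```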

2025 Conference proceedings paper

U-Net Transplant: The Role of Pre-training for Model Merging in 3D Medical Segmentation

Authors: Lumetti, Luca; Capitani, Giacomo; Ficarra, Elisa; Grana, Costantino; Calderara, Simone; Porrello, Angelo; Bolelli, Federico

Despite their remarkable success in medical image segmentation, the life cycle of deep neural networks remains a challenge in clinical applications. These models must be regularly updated to integrate new medical data and customized to meet evolving diagnostic standards, regulatory requirements, commercial needs, and privacy constraints. Model merging offers a promising solution, as it allows working with multiple specialized networks that can be created and combined dynamically instead of relying on monolithic models. While extensively studied in standard 2D classification, the potential of model merging for 3D segmentation remains unexplored. This paper presents an efficient framework that allows effective model merging in the domain of 3D image segmentation. Our approach builds upon theoretical analysis and encourages wide minima during pre-training, which we demonstrate to facilitate subsequent model merging. Using U-Net 3D, we evaluate the method on distinct anatomical structures with the ToothFairy2 and BTCV Abdomen datasets. To support further research, we release the source code and all the model weights in a dedicated repository: https://github.com/LucaLumetti/UNetTransplant
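
For readers unfamiliar with model merging, this sketch shows the generic task-vector recipe on raw state dictionaries: each specialist is expressed as its difference from the shared pre-trained base, and differences are combined linearly. The wide-minima pre-training the paper identifies as crucial is not reflected here:

```python
import torch

def merge_checkpoints(base, finetuned_list, alphas=None):
    """Merge fine-tuned models as task vectors over a shared base (sketch)."""
    alphas = alphas or [1.0 / len(finetuned_list)] * len(finetuned_list)
    merged = {k: v.clone() for k, v in base.items()}
    for alpha, state in zip(alphas, finetuned_list):
        for k in merged:
            merged[k] += alpha * (state[k] - base[k])
    return merged

# Hypothetical usage with two specialist U-Nets sharing one pre-training:
# merged = merge_checkpoints(base_unet.state_dict(),
#                            [teeth_unet.state_dict(), organs_unet.state_dict()])
# model.load_state_dict(merged)
```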

2025 Conference proceedings paper

Unravelling Neurodivergent Gaze Behaviour through Visual Attention Causal Graphs

Authors: Cartella, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Cucchiara, Rita; Boccignone, Giuseppe

Can the very fabric of how we visually explore the world hold the key to distinguishing individuals with Autism Spectrum Disorder (ASD)? While eye tracking has long promised quantifiable insights into neurodevelopmental conditions, the causal underpinnings of gaze behaviour remain largely uncharted territory. Moving beyond traditional descriptive metrics of gaze, this study employs cutting-edge causal discovery methods to reconstruct the directed networks that govern the flow of attention across natural scenes. Given the well-documented atypical patterns of visual attention in ASD, particularly regarding socially relevant cues, our central hypothesis is that individuals with ASD exhibit distinct causal signatures in their gaze patterns, significantly different from those of typically developing controls. To our knowledge, this is the first study to explore the diagnostic potential of causal modeling of eye movements in uncovering the cognitive phenotypes of ASD, offering a novel window into the neurocognitive alterations characteristic of the disorder.
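
As a toy stand-in for the causal discovery machinery referenced above, the sketch below derives a directed attention graph from per-region gaze time series via pairwise Granger tests; `roi_series` is a hypothetical input, and the paper's methods are more sophisticated:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

def attention_causal_graph(roi_series, maxlag=2, alpha=0.05):
    """Directed edges (i -> j) where region i's past helps predict region j.

    roi_series: (T, R) array of per-region attention intensity over time.
    """
    _, R = roi_series.shape
    edges = []
    for i in range(R):
        for j in range(R):
            if i == j:
                continue
            # Column order is [effect, cause] for grangercausalitytests.
            res = grangercausalitytests(roi_series[:, [j, i]], maxlag=maxlag,
                                        verbose=False)
            pval = min(r[0]["ssr_ftest"][1] for r in res.values())
            if pval < alpha:
                edges.append((i, j))
    return edges
```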

2025 Conference proceedings paper

Update Your Transformer to the Latest Release: Re-Basin of Task Vectors

Authors: Rinaldi, Filippo; Capitani, Giacomo; Bonicelli, Lorenzo; Crisostomi, Donato; Bolelli, Federico; Rodolà, Emanuele; Ficarra, Elisa; Calderara, Simone; Porrello, Angelo

Foundation models serve as the backbone for numerous specialized models developed through fine-tuning. However, when the underlying pretrained model is updated or retrained (e.g., on larger and more curated datasets), the fine-tuned model becomes obsolete, losing its utility and requiring retraining. This raises the question: is it possible to transfer fine-tuning to a new release of the model? In this work, we investigate how to transfer fine-tuning to a new checkpoint without having to re-train, in a data-free manner. To do so, we draw principles from model re-basin and provide a recipe based on weight permutations to re-base the modifications made to the original base model, often called task vector. In particular, our approach tailors model re-basin for Transformer models, taking into account the challenges of residual connections and multi-head attention layers. Specifically, we propose a two-level method rooted in spectral theory, initially permuting the attention heads and subsequently adjusting parameters within select pairs of heads. Through extensive experiments on visual and textual tasks, we achieve the seamless transfer of fine-tuned knowledge to new pre-trained backbones without relying on a single training step or datapoint.
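
A simplified sketch of the head-permutation step: attention heads of the old and new backbones are matched by weight similarity with the Hungarian algorithm, and the task vector's head blocks are reordered accordingly before being added to the new checkpoint. Flattened-weight matching is an illustrative assumption; the paper's two-level spectral procedure is more involved:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_heads(W_old, W_new, num_heads):
    """Permutation perm such that old head i best matches new head perm[i]."""
    flat_old = W_old.reshape(num_heads, -1)
    flat_new = W_new.reshape(num_heads, -1)
    sim = flat_old @ flat_new.T            # pairwise head similarity
    _, perm = linear_sum_assignment(-sim)  # maximize total similarity
    return perm

def rebase_task_vector(tau, perm, num_heads):
    """Reorder the head blocks of a task vector to the new backbone's basin."""
    d = tau.shape[0] // num_heads
    blocks = tau.reshape(num_heads, d, -1)
    out = np.empty_like(blocks)
    out[perm] = blocks                     # old head i lands in new slot perm[i]
    return out.reshape(tau.shape)

# Hypothetical usage: tau = W_old_ft - W_old_pre, then
# W_new_ft ~= W_new_pre + rebase_task_vector(tau, match_heads(W_old_pre, W_new_pre, H), H)
```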

2025 Conference proceedings paper

VATr++: Choose Your Words Wisely for Handwritten Text Generation

Authors: Vanherle, B.; Pippi, V.; Cascianelli, S.; Michiels, N.; Van Reeth, F.; Cucchiara, R.

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect: the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This work extends the VATr [1] Styled-HTG approach by addressing the pre-processing and training issues that it faces, which are common to many HTG models. In particular, we propose generally applicable strategies for input preparation and training regularization that allow the model to achieve better performance and generalization capabilities. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research: the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.

2025 Journal article

Page 11 of 106 • Total publications: 1054