Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Segmenting Maxillofacial Structures in CBCT Volumes

Authors: Bolelli, Federico; Marchesini, Kevin; Van Nistelrooij, Niels; Lumetti, Luca; Pipoli, Vittorio; Ficarra, Elisa; Vinayahalingam, Shankeeth; Grana, Costantino

Cone-beam computed tomography (CBCT) is a standard imaging modality in orofacial and dental practices, providing essential 3D volumetric imaging of anatomical structures, including jawbones, teeth, sinuses, and neurovascular canals. Accurately segmenting these structures is fundamental to numerous clinical applications, such as surgical planning and implant placement. However, manual segmentation of CBCT scans is time-intensive and requires expert input, creating a demand for automated solutions through deep learning. Effective development of such algorithms relies on access to large, well-annotated datasets, yet current datasets are often privately stored or limited in scope and in the structures they consider, especially concerning 3D annotations. This paper proposes ToothFairy2, a comprehensive, publicly accessible CBCT dataset with voxel-level 3D annotations of 42 distinct classes corresponding to maxillofacial structures. We validate the dataset by benchmarking state-of-the-art neural network models, including convolutional, transformer-based, and hybrid Mamba-based architectures, to evaluate segmentation performance across complex anatomical regions. Our work also explores adaptations to the nnU-Net framework to optimize multi-class segmentation for maxillofacial anatomy. The proposed dataset provides a fundamental resource for advancing maxillofacial segmentation and supports future research in automated 3D image analysis in digital dentistry.
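
As a rough illustration of how a voxel-level, multi-class benchmark like ToothFairy2 is typically scored, the sketch below computes a per-class Dice coefficient over a 3D label volume. The 42-class setting comes from the abstract; the function name, label encoding, and the rule for skipping absent classes are illustrative assumptions, not the dataset's official evaluation code.

```python
import numpy as np

def per_class_dice(pred: np.ndarray, gt: np.ndarray, num_classes: int = 42, eps: float = 1e-7):
    """Dice coefficient for each foreground class of a 3D label volume.

    `pred` and `gt` are integer label volumes of identical shape (D, H, W),
    where 0 is background and 1..num_classes are anatomical structures.
    """
    scores = {}
    for c in range(1, num_classes + 1):
        p = pred == c
        g = gt == c
        if not g.any() and not p.any():
            continue  # class absent from both volumes: skip rather than report a misleading 1.0
        scores[c] = 2.0 * np.logical_and(p, g).sum() / (p.sum() + g.sum() + eps)
    return scores

# Example with random volumes, just to show the call signature.
rng = np.random.default_rng(0)
pred = rng.integers(0, 43, size=(64, 64, 64))
gt = rng.integers(0, 43, size=(64, 64, 64))
print(len(per_class_dice(pred, gt)), "classes scored")
```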

2025 Conference Paper

Segmenting the Inferior Alveolar Canal in CBCT Volumes: the ToothFairy Challenge

Authors: Bolelli, Federico; Lumetti, Luca; Vinayahalingam, Shankeeth; Di Bartolomeo, Mattia; Pellacani, Arrigo; Marchesini, Kevin; Van Nistelrooij, Niels; Van Lierop, Pieter; Xi, Tong; Liu, Yusheng; Xin, Rui; Yang, Tao; Wang, Lisheng; Wang, Haoshen; Xu, Chenfan; Cui, Zhiming; Wodzinski, Marek Michal; Müller, Henning; Kirchhoff, Yannick; Rokuss, Maximilian R.; Maier-Hein, Klaus; Han, Jaehwan; Kim, Wan; Ahn, Hong-Gi; Szczepański, Tomasz; Grzeszczyk, Michal K.; Korzeniowski, Przemyslaw; Caselles Ballester, Vicent; Burgos-Artizzu, Xavier Paolo; Prados Carrasco, Ferran; Bergé, Stefaan; Van Ginneken, Bram; Anesi, Alexandre; Re, ; Grana, Costantino

Published in: IEEE TRANSACTIONS ON MEDICAL IMAGING

In recent years, several algorithms have been developed for the segmentation of the Inferior Alveolar Canal (IAC) in Cone-Beam Computed Tomography (CBCT) scans. However, the availability of public datasets in this domain is limited, resulting in a lack of comparative evaluation studies on a common benchmark. To address this scientific gap and encourage deep learning research in the field, the ToothFairy challenge was organized within the MICCAI 2023 conference. In this context, a public dataset was released to also serve as a benchmark for future research. The dataset comprises 443 CBCT scans, with voxel-level annotations of the IAC available for 153 of them, making it the largest publicly available dataset of its kind. The participants of the challenge were tasked with developing an algorithm to accurately identify the IAC using the 2D and 3D-annotated scans. This paper presents the details of the challenge and the contributions made by the most promising methods proposed by the participants. It represents the first comprehensive comparative evaluation of IAC segmentation methods on a common benchmark dataset, providing insights into the current state-of-the-art algorithms and outlining future research directions. Furthermore, to ensure reproducibility and promote future developments, an open-source repository that collects the implementations of the best submissions was released.

2025 Journal Article

Semantic Residual Prompts for Continual Learning

Authors: Menabue, M.; Frascaroli, E.; Boschini, M.; Sangineto, E.; Bonicelli, L.; Porrello, A.; Calderara, S.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Prompt-tuning methods for Continual Learning (CL) freeze a large pre-trained model and train a few parameter vectors termed prompts. Most of these methods organize these vectors in a pool of key-value pairs and use the input image as query to retrieve the prompts (values). However, as keys are learned while tasks progress, the prompting selection strategy is itself subject to catastrophic forgetting, an issue often overlooked by existing approaches. For instance, prompts introduced to accommodate new tasks might end up interfering with previously learned prompts. To make the selection strategy more stable, we leverage a foundation model (CLIP) to select our prompts within a two-level adaptation mechanism. Specifically, the first level leverages a standard textual prompt pool for the CLIP textual encoder, leading to stable class prototypes. The second level, instead, uses these prototypes along with the query image as keys to index a second pool. The retrieved prompts serve to adapt a pre-trained ViT, granting plasticity. In doing so, we also propose a novel residual mechanism to transfer CLIP semantics to the ViT layers. Through extensive analysis on established CL benchmarks, we show that our method significantly outperforms both state-of-the-art CL approaches and the zero-shot CLIP test. Notably, our findings hold true even for datasets with a substantial domain gap w.r.t. the pre-training knowledge of the backbone model, as showcased by experiments on satellite imagery and medical datasets. The codebase is available at https://github.com/aimagelab/mammoth.
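
The two-level selection described above can be pictured with a short sketch: a stable CLIP-derived class prototype, combined with the image query, indexes a learnable pool whose retrieved prompts would then adapt the ViT. This is a minimal illustration under assumed shapes and names, not the authors' implementation (which lives in the linked mammoth repository), and it omits the residual transfer of CLIP semantics into the ViT layers.

```python
import torch
import torch.nn.functional as F

class TwoLevelPromptPool(torch.nn.Module):
    """Illustrative two-level prompt retrieval: a CLIP-derived class prototype,
    together with the query image feature, indexes a second pool of prompts
    that would be prepended to a ViT. Shapes and names are assumptions."""

    def __init__(self, pool_size=10, prompt_len=8, dim=768):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(pool_size, dim))        # second-level keys
        self.prompts = torch.nn.Parameter(torch.randn(pool_size, prompt_len, dim))

    def forward(self, clip_proto: torch.Tensor, img_query: torch.Tensor, top_k=3):
        # Combine the (frozen) CLIP class prototype with the image query feature.
        query = F.normalize(clip_proto + img_query, dim=-1)                # (B, dim)
        sim = query @ F.normalize(self.keys, dim=-1).T                     # (B, pool_size)
        idx = sim.topk(top_k, dim=-1).indices                              # (B, top_k)
        # Gather and concatenate the selected prompts for the ViT input.
        return self.prompts[idx].flatten(1, 2)                             # (B, top_k*prompt_len, dim)

pool = TwoLevelPromptPool()
prompts = pool(torch.randn(4, 768), torch.randn(4, 768))
print(prompts.shape)  # torch.Size([4, 24, 768])
```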

2025 Conference Paper

Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios

Authors: Pipoli, Vittorio; Bolelli, Federico; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita; Ficarra, Elisa

Published in: IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION

This paper tackles the domain of multimodal prompting for visual recognition, specifically when dealing with missing modalities through multimodal Transformers. It presents two main contributions: (i) we introduce a novel prompt learning module which is designed to produce sample-specific prompts and (ii) we show that modality-agnostic prompts can effectively adjust to diverse missing modality scenarios. Our model, termed SCP, exploits the semantic representation of available modalities to query a learnable memory bank, which allows the generation of prompts based on the semantics of the input. Notably, SCP distinguishes itself from existing methodologies for its capacity of self-adjusting to both the missing modality scenario and the semantic context of the input, without prior knowledge about the specific missing modality and the number of modalities. Through extensive experiments, we show the effectiveness of the proposed prompt learning framework and demonstrate enhanced performance and robustness across a spectrum of missing modality cases.
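
A minimal sketch of the memory-bank idea follows: the pooled features of whichever modalities are present query a learnable memory via attention, and the read-out is turned into sample-specific prompts. All names, dimensions, and the simple mean-pooling of available modalities are assumptions made for illustration; this is not the SCP code.

```python
import torch
import torch.nn.functional as F

class SemanticPromptMemory(torch.nn.Module):
    """Illustrative sketch: a learnable memory bank is queried with the pooled
    features of whichever modalities are available, producing sample-specific
    prompts. Names and dimensions are assumptions, not the paper's model."""

    def __init__(self, memory_size=16, prompt_len=4, dim=512):
        super().__init__()
        self.memory = torch.nn.Parameter(torch.randn(memory_size, dim))
        self.to_prompt = torch.nn.Linear(dim, prompt_len * dim)
        self.prompt_len, self.dim = prompt_len, dim

    def forward(self, modality_feats: list) -> torch.Tensor:
        # Average the available modality features; missing modalities are simply absent.
        query = F.normalize(torch.stack(modality_feats).mean(0), dim=-1)    # (B, dim)
        attn = torch.softmax(query @ self.memory.T, dim=-1)                 # (B, memory_size)
        read = attn @ self.memory                                           # (B, dim)
        return self.to_prompt(read).view(-1, self.prompt_len, self.dim)     # (B, prompt_len, dim)

bank = SemanticPromptMemory()
only_image = bank([torch.randn(2, 512)])                          # text modality missing
image_and_text = bank([torch.randn(2, 512), torch.randn(2, 512)]) # both modalities present
print(only_image.shape, image_and_text.shape)
```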

2025 Conference Paper

State-of-the-art Review and Benchmarking of Barcode Localization Methods

Authors: Vezzali, Enrico; Bolelli, Federico; Santi, Stefano; Grana, Costantino

Published in: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Barcodes, despite their long history, remain an essential technology in supply chain management. In addition, barcodes have found extensive use in industrial engineering, particularly in warehouse automation, component tracking, and robot guidance. To detect a barcode in an image, multiple algorithms have been proposed in the literature, with a significant increase of interest in the topic since the rise of deep learning. However, research in the field suffers from many limitations, including the scarcity of public datasets and code implementations, which hinders the reproducibility and reliability of published results. For this reason, we developed "BarBeR" (Barcode Benchmark Repository), a benchmark designed for testing and comparing barcode detection algorithms. This benchmark includes the code implementation of various detection algorithms for barcodes, along with a suite of useful metrics. Among the supported localization methods are multiple deep-learning detection models, which are used to assess the recent contributions of Artificial Intelligence to this field. In addition, we provide a large, annotated dataset of 8748 barcode images, combining multiple public barcode datasets with standardized annotation formats for both detection and segmentation tasks. Finally, we provide a thorough summary of the history and literature on barcode localization and share the results obtained from running the benchmark on our dataset, offering valuable insights into the performance of different algorithms when applied to real-world problems.
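
For a concrete sense of how localization methods are compared on such a benchmark, the following sketch scores predicted boxes against ground-truth barcodes with an IoU threshold and reports recall. The metric choice and function names are illustrative; BarBeR ships its own metric suite, which may differ.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_recall(predictions, ground_truths, iou_thr=0.5):
    """Fraction of ground-truth barcodes matched by at least one prediction."""
    matched = sum(
        any(iou(gt, p) >= iou_thr for p in predictions) for gt in ground_truths
    )
    return matched / max(len(ground_truths), 1)

gts = [(10, 10, 110, 60)]
preds = [(12, 8, 108, 62), (200, 200, 250, 230)]
print(detection_recall(preds, gts))  # 1.0
```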

2025 Journal Article

TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios

Authors: Rossi, Daniel; Borghi, Guido; Vezzani, Roberto

Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergency responses. In this work, we introduce TakuNet, a novel light-weight architecture which employs techniques such as depth-wise convolutions and an early downsampling stem to reduce computational complexity while maintaining high accuracy. It leverages dense connections for fast convergence during training and uses 16-bit floating-point precision for optimization on embedded hardware accelerators. Experimental evaluation on two public datasets shows that TakuNet achieves near-state-of-the-art accuracy in classifying aerial images of emergency situations, despite its minimal parameter count. Real-world tests on embedded devices, namely Jetson Orin Nano and Raspberry Pi, confirm TakuNet's efficiency, achieving more than 650 fps on the 15W Jetson board, making it suitable for real-time AI processing on resource-constrained platforms and advancing the applicability of drones in emergency scenarios. The code and implementation details are publicly released.
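
The building blocks mentioned above can be sketched in a few lines: a depth-wise 3x3 convolution followed by a point-wise projection, preceded by an aggressive early-downsampling stem so that later layers work on small feature maps, run under reduced precision in the spirit of the 16-bit deployment. Channel counts, strides, and the use of bfloat16 autocast are assumptions for illustration, not TakuNet's actual configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Depth-wise 3x3 convolution followed by a point-wise 1x1 projection:
    the kind of block that keeps parameter count and FLOPs low."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

# Aggressive early-downsampling stem: spatial resolution is reduced right away
# so that all subsequent blocks operate on small feature maps.
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=4, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    DepthwiseSeparableBlock(32, 64),
)

x = torch.randn(1, 3, 224, 224)
with torch.autocast("cpu", dtype=torch.bfloat16):  # reduced-precision inference, stand-in for fp16 on an accelerator
    print(stem(x).shape)  # torch.Size([1, 64, 56, 56])
```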

2025 Conference Paper

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

Authors: Barsellotti, Luca; Bianchi, Lorenzo; Messina, Nicola; Carrara, Fabio; Cornia, Marcella; Baraldi, Lorenzo; Falchi, Fabrizio; Cucchiara, Rita

Open-Vocabulary Segmentation (OVS) aims at segmenting images from free-form textual concepts without predefined training classes. While existing vision-language models such as CLIP can generate segmentation masks by leveraging coarse spatial information from Vision Transformers, they face challenges in spatial localization due to their global alignment of image and text features. Conversely, self-supervised visual models like DINO excel in fine-grained visual encoding but lack integration with language. To bridge this gap, we present Talk2DINO, a novel hybrid approach that combines the spatial accuracy of DINOv2 with the language understanding of CLIP. Our approach aligns the textual embeddings of CLIP to the patch-level features of DINOv2 through a learned mapping function without the need to fine-tune the underlying backbones. At training time, we exploit the attention maps of DINOv2 to selectively align local visual patches with textual embeddings. We show that the powerful semantic and localization abilities of Talk2DINO can enhance the segmentation process, resulting in more natural and less noisy segmentations, and that our approach can also effectively distinguish foreground objects from the background. Experimental results demonstrate that Talk2DINO achieves state-of-the-art performance across several unsupervised OVS benchmarks.
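
The core alignment step can be illustrated with a short sketch: a small learned head maps CLIP text embeddings into the DINOv2 patch-feature space, and each patch is assigned the most similar class. The projection architecture, embedding sizes, and patch grid below are assumptions; the actual Talk2DINO training additionally exploits DINOv2 attention maps, which this sketch omits.

```python
import torch
import torch.nn.functional as F

# Assumed learned projection from CLIP text space (512-d) to DINOv2 patch space (768-d).
text_to_patch = torch.nn.Sequential(
    torch.nn.Linear(512, 768), torch.nn.GELU(), torch.nn.Linear(768, 768)
)

def segment(patch_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """patch_feats: (N_patches, 768) from a frozen DINOv2 backbone.
    text_embeds: (N_classes, 512) from a frozen CLIP text encoder.
    Returns a per-patch class index, i.e. a low-resolution segmentation map."""
    mapped = F.normalize(text_to_patch(text_embeds), dim=-1)    # (N_classes, 768)
    patches = F.normalize(patch_feats, dim=-1)                  # (N_patches, 768)
    sim = patches @ mapped.T                                    # (N_patches, N_classes)
    return sim.argmax(dim=-1)

labels = segment(torch.randn(16 * 16, 768), torch.randn(3, 512))
print(labels.shape)  # torch.Size([256]) -- one class index per patch
```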

2025 Conference Paper

Taming Mambas for 3D Medical Image Segmentation

Authors: Lumetti, Luca; Marchesini, Kevin; Pipoli, Vittorio; Ficarra, Elisa; Grana, Costantino; Bolelli, Federico

Published in: IEEE ACCESS

Recently, the field of 3D medical segmentation has been dominated by deep learning models employing Convolutional Neural Networks (CNNs) and Transformer-based architectures, each with its distinctive strengths and limitations. CNNs are constrained by a local receptive field, whereas Transformers are hindered by their substantial memory requirements as well as their data hunger, making them not ideal for processing 3D medical volumes at a fine-grained level. For these reasons, fully convolutional neural networks, such as nnU-Net, still dominate the scene when segmenting medical structures in large 3D medical volumes. Despite numerous advancements toward developing transformer variants with subquadratic time and memory complexity, these models still fall short in content-based reasoning. A recent breakthrough is Mamba, a Recurrent Neural Network (RNN) based on State Space Models (SSMs), outperforming Transformers in many long-context tasks (million-length sequences) on famous natural language processing and genomic benchmarks while keeping a linear complexity. In this paper, we evaluate the effectiveness of Mamba-based architectures in comparison to state-of-the-art convolutional and Transformer-based models for 3D medical image segmentation across three well-established datasets: Synapse Abdomen, MSD BrainTumor, and ACDC. Additionally, we address the primary limitations of existing Mamba-based architectures by proposing alternative architectural designs, thereby improving segmentation performance. The source code is publicly available to ensure reproducibility and facilitate further research: https://github.com/LucaLumetti/TamingMambas.
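
To make the linear-complexity claim concrete, the sketch below runs the plain (non-selective) state-space recurrence that Mamba-style layers build on: the hidden state is updated once per sequence element, so cost grows linearly with sequence length. Making A, B, and C input-dependent (the "selective" part) and the handling of 3D volumes are omitted; all shapes here are illustrative.

```python
import torch

def ssm_scan(u: torch.Tensor, A: torch.Tensor, B: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """Minimal (non-selective) state-space recurrence over a 1D sequence:
        h_t = A * h_{t-1} + B * u_t,    y_t = C . h_t
    u: (L,) input sequence; A, B, C: (N,) diagonal state parameters."""
    h = torch.zeros_like(A)
    ys = []
    for u_t in u:
        h = A * h + B * u_t          # state update: O(N) per step, O(L*N) overall
        ys.append((C * h).sum())     # readout
    return torch.stack(ys)

# |A| < 1 keeps the recurrence stable in this toy example.
y = ssm_scan(torch.randn(32), torch.rand(16) * 0.9, torch.randn(16), torch.randn(16))
print(y.shape)  # torch.Size([32])
```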

2025 Journal Article

TONO: A Synthetic Dataset for Face Image Compliance to ISO/ICAO Standard

Authors: Borghi, Guido; Franco, Annalisa; Di Domenico, Nicolò; Maltoni, Davide

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2025 Conference Paper

ToothFairy 2024 Preface

Authors: Bolelli, Federico; Lumetti, Luca; Vinayahalingam, Shankeeth; Di Bartolomeo, Mattia; Van Nistelrooij, Niels; Marchesini, Kevin; Anesi, Alexandre; Grana, Costantino

2025 Brief Introduction

Page 10 of 106 • Total publications: 1054