Publications
Explore our research publications: papers, articles, and conference proceedings from AImageLab.
Diffusion and Autoregressive Deep Learning models for Transactional Data Generation
Authors: Garuti, Fabrizio; Luetto, Simone; Sangineto, Enver; Forni, Lorenzo; Cucchiara, Rita
Enabling On-Device Continual Learning with Binary Neural Networks and Latent Replay
Authors: Vorabbi, Lorenzo; Maltoni, Davide; Borghi, Guido; Santi, Stefano
On-device learning remains a formidable challenge, especially when dealing with resource-constrained devices that have limited computational capabilities. This challenge is primarily rooted in two key issues: first, the memory available on embedded devices is typically insufficient to accommodate the memory-intensive back-propagation algorithm, which often relies on floating-point precision. Second, the development of learning algorithms on models with extreme quantization levels, such as Binary Neural Networks (BNNs), is critical due to the drastic reduction in bit representation. In this study, we propose a solution that combines recent advancements in the field of Continual Learning (CL) and Binary Neural Networks to enable on-device training while maintaining competitive performance. Specifically, our approach leverages binary latent replay (LR) activations and a novel quantization scheme that significantly reduces the number of bits required for gradient computation. The experimental validation demonstrates a significant accuracy improvement in combination with a noticeable reduction in memory requirement, confirming the suitability of our approach in expanding the practical applications of deep learning in real-world scenarios.
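A minimal PyTorch-style sketch of the latent replay part of this idea: activations at a chosen "latent" layer are binarized for compact storage and replayed alongside new data, with training confined to the layers above the cut. The layer split, sizes, and buffer policy here are illustrative assumptions, not the authors' implementation, and the paper's gradient quantization scheme is not shown.

```python
import torch
import torch.nn as nn

# Frozen lower layers and a small head trained on device; the split point
# ("latent" layer) and all sizes are illustrative.
backbone = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
head = nn.Linear(32, 10)
for p in backbone.parameters():
    p.requires_grad = False

replay_buffer = []                           # (binary latent, label) pairs
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(head.parameters(), lr=0.01)

def train_step(x, y):
    with torch.no_grad():
        latent = (backbone(x) > 0).float()   # binary code: 1 bit per value
    replay_buffer.append((latent, y))        # compact replay memory
    if len(replay_buffer) > 1:               # mix in a stored batch
        old_latent, old_y = replay_buffer[0]
        latent = torch.cat([latent, old_latent])
        y = torch.cat([y, old_y])
    opt.zero_grad()
    loss = loss_fn(head(latent), y)
    loss.backward()                          # back-prop stops at the latent cut
    opt.step()
    return loss.item()

# One step on random data: batch of 8, 64 input features, 10 classes.
print(train_step(torch.randn(8, 64), torch.randint(0, 10, (8,))))
```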
Enhancing Patch-Based Learning for the Segmentation of the Mandibular Canal
Authors: Lumetti, Luca; Pipoli, Vittorio; Bolelli, Federico; Ficarra, Elisa; Grana, Costantino
Published in: IEEE ACCESS
Segmentation of the Inferior Alveolar Canal (IAC) is a critical aspect of dentistry and maxillofacial imaging, garnering considerable attention in recent research endeavors. Deep learning techniques have shown promising results in this domain, yet their efficacy is still significantly hindered by the limited availability of 3D maxillofacial datasets. An inherent challenge is posed by the size of input volumes, which necessitates a patch-based processing approach that compromises the neural network performance due to the absence of global contextual information. This study introduces a novel approach that harnesses the spatial information within the extracted patches and incorporates it into a Transformer architecture, thereby enhancing the segmentation process through the use of prior knowledge about the patch location. Our method improves the Dice score by 4 points with respect to the previous work by Cipriano et al., while also reducing the training steps required by the entire pipeline. By integrating spatial information and leveraging the power of Transformer architectures, this research not only advances the accuracy of IAC segmentation, but also streamlines the training process, offering a promising direction for improving dental and maxillofacial image analysis.
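A hedged sketch of the core mechanism: each patch's location within the full volume is embedded and injected into the Transformer's tokens, so patch processing becomes location-aware. The embedding scheme, layer sizes, and module names are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PatchWithPositionEncoder(nn.Module):
    def __init__(self, token_dim=256):
        super().__init__()
        # Map the patch's normalized (x, y, z) origin in the volume to the
        # token dimension, so global positional context enters the model.
        self.pos_embed = nn.Linear(3, token_dim)
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patch_tokens, patch_origin):
        # patch_tokens: (B, n_tokens, token_dim) from a CNN/ViT stem
        # patch_origin: (B, 3), patch coordinates normalized to [0, 1]
        pos = self.pos_embed(patch_origin).unsqueeze(1)  # (B, 1, token_dim)
        return self.encoder(patch_tokens + pos)          # location-aware tokens

# Example: 2 patches of 64 tokens each, with their volume coordinates.
model = PatchWithPositionEncoder()
tokens = torch.randn(2, 64, 256)
origins = torch.tensor([[0.1, 0.4, 0.7], [0.8, 0.2, 0.3]])
out = model(tokens, origins)   # (2, 64, 256)
```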
Fault Diagnosis and Identification in AGVs System
Authors: Bertoli, A.; Battilani, N.; Fantuzzi, C.
Published in: IFAC PAPERSONLINE
This article describes a methodology for the diagnosis of failures in multi-AGV (Automatic Guided Vehicle) systems. Today, AGVs are establishing themselves in the most advanced automatic logistics solutions, providing performance and safety that cannot be achieved with manual forklift handling. Furthermore, thanks to the application of Industry 4.0 digital technologies, very advanced tools are available to monitor the performance and diagnose the faults of AGV fleets. Studies on fault diagnosis have mainly focused on (1) the diagnosis of the internal components of the automatic truck and (2) the identification of failures in the functionality of the AGV as it interacts with the surrounding environment. This paper presents an approach to fault diagnosis in multi-AGV systems that considers the interaction between each individual AGV and the environment, with the aim of helping the user increase system efficiency in an existing layout. The objective of the paper is to introduce and discuss a methodology for studying the failures of the AGV navigation system and the available recovery actions. Moreover, the paper presents the AGV data acquisition and processing architecture currently deployed on the factory shop floor, as well as the results of an experimental study in a real industrial environment.
Fluent and Accurate Image Captioning with a Self-Trained Reward Model
Authors: Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level. This approach, however, is known to limit descriptiveness and semantic richness and tends to drive the model towards the style of ground-truth sentences, thus losing detail and specificity. On the contrary, recent attempts to employ image-text models like CLIP as reward have led to grammatically incorrect and repetitive captions. In this paper, we propose Self-Cap, a captioning approach that relies on a learnable reward model based on self-generated negatives that can discriminate captions based on their consistency with the image. Specifically, our discriminator is a fine-tuned contrastive image-text model trained to promote caption correctness while avoiding the aberrations that typically happen when training with a CLIP-based reward. To this end, our discriminator directly incorporates negative samples from a frozen captioner, which not only improves the quality and richness of the generated captions but also reduces the fine-tuning time in comparison to using the CIDEr score as the sole metric for optimization. Experimental results demonstrate the effectiveness of our training strategy on both standard and zero-shot image captioning datasets.
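A minimal sketch of a learnable reward trained with self-generated negatives, in the spirit described above. The embeddings stand in for a contrastive image-text model (e.g. a fine-tuned CLIP-like encoder pair); all function names, shapes, and the loss form are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def reward_loss(img_emb, pos_emb, neg_emb, temperature=0.07):
    """Contrastive loss: the ground-truth caption should score higher than
    captions sampled from a frozen captioner (the self-generated negatives)."""
    img_emb = F.normalize(img_emb, dim=-1)
    pos_emb = F.normalize(pos_emb, dim=-1)
    neg_emb = F.normalize(neg_emb, dim=-1)                    # (B, K, D)
    pos_sim = (img_emb * pos_emb).sum(-1, keepdim=True)       # (B, 1)
    neg_sim = torch.einsum('bd,bkd->bk', img_emb, neg_emb)    # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    target = torch.zeros(len(logits), dtype=torch.long)       # index 0 = positive
    return F.cross_entropy(logits, target)

def caption_reward(img_emb, cap_emb):
    # At captioner fine-tuning time, the discriminator's image-caption
    # similarity serves as the sequence-level reward (higher = better).
    return F.cosine_similarity(img_emb, cap_emb, dim=-1)

# Example with random embeddings: batch of 4, 3 negatives each, dim 512.
img, pos, negs = torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 3, 512)
print(reward_loss(img, pos, negs).item(), caption_reward(img, pos))
```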
FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data
Authors: Melzi, Pietro; Tolosana, Ruben; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; DeAndres-Tame, Ivan; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhao, Weisong; Zhu, Xiangyu; Yan, Zheyu; Zhang, Xiao-Yu; Wu, Jinlin; Lei, Zhen; Tripathi, Suvidha; Kothari, Mahak; Haider Zama, Md; Deb, Debayan; Biesseck, Bernardo; Vidal, Pedro; Granada, Roger; Fickel, Guilherme; Führ, Gustavo; Menotti, David; Unnervik, Alexander; George, Anjith; Ecabert, Christophe; Otroshi Shahreza, Hatef; Rahimi, Parsa; Marcel, Sébastien; Sarridis, Ioannis; Koutlis, Christos; Baltsou, Georgia; Papadopoulos, Symeon; Diou, Christos; Di Domenico, Nicolò; Borghi, Guido; Pellegrini, Lorenzo; Mas-Candela, Enrique; Sánchez-Pérez, Ángela; Atzori, Andrea; Boutros, Fadi; Damer, Naser; Fenu, Gianni; Marras, Mirko
Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be addressed in more detail. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology. Specifically, the FRCSyn Challenge targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. The results achieved in the FRCSyn Challenge, together with the proposed benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.
FRCSyn-onGoing: Benchmarking and comprehensive evaluation of real and synthetic data to improve face recognition systems
Authors: Melzi, Pietro; Tolosana, Ruben; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; DeAndres-Tame, Ivan; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhao, Weisong; Zhu, Xiangyu; Yan, Zheyu; Zhang, Xiao-Yu; Wu, Jinlin; Lei, Zhen; Tripathi, Suvidha; Kothari, Mahak; Zama, Md Haider; Deb, Debayan; Biesseck, Bernardo; Vidal, Pedro; Granada, Roger; Fickel, Guilherme; Führ, Gustavo; Menotti, David; Unnervik, Alexander; George, Anjith; Ecabert, Christophe; Shahreza, Hatef Otroshi; Rahimi, Parsa; Marcel, Sébastien; Sarridis, Ioannis; Koutlis, Christos; Baltsou, Georgia; Papadopoulos, Symeon; Diou, Christos; Di Domenico, Nicolò; Borghi, Guido; Pellegrini, Lorenzo; Mas-Candela, Enrique; Sánchez-Pérez, Ángela; Atzori, Andrea; Boutros, Fadi; Damer, Naser; Fenu, Gianni; Marras, Mirko
Published in: INFORMATION FUSION
This article presents FRCSyn-onGoing, an ongoing challenge for face recognition where researchers can easily benchmark their systems against the state of the art in an open common platform using large-scale public databases and standard experimental protocols. FRCSyn-onGoing is based on the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international face recognition challenge aiming to explore the use of real and synthetic data independently, and also their fusion, in order to address existing limitations in the technology. Specifically, FRCSyn-onGoing targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. To enhance face recognition performance, FRCSyn-onGoing strongly advocates for information fusion at various levels, starting from the input data, where a mix of real and synthetic domains is proposed for specific tasks of the challenge. Additionally, participating teams are allowed to fuse diverse networks within their proposed systems to improve the performance. In this article, we provide a comprehensive evaluation of the face recognition systems and results achieved so far in FRCSyn-onGoing. The results obtained in FRCSyn-onGoing, together with the proposed public ongoing benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.