Publications
Explore our research publications: papers, articles, and conference proceedings from AImageLab.
An Attention-Based Representation Distillation Baseline for Multi-label Continual Learning
Authors: Menabue, Martin; Frascaroli, Emanuele; Boschini, Matteo; Bonicelli, Lorenzo; Porrello, Angelo; Calderara, Simone
Published in: LECTURE NOTES IN COMPUTER SCIENCE
The field of Continual Learning (CL) has inspired numerous researchers over the years, leading to increasingly advanced countermeasures to the issue of catastrophic forgetting. Most studies have focused on the single-class scenario, where each example comes with a single label. The recent literature has successfully tackled such a setting, with impressive results. Differently, we shift our attention to the multi-label scenario, as we consider it more representative of real-world open problems. In our work, we show that existing state-of-the-art CL methods fail to achieve satisfactory performance, thus questioning the real advance claimed in recent years. Therefore, we assess both old-style and novel strategies and propose, on top of them, an approach called Selective Class Attention Distillation (SCAD). It relies on a knowledge transfer technique that seeks to align the representations of the student network, which trains continuously and is subject to forgetting, with those of the teacher network, which is pretrained and kept frozen. Importantly, our method is able to selectively transfer the relevant information from the teacher to the student, thereby preventing irrelevant information from harming the student's performance during online training. To demonstrate the merits of our approach, we conduct experiments on two different multi-label datasets, showing that our method outperforms the current state-of-the-art Continual Learning methods. Our findings highlight the importance of addressing the unique challenges posed by multi-label environments in the field of Continual Learning. The code of SCAD is available at https://github.com/aimagelab/SCAD-LOD-2024.
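The selective alignment step described above can be illustrated with a minimal sketch. The function below is a hypothetical simplification, not the released SCAD implementation: it compares student features against frozen-teacher features under a per-dimension relevance weight, so that dimensions the teacher deems irrelevant contribute nothing to the distillation loss. The function name and the exact weighting scheme are assumptions for illustration.

```python
import numpy as np

def selective_distillation_loss(student, teacher, relevance):
    """Mean squared distance between student and frozen-teacher
    features, weighted per feature dimension by a relevance score
    in [0, 1]; dimensions scored 0 are fully ignored."""
    diff = (student - teacher) ** 2   # (N, D) squared error
    return (diff * relevance).mean()  # broadcast (D,) weights over rows
```

With a relevance vector of all ones this reduces to a plain feature-matching MSE; zeroing entries selectively masks teacher information out of the transfer.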
A Human-Centered IoT Software Architecture for Fault Diagnosis and Identification (Architettura Software IoT per la Diagnosi e Identificazione dei Guasti a Misura d'Uomo)
Authors: Bertoli, Annalisa
In recent years, the complexity of production systems has increased significantly due to advances in technologies stemming from Industry 4.0, in particular through the Internet of Things (IoT) and big data. This evolution has enabled unprecedented access to vast amounts of data, but it has also introduced challenges in collecting that data and putting it to practical use for the operators who interact with these systems. This thesis presents an IoT architecture designed for real industrial environments, with the goal of demonstrating how data can be used effectively to monitor operations and production processes in real time. The proposed approach improves the ability to detect and manage faults, providing operators with the information they need to make informed decisions. By integrating smart sensors and advanced analytics, detailed visibility into the state of the system can be obtained, enabling timely maintenance interventions and laying the groundwork for future predictive-maintenance deployments. The research includes an analysis of two distinct case studies, showing the versatility of the architecture across different industrial applications. It illustrates how effective use of data can optimize operational efficiency and reduce downtime, thereby contributing to better system management. Moreover, this approach allows human operators to better understand their environments and make autonomous decisions based on real-time information.
Augmenting and Mixing Transformers with Synthetic Data for Image Captioning
Authors: Caffagni, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
Published in: IMAGE AND VISION COMPUTING
Image captioning has attracted significant attention within the Computer Vision and Multimedia research domains, resulting in the development of effective methods for generating natural language descriptions of images. Concurrently, the rise of generative models has facilitated the production of highly realistic and high-quality images, particularly through recent advancements in latent diffusion models. In this paper, we propose to leverage the recent advances in Generative AI and create additional training data that can be effectively used to boost the performance of an image captioning model. Specifically, we combine real images with their synthetic counterparts generated by Stable Diffusion using a Mixup data augmentation technique to create novel training examples. Extensive experiments on the COCO dataset demonstrate the effectiveness of our solution in comparison to different baselines and state-of-the-art methods and validate the benefits of using synthetic data to augment the training stage of an image captioning model and improve the quality of the generated captions. Source code and trained models are publicly available at: https://github.com/aimagelab/synthcap_pp.
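The Mixup step on real and synthetic images can be sketched as follows. This is a generic illustration of the standard Mixup recipe, assuming a real image and its Stable Diffusion counterpart arrive as arrays of the same shape; the function name, the Beta prior, and the alpha default are illustrative, not the paper's exact configuration.

```python
import numpy as np

def mixup_pair(real, synthetic, alpha=0.2, rng=None):
    """Convexly combine a real image with its synthetic counterpart.
    The mixing coefficient is drawn from a Beta(alpha, alpha) prior,
    as in standard Mixup; alpha=0.2 is an illustrative default."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)                  # lambda in [0, 1]
    mixed = lam * real + (1.0 - lam) * synthetic  # pixel-wise blend
    return mixed, lam
```

Each call yields a novel training image that interpolates between the real sample and its generated counterpart, enlarging the effective training set without new annotation.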
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Authors: Cocchi, Federico; Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
Multimodal LLMs (MLLMs) are the natural extension of large language models to handle multimodal inputs, combining text and image data. They have recently garnered attention due to their capability to address complex tasks involving both modalities. However, their effectiveness is limited to the knowledge acquired during training, which restricts their practical utility. In this work, we introduce a novel method to enhance the adaptability of MLLMs by integrating external knowledge sources. Our proposed model, Reflective LLaVA (ReflectiVA), utilizes reflective tokens to dynamically determine the need for external knowledge and predict the relevance of information retrieved from an external database. Tokens are trained following a two-stage two-model training recipe. This ultimately enables the MLLM to manage external knowledge while preserving fluency and performance on tasks where external knowledge is not needed. Through our experiments, we demonstrate the efficacy of ReflectiVA for knowledge-based visual question answering, highlighting its superior performance compared to existing methods. Source code and trained models are publicly available at https://github.com/aimagelab/ReflectiVA.
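The two decisions that the reflective tokens encode can be sketched as control flow. The snippet below is a hypothetical simplification, not the ReflectiVA inference code: method names such as `needs_retrieval` and `is_relevant` stand in for the model emitting its retrieval-gating and relevance-judging tokens.

```python
def answer_with_reflection(model, retriever, question):
    """Hypothetical control flow for reflective-token gating: a first
    decision determines whether external knowledge is needed at all,
    a second filters each retrieved passage for relevance before it
    reaches the generation prompt."""
    if not model.needs_retrieval(question):
        return model.generate(question)  # parametric knowledge suffices
    passages = [p for p in retriever(question)
                if model.is_relevant(question, p)]
    return model.generate(question, context=passages)
```

The first gate preserves fluency on questions that need no retrieval, while the second keeps irrelevant retrieved passages from polluting the answer.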
AURALYS: smart glasses to improve audio selection and perception in educational and working contexts
Authors: Filippini, Gianluca; Borghi, Guido; Giliberti, Enrico; Damiani, Paola; Vezzani, Roberto
BarBeR - Barcode Benchmark Repository: Implementation and Reproducibility Notes
Authors: Vezzali, Enrico; Bolelli, Federico; Santi, Stefano; Grana, Costantino
This paper provides a detailed description of how to install, set up, and use "BarBeR" (Barcode Benchmark Repository) to reproduce the results presented in the ICPR 2024 paper "BarBeR: A Barcode Benchmarking Repository". The paper details the tests available in the repository and how the configuration parameters influence the experimental results.
BarBeR: A Barcode Benchmarking Repository
Authors: Vezzali, E.; Bolelli, F.; Santi, S.; Grana, C.
Published in: LECTURE NOTES IN COMPUTER SCIENCE
Since their invention in 1949, barcodes have remained the preferred method for automatic data capture, playing a crucial role in supply chain management. To detect a barcode in an image, multiple algorithms have been proposed in the literature, with a significant increase of interest in the topic since the rise of deep learning. However, research in the field suffers from many limitations, including the scarcity of public datasets and code implementations, which hampers the reproducibility and reliability of published results. For this reason, we developed "BarBeR" (Barcode Benchmark Repository), a benchmark designed for testing and comparing barcode detection algorithms. This benchmark includes the code implementation of various detection algorithms for barcodes, along with a suite of useful metrics. It offers a range of test setups and can be expanded to include any localization algorithm. In addition, we provide a large, annotated dataset of 8748 barcode images, combining multiple public barcode datasets with standardized annotation formats for both detection and segmentation tasks. Finally, we share the results obtained from running the benchmark on our dataset, offering valuable insights into the performance of different algorithms.
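A core primitive behind any detection benchmark of this kind is intersection-over-union between a predicted and a ground-truth box. The helper below is a generic illustration, assuming axis-aligned `(x1, y1, x2, y2)` boxes; it is not code taken from the BarBeR repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).
    Standard matching criterion for scoring barcode detections against
    ground-truth annotations (e.g., a match at IoU >= 0.5)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```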
Benchmarking BERT-based Models for Latin: A Case Study on Biblical References in Ancient Christian Literature
Authors: Caffagni, Davide; Cocchi, Federico; Mambelli, Anna; Tutrone, Fabio; Zanella, Marco; Cornia, Marcella; Cucchiara, Rita
Published in: CEUR WORKSHOP PROCEEDINGS
Transformer-based language models like BERT have revolutionized Natural Language Processing (NLP) research, but their application to historical languages remains underexplored. This paper investigates the adaptation of BERT-based embedding models for Latin, a language central to the study of the sacred texts of Christianity. Focusing on Jerome’s Vulgate, pre-Vulgate Latin translations of the Bible, and patristic commentaries such as Augustine’s De Genesi ad litteram, we address the challenges posed by Latin’s complex syntax, specialized vocabulary, and historical variations at the orthographic, morphological, and semantic levels. In particular, we propose fine-tuning existing BERT-based embedding models on annotated Latin corpora, using self-generated hard negatives to improve performance in detecting biblical references in early Christian literature in Latin. Experimental results demonstrate the ability of BERT-based models to identify citations of and allusions to the Bible(s) in ancient Christian commentaries while highlighting the complexities and challenges of this field. By integrating NLP techniques with humanistic expertise, this work provides a case study on intertextual analysis in Latin patristic works. It underscores the transformative potential of interdisciplinary approaches, advancing computational tools for sacred text studies and bridging the gap between philology and computational analysis.
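Fine-tuning an embedding model with hard negatives is typically driven by a contrastive objective. The sketch below shows one common formulation, a cosine triplet margin loss; the function name, the margin value, and the loss choice are assumptions for illustration, not the paper's exact training objective.

```python
import numpy as np

def triplet_loss(anchor, positive, hard_negative, margin=0.2):
    """Contrastive fine-tuning sketch: pull the embedding of a passage
    (anchor) toward the biblical verse it cites (positive) and away
    from a self-generated hard negative, up to a fixed margin."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(0.0, margin - cos(anchor, positive) + cos(anchor, hard_negative))
```

Hard negatives, passages that look superficially biblical but are not the cited verse, make this loss informative exactly where naive lexical matching fails.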