Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

CaMEL: Mean Teacher Learning for Image Captioning

Authors: Barraco, Manuele; Stefanini, Matteo; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Describing images in natural language is a fundamental step towards the automatic modeling of connections between the visual and textual modalities. In this paper we present CaMEL, a novel Transformer-based architecture for image captioning. Our proposed approach leverages the interaction of two interconnected language models that learn from each other during the training phase. The interplay between the two language models follows a mean teacher learning paradigm with knowledge distillation. Experimentally, we assess the effectiveness of the proposed solution on the COCO dataset and in conjunction with different visual feature extractors. When comparing with existing proposals, we demonstrate that our model provides state-of-the-art caption quality with a significantly reduced number of parameters. According to the CIDEr metric, we obtain a new state of the art on COCO when training without using external data. The source code and trained models will be made publicly available at: https://github.com/aimagelab/camel.
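The mean teacher interplay described in the abstract can be illustrated with a short sketch: an online (student) language model is trained as usual, while a teacher copy is updated as an exponential moving average (EMA) of the student's weights and provides soft targets for knowledge distillation. This is a generic sketch under assumed names (`student`, `teacher`, `ema_decay`, `kd_weight`, `tau`), not the CaMEL implementation released by the authors.

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
    """Create a frozen teacher initialized as a copy of the student."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, ema_decay=0.999):
    """Mean teacher update: teacher <- decay * teacher + (1 - decay) * student."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(ema_decay).add_(s_p, alpha=1.0 - ema_decay)

def caption_loss(student_logits, teacher_logits, targets, kd_weight=0.1, tau=2.0):
    """Cross-entropy on ground-truth tokens plus a distillation term that pushes
    the student towards the teacher's softened token distribution."""
    ce = F.cross_entropy(student_logits.flatten(0, 1), targets.flatten())
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return ce + kd_weight * kd
```

In a mean teacher loop, `ema_update` is called after every student optimization step, so the teacher tracks a smoothed trajectory of the student's weights.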

2022 Conference proceedings paper

Catastrophic Forgetting in Continual Concept Bottleneck Models

Authors: Marconato, E.; Bontempo, G.; Teso, S.; Ficarra, E.; Calderara, S.; Passerini, A.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2022 Conference proceedings paper

Connected Components Labeling on Bitonal Images

Authors: Bolelli, Federico; Allegretti, Stefano; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2022 Conference proceedings paper

Continual Learning in Real-Life Applications

Authors: Graffieti, G.; Borghi, G.; Maltoni, D.

Published in: IEEE ROBOTICS AND AUTOMATION LETTERS

Existing Continual Learning benchmarks only partially address the complexity of real-life applications, limiting the realism of learning agents. In this letter, we propose and focus on benchmarks characterized by common key elements of real-life scenarios, including temporally ordered streams as input data, strong correlation of samples over short time ranges, high data distribution drift over long time frames, and heavy class imbalance. Moreover, we enforce online training constraints such as the need for frequent model updates without the possibility of storing a large amount of past data or passing the dataset through the model multiple times. In addition, we introduce a novel hybrid approach based on Continual Learning, whose architectural elements and replay memory management proved to be useful and effective in the considered scenarios. The experimental validation, including comparisons with existing methods and an ablation study, confirms the validity and suitability of the proposed approach.
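As an illustration of the kind of replay memory management mentioned in the abstract, the sketch below shows a generic reservoir-sampling buffer suited to online streams where past data cannot be stored in bulk. The class name `ReplayBuffer` and the `capacity` parameter are assumptions for illustration; this is not the authors' architecture.

```python
import random

class ReplayBuffer:
    """Fixed-size replay memory filled by reservoir sampling, so that every
    example seen in the stream has equal probability of being stored."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = []   # stored (example, label) pairs
        self.seen = 0      # number of stream examples observed so far

    def add(self, example, label):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((example, label))
        else:
            # Replace a random slot with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = (example, label)

    def sample(self, batch_size: int):
        """Draw a replay mini-batch to interleave with the incoming stream batch."""
        k = min(batch_size, len(self.buffer))
        return random.sample(self.buffer, k)
```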

2022 Journal article

Continual semi-supervised learning through contrastive interpolation consistency

Authors: Boschini, Matteo; Buzzega, Pietro; Bonicelli, Lorenzo; Porrello, Angelo; Calderara, Simone

Published in: PATTERN RECOGNITION LETTERS

Continual Learning (CL) investigates how to train Deep Networks on a stream of tasks without incurring forgetting. CL settings proposed in the literature assume that every incoming example is paired with ground-truth annotations. However, this clashes with many real-world applications: gathering labeled data, which is in itself tedious and expensive, becomes infeasible when data arrive as a stream. This work explores Continual Semi-Supervised Learning (CSSL): here, only a small fraction of labeled input examples are shown to the learner. We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, where overfitting entangles forgetting. Subsequently, we design a novel CSSL method that exploits metric learning and consistency regularization to leverage unlabeled examples while learning. We show that our proposal exhibits higher resilience to diminishing supervision and, even more surprisingly, that relying on only a fraction of the supervision suffices to outperform SOTA methods trained under full supervision.
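A minimal sketch of the consistency-regularization idea referenced above is shown below: pairs of unlabeled inputs are interpolated (in the spirit of interpolation consistency) and the model is penalized when its prediction on the mixed input differs from the same mix of its predictions on the original inputs. The function name, the Beta-distribution mixing, and the suggested weighting are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def interpolation_consistency_loss(model, x_unlabeled, alpha=0.75):
    """Mixup-style consistency: the prediction on a mixed input should match
    the same mix of the (detached) predictions on the original inputs."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x_unlabeled.size(0))
    x_a, x_b = x_unlabeled, x_unlabeled[perm]

    with torch.no_grad():
        p_a = F.softmax(model(x_a), dim=-1)
        p_b = F.softmax(model(x_b), dim=-1)
        target = lam * p_a + (1.0 - lam) * p_b

    x_mix = lam * x_a + (1.0 - lam) * x_b
    pred = F.log_softmax(model(x_mix), dim=-1)
    return F.kl_div(pred, target, reduction="batchmean")

# Typical use (illustrative): total = supervised_ce + weight * interpolation_consistency_loss(model, x_u)
```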

2022 Journal article

Deep Segmentation of the Mandibular Canal: a New 3D Annotated Dataset of CBCT Volumes

Authors: Cipriano, Marco; Allegretti, Stefano; Bolelli, Federico; Di Bartolomeo, Mattia; Pollastri, Federico; Pellacani, Arrigo; Minafra, Paolo; Anesi, Alexandre; Grana, Costantino

Published in: IEEE ACCESS

Inferior Alveolar Nerve (IAN) canal detection has been the focus of multiple recent works in dentistry and maxillofacial imaging. Deep learning-based techniques have achieved promising results in this research field, although the small size of 3D maxillofacial datasets has strongly limited the performance of these algorithms. Researchers have been forced to build their own private datasets, thus precluding any opportunity for reproducing results and fairly comparing proposals. This work describes a novel, large, and publicly available mandibular Cone Beam Computed Tomography (CBCT) dataset, with 2D and 3D manual annotations provided by expert clinicians. Leveraging this dataset and employing deep learning techniques, we are able to improve the state of the art on 3D mandibular canal segmentation. The source code that allows exact reproduction of all the reported experiments is released as an open-source project along with this article.
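For readers unfamiliar with how 3D canal segmentation is typically scored, the snippet below computes the Dice coefficient between a predicted and a ground-truth binary volume. It is a generic metric sketch, not the evaluation code released with the paper.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice overlap between two binary 3D masks (1 = canal voxel, 0 = background)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))
```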

2022 Journal article

DeepFakes Have No Heart: A Simple rPPG-Based Method to Reveal Fake Videos

Authors: Boccignone, Giuseppe; Bursic, Sathya; Cuculo, Vittorio; D’Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella; Patania, Sabrina

Published in: LECTURE NOTES IN COMPUTER SCIENCE

We present a simple yet general method to detect fake videos of human subjects generated via Deep Learning techniques. The method relies on gauging the complexity of heart rate dynamics derived from facial video streams through remote photoplethysmography (rPPG). The analyzed features have clear semantics with respect to this physiological behaviour. The approach is thus explainable both in terms of the underlying context model and the entailed computational steps. Most importantly, when compared to more complex state-of-the-art detection methods, the results achieved so far give evidence of its capability to cope with datasets produced by different deepfake models.
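To make the pipeline concrete, the sketch below shows one simple way to obtain a crude rPPG trace (mean green-channel intensity over a facial crop per frame) and to summarize its dynamics with a spectral-entropy feature. The paper relies on more principled rPPG extraction and complexity measures, so every name, parameter, and feature choice here is an assumption made for illustration.

```python
import numpy as np
from scipy.signal import welch

def rppg_trace(face_frames: np.ndarray) -> np.ndarray:
    """face_frames: (T, H, W, 3) RGB crops of the face.
    Returns a zero-mean 1D signal from the green channel, the band
    where the blood-volume pulse is usually strongest."""
    green = face_frames[..., 1].mean(axis=(1, 2))
    return green - green.mean()

def spectral_entropy(signal: np.ndarray, fs: float = 30.0) -> float:
    """Entropy of the normalized power spectrum: genuine pulse signals tend to
    concentrate power around the heart-rate band, fakes tend to look noisier."""
    freqs, psd = welch(signal, fs=fs, nperseg=min(256, len(signal)))
    p = psd / (psd.sum() + 1e-12)
    return float(-(p * np.log(p + 1e-12)).sum())
```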

2022 Conference proceedings paper

Differential Diagnosis of Alzheimer Disease vs. Mild Cognitive Impairment Based on Left Temporal Lateral Lobe Hypometabolism on 18F-FDG PET/CT and Automated Classifiers

Authors: Nuvoli, S.; Bianconi, F.; Rondini, M.; Lazzarato, A.; Marongiu, A.; Fravolini, M. L.; Cascianelli, S.; Amici, S.; Filippi, L.; Spanu, A.; Palumbo, B.

Published in: DIAGNOSTICS

Purpose: We evaluate the ability of Artificial Intelligence with automatic classification methods applied to semi-quantitative data from brain 18F-FDG PET/CT to improve the differential diagnosis between Alzheimer Disease (AD) and Mild Cognitive Impairment (MCI). Procedures: We retrospectively analyzed a total of 150 consecutive patients who underwent diagnostic evaluation for suspected AD (n = 67) or MCI (n = 83). All patients received brain 18F-FDG PET/CT according to the international guidelines, and images were analyzed both qualitatively (QL) and quantitatively (QN), the latter by fully automated post-processing software that produced a z score metabolic map of 25 anatomically different cortical regions. A subset of n = 122 cases with a diagnosis of AD (n = 53) or MCI (n = 69) confirmed by 18-24-month clinical follow-up was finally included in the study. Univariate analysis and three automated classification models, namely a classification tree (ClT), a ridge classifier (RC), and a linear Support Vector Machine (lSVM), were considered to estimate the ability of the z scores to discriminate between AD and MCI cases. Results: The univariate analysis returned 14 areas where the z scores were significantly different between the AD and MCI groups, and the classification accuracy ranged between 74.59% and 76.23%, with ClT and RC providing the best results. The best classification strategy consisted of one single split with a cut-off value of approximately -2.0 on the z score from the left temporal lateral area: cases below this threshold were classified as AD and those above the threshold as MCI. Conclusions: Our findings confirm the usefulness of brain 18F-FDG PET/CT QL and QN analyses in differentiating AD from MCI. Moreover, the combined use of automated classification models can improve the diagnostic process since it allows identification of a specific hypometabolic area involved in AD cases with respect to MCI. These data improve the traditional 18F-FDG PET/CT image interpretation and the diagnostic assessment of cognitive disorders.
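The best-performing rule reported above is a single threshold on one regional z score. A toy version of that decision rule, with hypothetical variable names, is shown below purely to illustrate the arithmetic.

```python
def classify_ad_vs_mci(z_temporal_lateral_left: float, threshold: float = -2.0) -> str:
    """Single-split rule from the abstract: stronger hypometabolism (a more
    negative z score) in the left temporal lateral area, below the threshold,
    is classified as AD; otherwise MCI."""
    return "AD" if z_temporal_lateral_left < threshold else "MCI"

# Example: a z score of -2.7 in that region falls below -2.0 and is classified as AD.
assert classify_ad_vs_mci(-2.7) == "AD"
assert classify_ad_vs_mci(-1.1) == "MCI"
```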

2022 Journal abstract

Dress Code: High-Resolution Multi-Category Virtual Try-On

Authors: Morelli, Davide; Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Prior work focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from a main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. To address this deficiency, we introduce Dress Code, which contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024x768) with front-view, full-body reference models. To generate HD try-on images with high visual quality and rich details, we propose to learn fine-grained discriminating features. Specifically, we leverage a semantic-aware discriminator that makes predictions at pixel level instead of image or patch level. Extensive experimental evaluation demonstrates that the proposed approach surpasses the baselines and state-of-the-art competitors in terms of visual quality and quantitative results. The Dress Code dataset is publicly available at https://github.com/aimagelab/dress-code.
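The semantic-aware, pixel-level discrimination described above contrasts with image- or patch-level GAN discriminators. The sketch below shows a generic fully convolutional discriminator that outputs one prediction per pixel, with per-class channels so that supervision can be tied to semantics. Layer sizes, the class count, and all names are illustrative assumptions, not the Dress Code architecture.

```python
import torch.nn as nn

class PixelLevelDiscriminator(nn.Module):
    """Fully convolutional discriminator: instead of one real/fake score per
    image, it returns a (num_classes, H, W) map so each pixel can be judged
    against its semantic class."""

    def __init__(self, in_channels: int = 3, num_classes: int = 19, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width * 2, width * 4, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width * 4, num_classes, 1),  # per-pixel class logits
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        return self.net(x)
```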

2022 Conference proceedings paper

Dress Code: High-Resolution Multi-Category Virtual Try-On

Authors: Morelli, Davide; Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Existing literature focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from a main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. In this work, we introduce Dress Code, a novel dataset that contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024 x 768) with front-view, full-body reference models. To generate HD try-on images with high visual quality and rich details, we propose to learn fine-grained discriminating features. Specifically, we leverage a semantic-aware discriminator that makes predictions at pixel level instead of image or patch level. The Dress Code dataset is publicly available at https://github.com/aimagelab/dress-code.

2022 Conference proceedings paper
