Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Buffer-MIL: Robust Multi-instance Learning with a Buffer-Based Approach

Authors: Bontempo, G.; Lumetti, L.; Porrello, A.; Bolelli, F.; Calderara, S.; Ficarra, E.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Histopathological image analysis is a critical area of research with the potential to aid pathologists in faster and more accurate diagnoses. However, Whole-Slide Images (WSIs) present challenges for deep learning frameworks due to their large size and lack of pixel-level annotations. Multi-Instance Learning (MIL) is a popular approach for handling WSIs, treating each slide as a bag composed of multiple patches or instances. In this work, we propose Buffer-MIL, which aims to tackle the covariate shift and class imbalance that characterize most existing histopathological datasets. To this end, a buffer containing the most representative instances of each disease-positive slide of the training set is incorporated into our model. An attention mechanism is then used to compare all the instances against the buffer and find the most critical ones in a given slide. We evaluate Buffer-MIL on two publicly available WSI datasets, Camelyon16 and TCGA lung cancer, outperforming current state-of-the-art models by 2.2% in accuracy on Camelyon16.
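The idea of scoring a slide's instances against a buffer can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-similarity scoring, the softmax weighting, and the names `buffer_attention_pool`, `instances`, and `buffer` are all assumptions made for illustration, and real patch embeddings would come from a trained feature extractor.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def buffer_attention_pool(instances, buffer):
    """Hypothetical helper: weight each instance by its best match in the buffer.

    instances: patch embeddings of one slide (the bag).
    buffer: embeddings of representative instances from positive slides.
    Returns the attention weights and the weighted bag embedding.
    """
    scores = [max(cosine(x, b) for b in buffer) for x in instances]
    weights = softmax(scores)
    dim = len(instances[0])
    bag = [sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)]
    return weights, bag

# Toy data (assumed): three 2-D patch embeddings and a two-entry buffer.
instances = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
buffer = [[1.0, 0.1], [0.9, 0.0]]
weights, bag = buffer_attention_pool(instances, buffer)
```

In this toy setup the first instance is closest to the buffer entries and therefore receives the largest weight, which is the behaviour the abstract describes informally as finding the most critical instances in a slide.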

2023 Conference proceedings paper

CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components

Authors: Di Nucci, D.; Simoni, A.; Tomei, M.; Ciuffreda, L.; Vezzani, R.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Neural Radiance Fields (NeRFs) have gained widespread recognition as a highly effective technique for representing 3D reconstructions of objects and scenes derived from sets of images. Despite their efficiency, NeRF models can pose challenges in certain scenarios such as vehicle inspection, where the lack of sufficient data or the presence of challenging elements (e.g. reflections) strongly impacts the accuracy of the reconstruction. To address this, we introduce CarPatch, a novel synthetic benchmark of vehicles. In addition to a set of images annotated with their intrinsic and extrinsic camera parameters, the corresponding depth maps and semantic segmentation masks have been generated for each view. Global and part-based metrics have been defined and used to evaluate, compare, and better characterize some state-of-the-art techniques. The dataset is publicly released at https://aimagelab.ing.unimore.it/go/carpatch and can be used as an evaluation guide and as a baseline for future work on this challenging topic.

2023 Conference proceedings paper

Class-Incremental Continual Learning into the eXtended DER-verse

Authors: Boschini, Matteo; Bonicelli, Lorenzo; Buzzega, Pietro; Porrello, Angelo; Calderara, Simone

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

The staple of human intelligence is the capability of acquiring knowledge in a continuous fashion. In stark contrast, Deep Networks forget catastrophically and, for this reason, the sub-field of Class-Incremental Continual Learning fosters methods that learn a sequence of tasks incrementally, blending sequentially-gained knowledge into a comprehensive prediction. This work aims at assessing and overcoming the pitfalls of our previous proposal Dark Experience Replay (DER), a simple and effective approach that combines rehearsal and Knowledge Distillation. Inspired by the way our minds constantly rewrite past recollections and set expectations for the future, we endow our model with the abilities to i) revise its replay memory to welcome novel information regarding past data and ii) pave the way for learning yet unseen classes. We show that the application of these strategies leads to remarkable improvements; indeed, the resulting method, termed eXtended-DER (X-DER), outperforms the state of the art on both standard benchmarks (such as CIFAR-100 and miniImageNet) and a novel one introduced here. To gain a better understanding, we further provide extensive ablation studies that corroborate and extend the findings of our previous research (e.g. the value of Knowledge Distillation and flatter minima in continual learning setups). We make our results fully reproducible; the codebase is available at https://github.com/aimagelab/mammoth.
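The combination of rehearsal and Knowledge Distillation mentioned above can be sketched as a loss with two terms. This is an illustrative sketch only, not code from the mammoth repository: it assumes that distillation is expressed as a squared-error penalty between the logits the model produces now for replayed samples and the logits stored when those samples entered the memory. The function names and the weight `alpha` are hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def mse(a, b):
    """Mean squared error between two equal-length logit vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def replay_distillation_loss(cur_logits, label, replay_now, replay_stored, alpha=0.5):
    """Hypothetical loss: current-task term plus a logit-matching replay term.

    cur_logits, label: model output and target for a current-task sample.
    replay_now: logits the model produces now for a replayed sample.
    replay_stored: logits saved in the memory for the same sample (assumed).
    alpha: assumed weight of the distillation term.
    """
    return cross_entropy(cur_logits, label) + alpha * mse(replay_now, replay_stored)

# Toy values (assumed): no drift versus drift on the replayed sample.
stored = [1.5, -0.5, 0.0]
loss_no_drift = replay_distillation_loss([2.0, 0.0, 0.0], 0, stored, stored)
loss_drift = replay_distillation_loss([2.0, 0.0, 0.0], 0, [0.0, 1.0, 0.0], stored)
```

When the model's outputs on replayed data drift away from the stored ones, the second term grows, so the penalty discourages forgetting without requiring labels for the replayed samples.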

2023 Journal article

Combining Identity Features and Artifact Analysis for Differential Morphing Attack Detection

Authors: Di Domenico, Nicolò; Borghi, Guido; Franco, Annalisa; Maltoni, Davide

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Due to the importance of the Morphing Attack, the development of new and accurate Morphing Attack Detection (MAD) systems is urgently needed by private and public institutions. In this context, D-MAD methods, i.e. detectors fed with a trusted live image and a probe, tend to show better performance than S-MAD approaches, which are based on a single input image. However, D-MAD methods usually leverage only the identity of the two input face images, and thus present two main drawbacks: they lose performance when the two subjects look alike, and they do not consider potential artifacts left by the morphing procedure (which are instead typically exploited by S-MAD approaches). Therefore, in this paper, we investigate the combined use of D-MAD and S-MAD to improve detection performance through the fusion of the features produced by these two MAD approaches.

2023 Conference proceedings paper

Computer Vision in Human Analysis: From Face and Body to Clothes

Authors: Daoudi, Mohamed; Vezzani, Roberto; Borghi, Guido; Ferrari, Claudio; Cornia, Marcella; Becattini, Federico; Pilzer, Andrea

Published in: SENSORS

For decades, researchers of different areas, ranging from artificial intelligence to computer vision, have intensively investigated human-centered data, i.e., data in which the human plays a significant role, acquired through a non-invasive approach, such as cameras. This interest has been largely supported by the highly informative nature of this kind of data, which provides a variety of information with which it is possible to understand many aspects including, for instance, the human body or the outward appearance. Some of the main tasks related to human analysis are focused on the body (e.g., human pose estimation and anthropocentric measurement estimation), the hands (e.g., gesture detection and recognition), the head (e.g., head pose estimation), or the face (e.g., emotion and expression recognition). Additional tasks are based on non-corporal elements, such as motion (e.g., action recognition and human behavior understanding) and clothes (e.g., garment-based virtual try-on and attribute recognition). Unfortunately, privacy issues severely limit the usage and the diffusion of this kind of data, making the exploitation of learning approaches challenging. In particular, privacy issues behind the acquisition and the use of human-centered data must be addressed by public and private institutions and companies. Thirteen high-quality papers have been published in this Special Issue and are summarized in the following: four of them are focused on the human face (facial geometry, facial landmark detection, and emotion recognition), two on eye image analysis (eye status classification and 3D gaze estimation), five on the body (pose estimation, conversational gesture analysis, and action recognition), and two on the outward appearance (transferring clothing styles and fashion-oriented image captioning). These numbers confirm the high interest in human-centered data and, in particular, the variety of real-world applications that it is possible to develop.

2023 Journal article

Consistency-Based Self-supervised Learning for Temporal Anomaly Localization

Authors: Panariello, A.; Porrello, A.; Calderara, S.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2023 Conference proceedings paper

DAS-MIL: Distilling Across Scales for MIL Classification of Histological WSIs

Authors: Bontempo, Gianpaolo; Porrello, Angelo; Bolelli, Federico; Calderara, Simone; Ficarra, Elisa

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The adoption of Multi-Instance Learning (MIL) for classifying Whole-Slide Images (WSIs) has increased in recent years. Indeed, pixel-level annotation of gigapixel WSIs is mostly unfeasible and time-consuming in practice. For this reason, MIL approaches have been profitably integrated with the most recent deep-learning solutions for WSI classification to support clinical practice and diagnosis. Nevertheless, the majority of such approaches overlook the multi-scale nature of WSIs; the few existing hierarchical MIL proposals simply flatten the multi-scale representations by concatenation or summation of feature vectors, neglecting the spatial structure of the WSI. Our work aims to unleash the full potential of pyramidally structured WSIs; to do so, we propose a graph-based multi-scale MIL approach, termed DAS-MIL, that exploits message passing to let information flow across multiple scales. By means of a knowledge distillation schema, the alignment between the latent space representations at different resolutions is encouraged while preserving the diversity in the informative content. The effectiveness of the proposed framework is demonstrated on two well-known datasets, where we outperform SOTA on WSI classification, gaining +1.9% AUC and +3.3% accuracy on the popular Camelyon16 benchmark.

2023 Conference proceedings paper

Deep Learning and Large Scale Models for Bank Transactions

Authors: Garuti, Fabrizio; Luetto, Simone; Cucchiara, Rita; Sangineto, Enver

Published in: CEUR WORKSHOP PROCEEDINGS

The success of Artificial Intelligence (AI) in different research and application areas has increased the interest in adopting Deep Learning techniques in the financial field as well. Particularly interesting is the case of financial transactional data, which represent one of the most valuable sources of information for banks and other financial institutions. However, the heterogeneity of the data, composed of both numerical and categorical attributes, makes the use of standard Deep Learning methods difficult. In this paper, we present UniTTAB, a Transformer network for transactional time series, which can uniformly represent heterogeneous time-dependent data, and which is trained on a very large scale of real transactional data. As far as we know, the dataset we used for training is the largest real bank transactions dataset used for Deep Learning methods in this field, as all the other common datasets are either much smaller or synthetically generated. The use of this very large real training dataset makes our UniTTAB the first foundation model for transactional data.

2023 Conference proceedings paper

Depth-based 3D human pose refinement: Evaluating the RefiNet framework

Authors: D'Eusanio, A.; Simoni, A.; Pini, S.; Borghi, G.; Vezzani, R.; Cucchiara, R.

Published in: PATTERN RECOGNITION LETTERS

In recent years, Human Pose Estimation has achieved impressive results on RGB images. The advent of deep learning architectures and large annotated datasets has contributed to these achievements. However, little has been done towards estimating the human pose using depth maps, and especially towards obtaining a precise 3D body joint localization. To fill this gap, this paper presents RefiNet, a depth-based 3D human pose refinement framework. Given a depth map and an initial coarse 2D human pose, RefiNet regresses a fine 3D pose. The framework is composed of three modules, based on different data representations, i.e. 2D depth patches, 3D human skeletons, and point clouds. An extensive experimental evaluation is carried out to investigate the impact of the model hyper-parameters and to compare RefiNet with off-the-shelf 2D methods and literature approaches. Results confirm the effectiveness of the proposed framework and its limited computational requirements.

2023 Journal article

Detecting Morphing Attacks via Continual Incremental Training

Authors: Pellegrini, Lorenzo; Borghi, Guido; Franco, Annalisa; Maltoni, Davide

Scenarios in which restrictions on data transfer and storage limit the possibility of composing a single dataset (possibly drawing on different data sources) for a batch-based training procedure make the development of robust models particularly challenging. We hypothesize that the recent Continual Learning (CL) paradigm may represent an effective solution to enable incremental training, even across multiple sites. Indeed, a basic assumption of CL is that once a model has been trained, old data can no longer be used in successive training iterations and in principle can be deleted. Therefore, in this paper, we investigate the performance of different Continual Learning methods in this scenario, simulating a learning model that is updated every time a new chunk of data, even of variable size, is available. Experimental results reveal that a particular CL method, namely Learning without Forgetting (LwF), is one of the best-performing algorithms. We then investigate its usage and parametrization in Morphing Attack Detection and Object Classification tasks, specifically with respect to the amount of new training data that becomes available.
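Because old data cannot be revisited in this setting, methods in the LwF family rely on the previous model's outputs on the new chunk instead of on stored samples. The sketch below illustrates that idea under stated assumptions: a temperature-softened cross-entropy between the old and new models' outputs is added to the supervised loss. The temperature `T`, the weight `lam`, and all function names are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax, computed stably."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation(old_logits, new_logits, T=2.0):
    """Cross-entropy between softened old-model and new-model outputs."""
    p_old = softmax(old_logits, T)
    p_new = softmax(new_logits, T)
    return -sum(p * math.log(q) for p, q in zip(p_old, p_new))

def incremental_loss(new_logits, label, old_logits, lam=1.0, T=2.0):
    """Hypothetical loss for one sample of a newly arrived data chunk.

    new_logits: output of the model being updated.
    old_logits: output of the frozen previous model on the same sample.
    lam: assumed weight of the distillation term.
    """
    supervised = -math.log(softmax(new_logits)[label])
    return supervised + lam * distillation(old_logits, new_logits, T)

# Toy values (assumed): the distillation term is smallest when outputs agree.
old = [2.0, 1.0, 0.0]
kd_same = distillation(old, old)
kd_diff = distillation(old, [0.0, 1.0, 2.0])
total = incremental_loss([0.0, 1.0, 2.0], 2, old)
```

Only the current chunk and a frozen copy of the previous model are needed to evaluate this loss, which matches the constraint that earlier data may already have been deleted.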

2023 Conference proceedings paper
