Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Volumetric Fast Fourier Convolution for Detecting Ink on the Carbonized Herculaneum Papyri

Authors: Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R.

Published in: ... IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS

Recent advancements in Digital Document Restoration (DDR) have led to significant breakthroughs in analyzing highly damaged written artifacts. Among these, there has been increasing interest in applying Artificial Intelligence techniques for virtually unwrapping and automatically detecting ink on the Herculaneum papyri collection. This collection consists of carbonized scrolls and fragments of documents, which have been digitized via X-ray tomography to allow the development of ad-hoc deep learning-based DDR solutions. In this work, we propose a modification of the Fast Fourier Convolution operator for volumetric data and apply it in a segmentation architecture for ink detection on the challenging Herculaneum papyri, demonstrating its suitability via a thorough experimental analysis. To encourage research on this task and the application of the proposed operator to other tasks involving volumetric data, we will release our implementation (https://github.com/aimagelab/vffc).
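
As a concrete illustration of the core idea, the sketch below applies a pointwise convolution in the 3D frequency domain, the mechanism that gives Fast Fourier Convolutions a global receptive field, extended to volumetric tensors. It is a minimal PyTorch reading of the abstract, not the authors' implementation (which is in the linked repository); the layer sizes and the example volume shape are illustrative assumptions.

```python
# Minimal sketch of the spectral (global) branch of a Fast Fourier
# Convolution extended to volumetric data. Illustrative only; see
# https://github.com/aimagelab/vffc for the authors' operator.
import torch
import torch.nn as nn

class VolumetricSpectralBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis,
        # so the frequency-domain convolution sees 2 * channels inputs.
        self.freq_conv = nn.Sequential(
            nn.Conv3d(2 * channels, 2 * channels, kernel_size=1),
            nn.BatchNorm3d(2 * channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, depth, height, width)
        d, h, w = x.shape[-3:]
        # 3D real FFT over the volumetric dimensions.
        freq = torch.fft.rfftn(x, dim=(-3, -2, -1), norm="ortho")
        # A pointwise conv in the frequency domain has a global
        # receptive field in the spatial domain.
        stacked = torch.cat([freq.real, freq.imag], dim=1)
        stacked = self.freq_conv(stacked)
        real, imag = stacked.chunk(2, dim=1)
        # Back to the spatial domain at the original resolution.
        return torch.fft.irfftn(torch.complex(real, imag),
                                s=(d, h, w), dim=(-3, -2, -1), norm="ortho")

# Example: a hypothetical X-ray tomography sub-volume of a papyrus fragment.
volume = torch.randn(1, 8, 16, 64, 64)
print(VolumetricSpectralBlock(8)(volume).shape)  # torch.Size([1, 8, 16, 64, 64])
```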

2023 Conference proceedings paper

W2WNet: A two-module probabilistic Convolutional Neural Network with embedded data cleansing functionality

Authors: Ponzio, F.; Macii, E.; Ficarra, E.; Di Cataldo, S.

Published in: EXPERT SYSTEMS WITH APPLICATIONS

Ideally, Convolutional Neural Networks (CNNs) should be trained with high-quality images with minimal noise and correct ground-truth labels. Nonetheless, in many real-world scenarios, such high quality is very hard to obtain, and datasets may be affected by all sorts of image degradation and mislabelling issues. This negatively impacts the performance of standard CNNs, both during the training and the inference phase. To address this issue, we propose Wise2WipedNet (W2WNet), a new two-module Convolutional Neural Network, where a Wise module exploits Bayesian inference to identify and discard spurious images during training, and a Wiped module takes care of the final classification while broadcasting information on the prediction confidence at inference time. The effectiveness of our solution is demonstrated on a number of public benchmarks addressing different image classification tasks, as well as on a real-world case study on histological image analysis. Overall, our experiments demonstrate that W2WNet is able to identify image degradation and mislabelling issues both at training and at inference time, with a positive impact on the final classification accuracy.
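
The sketch below illustrates one way such a Bayesian cleansing signal can be computed: Monte Carlo dropout as an approximate Bayesian inference scheme, with predictive entropy used to flag suspicious training images. The tiny architecture, number of passes, and outlier threshold are all illustrative assumptions, not the paper's actual Wise module.

```python
# Sketch: MC-dropout uncertainty as a data-cleansing signal.
# Architecture and threshold are placeholder assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Dropout(p=0.5),          # kept stochastic across MC passes
    nn.Linear(16, 10),
)

@torch.no_grad()
def predictive_entropy(x: torch.Tensor, passes: int = 20) -> torch.Tensor:
    model.train()  # keep dropout active so each pass samples a different net
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(passes)])
    mean = probs.mean(dim=0)                       # (batch, classes)
    return -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)

images = torch.randn(32, 3, 64, 64)
entropy = predictive_entropy(images)
suspect = entropy > entropy.mean() + entropy.std()  # flag outliers for cleansing
print(f"{suspect.sum().item()} of {len(images)} images flagged")
```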

2023 Journal article

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

Authors: Barraco, Manuele; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION

Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics of an image and translating them into linguistically coherent descriptions. Although successful, the attention operator only considers a weighted summation of projections of the current input sample, thereby ignoring the relevant semantic information that can come from the joint observation of other samples. In this paper, we devise a network which can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors which are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed baselines and state-of-the-art approaches, and by investigating the role of each of the proposed components. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training with cross-entropy only and when fine-tuning with self-critical sequence training. Source code and trained models are available at: https://github.com/aimagelab/PMA-Net.
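
A minimal sketch of the mechanism described above: learned prototype keys and values, standing in for the distribution of past activations, are concatenated to the current sample's keys and values before standard scaled dot-product attention. The single-head form and all sizes are simplifications assumed for illustration; the authors' formulation is in the linked repository.

```python
# Sketch: attention augmented with a prototypical memory bank.
import math
import torch
import torch.nn as nn

class PrototypeMemoryAttention(nn.Module):
    def __init__(self, dim: int, n_prototypes: int = 32):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Prototype banks standing in for past keys/values (learned
        # end-to-end here; modeled from training activations in the paper).
        self.proto_k = nn.Parameter(torch.randn(n_prototypes, dim) * 0.02)
        self.proto_v = nn.Parameter(torch.randn(n_prototypes, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.shape[0]
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Prepend the shared prototypes to every sample's keys/values.
        k = torch.cat([self.proto_k.expand(b, -1, -1), k], dim=1)
        v = torch.cat([self.proto_v.expand(b, -1, -1), v], dim=1)
        attn = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
        return attn.softmax(dim=-1) @ v

tokens = torch.randn(2, 50, 512)  # e.g. 50 visual tokens per image
print(PrototypeMemoryAttention(512)(tokens).shape)  # torch.Size([2, 50, 512])
```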

2023 Conference proceedings paper

3D-Aware Semantic-Guided Generative Model for Human Synthesis

Authors: Zhang, J.; Sangineto, E.; Tang, H.; Siarohin, A.; Zhong, Z.; Sebe, N.; Wang, W.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Generative Neural Radiance Field (GNeRF) models, which extract implicit 3D representations from 2D images, have recently been shown to produce realistic images representing rigid or semi-rigid objects, such as human faces or cars. However, they usually struggle to generate high-quality images representing non-rigid objects, such as the human body, which is of great interest for many computer graphics applications. This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis, which combines a GNeRF with a texture generator. The former learns an implicit 3D representation of the human body and outputs a set of 2D semantic segmentation masks. The latter transforms these semantic masks into a real image, adding a realistic texture to the human appearance. Without requiring additional 3D information, our model can learn 3D human representations while enabling photo-realistic, controllable generation. Our experiments on the DeepFashion dataset show that 3D-SGAN significantly outperforms the most recent baselines. The code is available at https://github.com/zhangqianhui/3DSGAN.
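
The sketch below traces the two-stage pipeline the abstract describes, with both sub-networks replaced by trivial stubs: a GNeRF-like branch maps a latent code and camera pose to per-pixel semantic part masks, and a texture branch translates those masks into an RGB image. The number of semantic classes, resolutions, and module internals are placeholder assumptions; the real components are in the linked repository.

```python
# Sketch of the two-stage 3D-SGAN composition with stub sub-networks.
import torch
import torch.nn as nn

N_PARTS = 8  # assumed number of body-part semantic classes

class SemanticNeRFStub(nn.Module):
    """Stands in for the GNeRF branch: latent + camera -> part masks."""
    def __init__(self, z_dim: int = 128, res: int = 64):
        super().__init__()
        self.res = res
        self.net = nn.Linear(z_dim + 3, N_PARTS * res * res)

    def forward(self, z: torch.Tensor, cam: torch.Tensor) -> torch.Tensor:
        logits = self.net(torch.cat([z, cam], dim=-1))
        masks = logits.view(-1, N_PARTS, self.res, self.res)
        return masks.softmax(dim=1)  # per-pixel part probabilities

class TextureGeneratorStub(nn.Module):
    """Stands in for the texture branch: part masks -> RGB image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(N_PARTS, 3, kernel_size=3, padding=1)

    def forward(self, masks: torch.Tensor) -> torch.Tensor:
        return self.net(masks).tanh()

z, cam = torch.randn(4, 128), torch.randn(4, 3)  # latent codes, camera poses
masks = SemanticNeRFStub()(z, cam)
image = TextureGeneratorStub()(masks)
print(masks.shape, image.shape)  # (4, 8, 64, 64) (4, 3, 64, 64)
```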

2022 Conference proceedings paper

A Computational Approach for Progressive Architecture Shrinkage in Action Recognition

Authors: Tomei, Matteo; Baraldi, Lorenzo; Fiameni, Giuseppe; Bronzin, Simone; Cucchiara, Rita

Published in: SOFTWARE, PRACTICE AND EXPERIENCE

2022 Journal article

A survey on data integration for multi-omics sample clustering

Authors: Lovino, Marta; Randazzo, Vincenzo; Ciravegna, Gabriele; Barbiero, Pietro; Ficarra, Elisa; Cirrincione, Giansalvo

Published in: NEUROCOMPUTING

2022 Journal article

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

Authors: Messina, Nicola; Stefanini, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Falchi, Fabrizio; Amato, Giuseppe; Cucchiara, Rita

Image-text matching is gaining a leading role among tasks involving the joint understanding of vision and language. In the literature, this task is often used as a pre-training objective to forge architectures able to jointly deal with images and texts. Nonetheless, it has a direct downstream application: cross-modal retrieval, which consists of finding images related to a given query text, or vice versa. Solving this task is of critical importance in cross-modal search engines. Many recent methods have proposed effective solutions to the image-text matching problem, mostly using recent large vision-language (VL) Transformer networks. However, these models are often computationally expensive, especially at inference time. This prevents their adoption in large-scale cross-modal retrieval scenarios, where results should be provided to the user almost instantaneously. In this paper, we propose to fill the gap between effectiveness and efficiency with an ALign And DIstill Network (ALADIN). ALADIN first produces highly effective scores by aligning images and texts at a fine-grained level. Then, it learns a shared embedding space – where an efficient kNN search can be performed – by distilling the relevance scores obtained from the fine-grained alignments. We obtained remarkable results on MS-COCO, showing that our method can compete with state-of-the-art VL Transformers while being almost 90 times faster. The code for reproducing our results is available at https://github.com/mesnico/ALADIN.
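
A minimal sketch of the distillation step: expensive fine-grained alignment scores act as a teacher for a cheap shared embedding space whose dot products can drive kNN search. The row-wise KL objective and temperature below are illustrative choices, not necessarily the paper's exact loss.

```python
# Sketch: distilling fine-grained alignment scores into a shared
# embedding space. Loss and temperature are placeholder assumptions.
import torch
import torch.nn.functional as F

def distill_scores(teacher_scores, img_emb, txt_emb, tau=0.05):
    # teacher_scores: (B, B) fine-grained image-text alignment matrix.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    student_scores = img_emb @ txt_emb.t() / tau       # cheap similarities
    target = (teacher_scores / tau).softmax(dim=-1)    # teacher distribution
    # Make each image's similarity distribution over texts match the teacher.
    return F.kl_div(student_scores.log_softmax(dim=-1), target,
                    reduction="batchmean")

B, D = 16, 256
teacher = torch.randn(B, B)                       # from the fine-grained aligner
img, txt = torch.randn(B, D), torch.randn(B, D)   # student embeddings
print(distill_scores(teacher, img, txt).item())
```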

2022 Conference proceedings paper

Applications of AI and HPC in the Health Domain

Authors: Oniga, D.; Cantalupo, B.; Tartaglione, E.; Perlo, D.; Grangetto, M.; Aldinucci, M.; Bolelli, F.; Pollastri, F.; Cancilla, M.; Canalini, L.; Grana, C.; Alcalde, C. M.; Cardillo, F. A.; Florea, M.

2022 Book chapter/Essay

Automated Prediction of Kidney Failure in IgA Nephropathy with Deep Learning from Biopsy Images

Authors: Testa, F.; Fontana, F.; Pollastri, F.; Chester, J.; Leonelli, M.; Giaroni, F.; Gualtieri, F.; Bolelli, F.; Mancini, E.; Nordio, M.; Sacco, P.; Ligabue, G.; Giovanella, S.; Ferri, M.; Alfano, G.; Gesualdo, L.; Cimino, S.; Donati, G.; Grana, C.; Magistroni, R.

Published in: CLINICAL JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY

Background and objectives: Digital pathology and artificial intelligence offer new opportunities for automatic histologic scoring. We applied a deep learning approach to IgA nephropathy biopsy images to develop an automatic histologic prognostic score, assessed against ground truth (kidney failure) among patients with IgA nephropathy who were treated over 39 years. We assessed noninferiority in comparison with the histologic component of currently validated predictive tools. We correlated additional histologic features with our deep learning predictive score to identify potential additional predictive features. Design, setting, participants, & measurements: Training for deep learning was performed with randomly selected, digitized, cortical Periodic acid–Schiff–stained section images (363 kidney biopsy specimens) to develop our deep learning predictive score. We estimated noninferiority using the area under the receiver operating characteristic curve (AUC) in a randomly selected group (95 biopsy specimens) against the gold-standard Oxford classification (MEST-C) scores used by the International IgA Nephropathy Prediction Tool and the clinical decision support system for estimating the risk of kidney failure in IgA nephropathy. We assessed additional potential predictive histologic features against a subset (20 kidney biopsy specimens) with the strongest and weakest deep learning predictive scores. Results: We enrolled 442 patients; the 10-year kidney survival was 78%, and the study median follow-up was 6.7 years. Manual MEST-C showed no prognostic relationship for the endocapillary parameter only. The deep learning predictive score was not inferior to MEST-C applied using the International IgA Nephropathy Prediction Tool and the clinical decision support system (AUC of 0.84 versus 0.77 and 0.74, respectively) and confirmed a good correlation with the tubulointerstitial score (r = 0.41, P < 0.01). We observed no correlations between the deep learning prognostic score and the mesangial, endocapillary, segmental sclerosis, and crescent parameters. Additional potential predictive histopathologic features incorporated by the deep learning predictive score included (1) inflammation within areas of interstitial fibrosis and tubular atrophy and (2) hyaline casts. Conclusions: The deep learning approach was noninferior to manual histopathologic reporting and considered prognostic features not currently included in the MEST-C assessment.
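
For readers wanting to reproduce this style of evaluation, the sketch below computes the two reported statistics, ROC AUC against the kidney-failure outcome and the Pearson correlation with the tubulointerstitial score, on random stand-ins for the study's per-biopsy scores and outcomes.

```python
# Sketch: the study's two headline statistics on stand-in data.
import numpy as np
from sklearn.metrics import roc_auc_score
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
kidney_failure = rng.integers(0, 2, size=95)   # ground-truth outcome per biopsy
dl_score = rng.random(95)                      # deep learning predictive score
mestc_score = rng.random(95)                   # MEST-C-based risk score
ti_score = rng.random(95)                      # tubulointerstitial score

print("DL AUC:     ", roc_auc_score(kidney_failure, dl_score))
print("MEST-C AUC: ", roc_auc_score(kidney_failure, mestc_score))
r, p = pearsonr(dl_score, ti_score)            # reported: r = 0.41, P < 0.01
print(f"correlation r = {r:.2f}, P = {p:.3f}")
```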

2022 Journal article

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

Authors: Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION

Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a significant boost to the digitization of handwritten documents and the reuse of their content. The task becomes even more challenging when dealing with historical documents, due to the variability of the writing style and the degradation of the page quality. State-of-the-art HTR approaches typically couple recurrent structures for sequence modeling with Convolutional Neural Networks for visual feature extraction. Since convolutional kernels are defined on fixed grids and focus on all input pixels independently while moving over the input image, this strategy disregards the fact that handwritten characters can vary in shape, scale, and orientation even within the same document, and that the ink pixels are more relevant than the background ones. To cope with these specific HTR difficulties, we propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text. We design two deformable architectures and conduct extensive experiments on both modern and historical datasets. Experimental results confirm the suitability of deformable convolutions for the HTR task.
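
As a pointer to how such a layer can be dropped into a feature extractor, the sketch below pairs torchvision's DeformConv2d with a small convolution that predicts per-location sampling offsets, so the kernel geometry adapts to the input. This illustrates the operator itself, not the paper's two architectures; the block and input sizes are assumptions.

```python
# Sketch: a deformable convolution block for text-line feature extraction.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # 2 offsets (dy, dx) for each of the k*k kernel sampling points.
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_pred(x)   # input-dependent kernel geometry
        return self.deform(x, offsets)

line_image = torch.randn(1, 3, 32, 256)          # a hypothetical text-line crop
print(DeformableBlock(3, 16)(line_image).shape)  # torch.Size([1, 16, 32, 256])
```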

2022 Journal article
