Publications by Silvia Cascianelli

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Tip: type @ to pick an author and # to pick a keyword.

Active filters (Clear): Author: Silvia Cascianelli

Handwritten Text Generation from Visual Archetypes

Authors: Pippi, V.; Cascianelli, S.; Cucchiara, R.

Published in: PROCEEDINGS IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially in the case of unseen … (Read full abstract)

Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially in the case of unseen styles and new words, and even more when these latter contain characters that are rarely encountered during training. While emulating a writer's style has been recently addressed by generative models, the generalization towards rare characters has been disregarded. In this work, we devise a Transformer-based model for Few-Shot styled handwritten text generation and focus on obtaining a robust and informative representation of both the text and the style. In particular, we propose a novel representation of the textual content as a sequence of dense vectors obtained from images of symbols written as standard GNU Unifont glyphs, which can be considered their visual archetypes. This strategy is more suitable for generating characters that, despite having been seen rarely during training, possibly share visual details with the frequently observed ones. As for the style, we obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset. Quantitative and qualitative results demonstrate the effectiveness of our proposal in generating words in unseen styles and with rare characters more faithfully than existing approaches relying on independent one-hot encodings of the characters.

2023 Relazione in Atti di Convegno

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Authors: Pippi, V.; Cascianelli, S.; Kermorvant, C.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and … (Read full abstract)

Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting. This issue is very relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, unfeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the amount of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as little as five real fine-tuning lines.

2023 Relazione in Atti di Convegno

HWD: A Novel Evaluation Score for Styled Handwritten Text Generation

Authors: Pippi, V.; Quattrini, F.; Cascianelli, S.; Cucchiara, R.

2023 Relazione in Atti di Convegno

Towards Explainable Navigation and Recounting

Authors: Poppi, Samuele; Rawal, Niyati; Bigazzi, Roberto; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Explainability and interpretability of deep neural networks have become of crucial importance over the years in Computer Vision, concurrently with … (Read full abstract)

Explainability and interpretability of deep neural networks have become of crucial importance over the years in Computer Vision, concurrently with the need to understand increasingly complex models. This necessity has fostered research on approaches that facilitate human comprehension of neural methods. In this work, we propose an explainable setting for visual navigation, in which an autonomous agent needs to explore an unseen indoor environment while portraying and explaining interesting scenes with natural language descriptions. We combine recent advances in ongoing research fields, employing an explainability method on images generated through agent-environment interaction. Our approach uses explainable maps to visualize model predictions and highlight the correlation between the observed entities and the generated words, to focus on prominent objects encountered during the environment exploration. The experimental section demonstrates that our approach can identify the regions of the images that the agent concentrates on to describe its point of view, improving explainability.

2023 Relazione in Atti di Convegno

Volumetric Fast Fourier Convolution for Detecting Ink on the Carbonized Herculaneum Papyri

Authors: Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R.

Published in: ... IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS

Recent advancements in Digital Document Restoration (DDR) have led to significant breakthroughs in analyzing highly damaged written artifacts. Among those, … (Read full abstract)

Recent advancements in Digital Document Restoration (DDR) have led to significant breakthroughs in analyzing highly damaged written artifacts. Among those, there has been an increasing interest in applying Artificial Intelligence techniques for virtually unwrapping and automatically detecting ink on the Herculaneum papyri collection. This collection consists of carbonized scrolls and fragments of documents, which have been digitized via X-ray tomography to allow the development of ad-hoc deep learning-based DDR solutions. In this work, we propose a modification of the Fast Fourier Convolution operator for volumetric data and apply it in a segmentation architecture for ink detection on the challenging Herculaneum papyri, demonstrating its suitability via deep experimental analysis. To encourage the research on this task and the application of the proposed operator to other tasks involving volumetric data, we will release our implementation (https://github.com/aimagelab/vffc).

2023 Relazione in Atti di Convegno

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

Authors: Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION

Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to … (Read full abstract)

Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content. The task becomes even more challenging when dealing with historical documents due to the variability of the writing style and degradation of the page quality. State-of-the-art HTR approaches typically couple recurrent structures for sequence modeling with Convolutional Neural Networks for visual feature extraction. Since convolutional kernels are defined on fixed grids and focus on all input pixels independently while moving over the input image, this strategy disregards the fact that handwritten characters can vary in shape, scale, and orientation even within the same document and that the ink pixels are more relevant than the background ones. To cope with these specific HTR difficulties, we propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text. We design two deformable architectures and conduct extensive experiments on both modern and historical datasets. Experimental results confirm the suitability of deformable convolutions for the HTR task.

2022 Articolo su rivista

CaMEL: Mean Teacher Learning for Image Captioning

Authors: Barraco, Manuele; Stefanini, Matteo; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Describing images in natural language is a fundamental step towards the automatic modeling of connections between the visual and textual … (Read full abstract)

Describing images in natural language is a fundamental step towards the automatic modeling of connections between the visual and textual modalities. In this paper we present CaMEL, a novel Transformer-based architecture for image captioning. Our proposed approach leverages the interaction of two interconnected language models that learn from each other during the training phase. The interplay between the two language models follows a mean teacher learning paradigm with knowledge distillation. Experimentally, we assess the effectiveness of the proposed solution on the COCO dataset and in conjunction with different visual feature extractors. When comparing with existing proposals, we demonstrate that our model provides state-of-the-art caption quality with a significantly reduced number of parameters. According to the CIDEr metric, we obtain a new state of the art on COCO when training without using external data. The source code and trained models will be made publicly available at: https://github.com/aimagelab/camel.

2022 Relazione in Atti di Convegno

Differential Diagnosis of Alzheimer Disease vs. Mild Cognitive Impairment Based on Left Temporal Lateral Lobe Hypomethabolism on 18F-FDG PET/CT and Automated Classifiers

Authors: Nuvoli, S.; Bianconi, F.; Rondini, M.; Lazzarato, A.; Marongiu, A.; Fravolini, M. L.; Cascianelli, S.; Amici, S.; Filippi, L.; Spanu, A.; Palumbo, B.

Published in: DIAGNOSTICS

Purpose: We evaluate the ability of Artificial Intelligence with automatic classification methods applied to semi-quantitative data from brain F-18-FDG PET/CT … (Read full abstract)

Purpose: We evaluate the ability of Artificial Intelligence with automatic classification methods applied to semi-quantitative data from brain F-18-FDG PET/CT to improve the differential diagnosis between Alzheimer Disease (AD) and Mild Cognitive Impairment (MCI). Procedures: We retrospectively analyzed a total of 150 consecutive patients who underwent diagnostic evaluation for suspected AD (n = 67) or MCI (n = 83). All patients received brain 18F-FDG PET/CT according to the international guidelines, and images were analyzed both Qualitatively (QL) and Quantitatively (QN), the latter by a fully automated post-processing software that produced a z score metabolic map of 25 anatomically different cortical regions. A subset of n = 122 cases with a confirmed diagnosis of AD (n = 53) or MDI (n = 69) by 18-24-month clinical follow-up was finally included in the study. Univariate analysis and three automated classification models (classification tree-ClT-, ridge classifier-RC- and linear Support Vector Machine -lSVM-) were considered to estimate the ability of the z scores to discriminate between AD and MCI cases in. Results: The univariate analysis returned 14 areas where the z scores were significantly different between AD and MCI groups, and the classification accuracy ranged between 74.59% and 76.23%, with ClT and RC providing the best results. The best classification strategy consisted of one single split with a cut-off value of approximate to -2.0 on the z score from temporal lateral left area: cases below this threshold were classified as AD and those above the threshold as MCI. Conclusions: Our findings confirm the usefulness of brain 18F-FDG PET/CT QL and QN analyses in differentiating AD from MCI. Moreover, the combined use of automated classifications models can improve the diagnostic process since its use allows identification of a specific hypometabolic area involved in AD cases in respect to MCI. This data improves the traditional 18F-FDG PET/CT image interpretation and the diagnostic assessment of cognitive disorders.

2022 Abstract in Rivista

Embodied Navigation at the Art Gallery

Authors: Bigazzi, Roberto; Landi, Federico; Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Embodied agents, trained to explore and navigate indoor photorealistic environments, have achieved impressive results on standard datasets and benchmarks. So … (Read full abstract)

Embodied agents, trained to explore and navigate indoor photorealistic environments, have achieved impressive results on standard datasets and benchmarks. So far, experiments and evaluations have involved domestic and working scenes like offices, flats, and houses. In this paper, we build and release a new 3D space with unique characteristics: the one of a complete art museum. We name this environment ArtGallery3D (AG3D). Compared with existing 3D scenes, the collected space is ampler, richer in visual features, and provides very sparse occupancy information. This feature is challenging for occupancy-based agents which are usually trained in crowded domestic environments with plenty of occupancy information. Additionally, we annotate the coordinates of the main points of interest inside the museum, such as paintings, statues, and other items. Thanks to this manual process, we deliver a new benchmark for PointGoal navigation inside this new space. Trajectories in this dataset are far more complex and lengthy than existing ground-truth paths for navigation in Gibson and Matterport3D. We carry on extensive experimental evaluation using our new space for evaluation and prove that existing methods hardly adapt to this scenario. As such, we believe that the availability of this 3D model will foster future research and help improve existing solutions.

2022 Relazione in Atti di Convegno

Focus on Impact: Indoor Exploration with Intrinsic Motivation

Authors: Bigazzi, Roberto; Landi, Federico; Cascianelli, Silvia; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita

Published in: IEEE ROBOTICS AND AUTOMATION LETTERS

Exploration of indoor environments has recently experienced a significant interest, also thanks to the introduction of deep neural agents built … (Read full abstract)

Exploration of indoor environments has recently experienced a significant interest, also thanks to the introduction of deep neural agents built in a hierarchical fashion and trained with Deep Reinforcement Learning (DRL) on simulated environments. Current state-of-the-art methods employ a dense extrinsic reward that requires the complete a priori knowledge of the layout of the training environment to learn an effective exploration policy. However, such information is expensive to gather in terms of time and resources. In this work, we propose to train the model with a purely intrinsic reward signal to guide exploration, which is based on the impact of the robot’s actions on its internal representation of the environment. So far, impact-based rewards have been employed for simple tasks and in procedurally generated synthetic environments with countable states. Since the number of states observable by the agent in realistic indoor environments is non-countable, we include a neural-based density model and replace the traditional count-based regularization with an estimated pseudo-count of previously visited states. The proposed exploration approach outperforms DRL-based competitors relying on intrinsic rewards and surpasses the agents trained with a dense extrinsic reward computed with the environment layouts. We also show that a robot equipped with the proposed approach seamlessly adapts to point-goal navigation and real-world deployment.

2022 Articolo su rivista

Page 2 of 6 • Total publications: 55