Publications by Silvia Cascianelli

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

A Workflow for Cost- and Time-Aware Refueling Itinerary Optimization

Authors: Savarese, Marco; Zaccagnino, Carmine; De Blasi, Antonio; Salici, Giacomo; Cascianelli, Silvia; Vezzani, Roberto; Grazia, Carlo Augusto

This study presents the complete workflow of the RI-PIENO framework, a system for refueling itinerary optimization that extends the original PIENO design. While prior work introduced the conceptual modules of RI-PIENO, their operational pipeline was not described in detail. Here, the workflow is made explicit, covering the end-to-end process from CAN Bus data acquisition and stop detection to the construction of daily trip graphs, refueling optimization, and mileage prediction. By clarifying the sequence of operations, the contribution provides a reproducible and extensible foundation for future research and development.
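
Taken together, these stages form a simple data pipeline. A minimal, hedged sketch of how they might be chained is given below; all function names, thresholds, and data layouts are illustrative assumptions rather than the RI-PIENO code, and the mileage-prediction stage is omitted.

```python
# Hypothetical end-to-end sketch of the workflow stages named in the abstract
# (CAN samples -> stop detection -> daily trip graph -> refueling choice).
# All names and rules here are illustrative, not the RI-PIENO implementation.
from dataclasses import dataclass

@dataclass
class Stop:
    lat: float
    lon: float
    fuel_litres: float  # fuel remaining when the vehicle stopped

def detect_stops(can_samples):
    """Toy stop detection: keep CAN samples where the speed signal is zero."""
    return [Stop(s["lat"], s["lon"], s["fuel"]) for s in can_samples if s["speed"] == 0.0]

def build_trip_graph(stops):
    """Chain consecutive stops into a daily trip graph represented as an edge list."""
    return list(zip(stops, stops[1:]))

def recommend_refuel(trip_graph, stations, max_detour_km=5.0):
    """Toy optimization: cheapest station within a small detour of the day's trips."""
    if not trip_graph:
        return None
    reachable = [s for s in stations if s["detour_km"] <= max_detour_km]
    return min(reachable, key=lambda s: s["price_eur_l"], default=None)

can_samples = [
    {"lat": 44.64, "lon": 10.92, "speed": 0.0, "fuel": 18.0},
    {"lat": 44.65, "lon": 10.93, "speed": 45.0, "fuel": 17.5},
    {"lat": 44.80, "lon": 10.33, "speed": 0.0, "fuel": 12.0},
]
stations = [{"name": "A", "price_eur_l": 1.79, "detour_km": 1.2},
            {"name": "B", "price_eur_l": 1.72, "detour_km": 3.8}]

trips = build_trip_graph(detect_stops(can_samples))
print(recommend_refuel(trips, stations)["name"])  # -> B
```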

2026 Conference proceedings paper

Alfie: Democratising RGBA image generation with no $$$

Authors: Quattrini, Fabio; Pippi, Vittorio; Cascianelli, Silvia; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2025 Conference proceedings paper

Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

Authors: Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Diffusion models have become the State-of-the-Art for text-to-image generation, and increasing research effort has been dedicated to adapting the inference process of pretrained diffusion models to achieve zero-shot capabilities. An example is the generation of panorama images, which has been tackled in recent works by combining independent diffusion paths over overlapping latent features, an approach referred to as joint diffusion, obtaining perceptually aligned panoramas. However, these methods often yield semantically incoherent outputs and trade off diversity for uniformity. To overcome this limitation, we propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting to improve the perceptual and semantic coherence of the generated panorama images. Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space. Extensive quantitative and qualitative experimental analysis, together with a user study, demonstrate that our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence. We release the code at https://github.com/aimagelab/MAD.
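
For readers unfamiliar with joint diffusion, the toy sketch below shows only the underlying mechanism: separate denoising paths run over overlapping windows of a wide latent and are averaged where they overlap. The attention reprogramming that defines the Merge-Attend-Diffuse operator itself is not reproduced here, and every name is illustrative.

```python
# Toy illustration of joint diffusion over overlapping latent windows
# (illustrative only; the actual MAD operator reprograms self- and
#  cross-attention over the merged latent, which is not reproduced here).
import numpy as np

def fake_denoise(window, t):
    """Stand-in for one denoising step of a pretrained diffusion model."""
    return window * 0.9 + 0.1 * np.tanh(window + t)

def joint_diffusion_step(panorama_latent, t, window=8, stride=4):
    c, h, w = panorama_latent.shape
    out = np.zeros_like(panorama_latent)
    counts = np.zeros((1, h, w))
    for x in range(0, w - window + 1, stride):
        denoised = fake_denoise(panorama_latent[:, :, x:x + window], t)
        out[:, :, x:x + window] += denoised      # accumulate each path's update
        counts[:, :, x:x + window] += 1.0
    return out / counts                          # average where paths overlap

latent = np.random.randn(4, 8, 32)               # channels x height x (wide) width
for t in np.linspace(1.0, 0.0, 10):
    latent = joint_diffusion_step(latent, t)
print(latent.shape)  # (4, 8, 32)
```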

2025 Conference proceedings paper

RI-PIENO - Revised and Improved Petrol-Filling Itinerary Estimation aNd Optimization

Authors: Savarese, Marco; De Blasi, Antonio; Zaccagnino, Carmine; Salici, Giacomo; Cascianelli, Silvia; Vezzani, Roberto; Grazia, Carlo Augusto

Efficient energy provisioning is a fundamental requirement for modern transportation systems, making refueling path optimization a critical challenge. Existing solutions often focus either on inter-vehicle communication or intra-vehicle monitoring, leveraging Intelligent Transportation Systems, Digital Twins, and Software-Defined Internet of Vehicles with Cloud/Fog/Edge infrastructures. However, integrated frameworks that adapt dynamically to driver mobility patterns are still underdeveloped. Building on our previous PIENO framework, we present RI-PIENO (Revised and Improved Petrol-filling Itinerary Estimation aNd Optimization), a system that combines intra-vehicle sensor data with external geospatial and fuel price information, processed via IoT-enabled Cloud/Fog services. RI-PIENO models refueling as a dynamic, time-evolving directed acyclic graph that reflects both habitual daily trips and real-time vehicular inputs, transforming the system from a static recommendation tool into a continuously adaptive decision engine. We validate RI-PIENO in a daily-commute use case through realistic multi-driver, multi-week simulations, showing that it achieves significant cost savings and more efficient routing compared to previous approaches. The framework is designed to leverage emerging roadside infrastructure and V2X communication, supporting scalable deployment within next-generation IoT and vehicular networking ecosystems.
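
To make the "time-evolving directed acyclic graph" idea concrete, the toy example below scores one commuting day as a small DAG and picks the cheapest leg on which to refuel. Node names, prices, and the refuel-at-least-once rule are assumptions made for illustration, not the paper's actual model.

```python
# Toy directed acyclic graph of a commuter's day: nodes are places/time slots,
# edge weights are estimated refueling costs in EUR (0 = no refuel on that leg).
# Illustrative only; RI-PIENO's actual graph is built from live CAN/geo data.
from functools import lru_cache

dag = {
    "home_am":   [("station_A", 25.0), ("work", 0.0)],
    "station_A": [("work", 0.0)],
    "work":      [("station_B", 22.5), ("home_pm", 0.0)],
    "station_B": [("home_pm", 0.0)],
    "home_pm":   [],
}

@lru_cache(maxsize=None)
def min_cost(node, must_refuel=True):
    """Cheapest way to reach home_pm while refueling at least once along the way."""
    if node == "home_pm":
        return 0.0 if not must_refuel else float("inf")
    best = float("inf")
    for nxt, cost in dag[node]:
        refueled_here = cost > 0.0
        best = min(best, cost + min_cost(nxt, must_refuel and not refueled_here))
    return best

print(min_cost("home_am"))  # -> 22.5 (refuel at station_B on the way home)
```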

2025 Conference proceedings paper

VATr++: Choose Your Words Wisely for Handwritten Text Generation

Authors: Vanherle, B.; Pippi, V.; Cascianelli, S.; Michiels, N.; Van Reeth, F.; Cucchiara, R.

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect: the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This work extends the VATr [1] Styled-HTG approach by addressing the pre-processing and training issues that it faces, which are common to many HTG models. In particular, we propose generally applicable strategies for input preparation and training regularization that allow the model to achieve better performance and generalization capabilities. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research: the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.

2025 Journal article

Zero-Shot Styled Text Image Generation, but Make It Autoregressive

Authors: Pippi, Vittorio; Quattrini, Fabio; Cascianelli, Silvia; Tonioni, Alessio; Cucchiara, Rita

2025 Conference proceedings paper

Binarizing Documents by Leveraging both Space and Frequency

Authors: Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Document Image Binarization is a well-known problem in Document Analysis and Computer Vision, although it is far from being solved. One of the main challenges of this task is that documents generally exhibit degradations and acquisition artifacts that can greatly vary throughout the page. Nonetheless, even when dealing with a local patch of the document, taking into account the overall appearance of a wide portion of the page can ease the prediction by enriching it with semantic information on the ink and background conditions. In this respect, approaches able to model both local and global information have been proven suitable for this task. In particular, recent applications of Vision Transformer (ViT)-based models, able to model short and long-range dependencies via the attention mechanism, have demonstrated their superiority over standard Convolution-based models, which instead struggle to model global dependencies. In this work, we propose an alternative solution based on the recently introduced Fast Fourier Convolutions, which overcomes the limitation of standard convolutions in modeling global information while requiring fewer parameters than ViTs. We validate the effectiveness of our approach via extensive experimental analysis considering different types of degradations.
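
As a rough illustration of why a Fourier-domain branch gives a global receptive field, the sketch below applies a pointwise convolution to the spectrum of a feature map, in the spirit of Fast Fourier Convolutions. It is a hedged toy under assumed shapes and names, not the architecture evaluated in the paper.

```python
# Minimal sketch of a spectral (Fourier-domain) convolution branch with a
# global receptive field, in the general spirit of Fast Fourier Convolutions.
# Assumption: this mirrors the overall idea only, not the paper's exact model.
import torch
import torch.nn as nn

class SpectralBranch(nn.Module):
    """Pointwise conv applied in the Fourier domain, so every output pixel
    depends on the whole spatial extent of the input."""
    def __init__(self, channels):
        super().__init__()
        # operate on real and imaginary parts stacked along the channel axis
        self.freq_conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")            # (b, c, h, w//2+1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)    # (b, 2c, h, w//2+1), real
        freq = torch.relu(self.freq_conv(freq))
        real, imag = freq.chunk(2, dim=1)
        freq = torch.complex(real, imag)
        return torch.fft.irfft2(freq, s=(h, w), norm="ortho")  # back to (b, c, h, w)

patch = torch.randn(1, 16, 64, 64)                         # a local document patch
print(SpectralBranch(16)(patch).shape)                      # torch.Size([1, 16, 64, 64])
```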

2024 Conference proceedings paper

Embodied Agents for Efficient Exploration and Smart Scene Description

Authors: Bigazzi, Roberto; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION

The development of embodied agents that can communicate with humans in natural language has gained increasing interest over recent years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step towards this objective, in this work, we tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment while portraying interesting scenes with natural language descriptions. To this end, we propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning on images generated through agent-environment interaction. Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions. Further, such descriptions offer user-understandable insights into the robot's representation of the environment by highlighting the prominent objects and the correlation between them as encountered during the exploration. To quantitatively assess the performance of the proposed approach, we also devise a specific score that takes into account both exploration and description skills. The experiments carried out on both photorealistic simulated environments and real-world ones demonstrate that our approach can effectively describe the robot's point of view during exploration, improving the human-friendly interpretability of its observations.
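
The abstract does not spell out the combined exploration-and-description score, so the snippet below only illustrates the general idea of merging the two skills into a single number, here with a harmonic mean. This is explicitly not the paper's metric.

```python
# Illustrative combination of an exploration score (e.g., area coverage in [0, 1])
# and a description score (e.g., a captioning metric in [0, 1]) via harmonic mean.
# NOT the score proposed in the paper; just a sketch of scoring both skills jointly.
def combined_score(coverage, caption_quality, eps=1e-8):
    return 2 * coverage * caption_quality / (coverage + caption_quality + eps)

print(round(combined_score(0.80, 0.60), 3))  # -> 0.686
```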

2023 Conference proceedings paper

Evaluating Synthetic Pre-Training for Handwriting Processing Tasks

Authors: Pippi, V.; Cascianelli, S.; Baraldi, L.; Cucchiara, R.

Published in: PATTERN RECOGNITION LETTERS

In this work, we explore massive pre-training on synthetic word images for enhancing the performance on four benchmark downstream handwriting analysis tasks. To this end, we build a large synthetic dataset of word images rendered in several handwriting fonts, which offers a complete supervision signal. We use it to train a simple convolutional neural network (ConvNet) with a fully supervised objective. The vector representations of the images obtained from the pre-trained ConvNet can then be considered as encodings of the handwriting style. We exploit such representations for Writer Retrieval, Writer Identification, Writer Verification, and Writer Classification and demonstrate that our pre-training strategy allows extracting rich representations of the writers' style that enable the aforementioned tasks with competitive results with respect to task-specific State-of-the-Art approaches.
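
As an illustration of how such style encodings support Writer Retrieval, the sketch below ranks a gallery of writer embeddings by cosine similarity to a query. The pre-trained ConvNet is replaced by random stand-in vectors, so only the retrieval step is shown.

```python
# Writer Retrieval sketch: rank known writers by similarity of style embeddings.
# The embeddings are random stand-ins for the features of the pre-trained ConvNet.
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v)

# Gallery of L2-normalized style embeddings, one per known writer.
gallery = {f"writer_{i}": normalize(rng.normal(size=64)) for i in range(5)}

# Query embedding: a noisy copy of writer_3's style, as if from a new word image.
query = normalize(gallery["writer_3"] + 0.1 * rng.normal(size=64))

# Rank the gallery by cosine similarity (dot product of unit vectors) to the query.
ranking = sorted(gallery, key=lambda w: float(query @ gallery[w]), reverse=True)
print(ranking[0])  # -> writer_3
```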

2023 Journal article

From Show to Tell: A Survey on Deep Learning-based Image Captioning

Authors: Stefanini, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Cascianelli, Silvia; Fiameni, Giuseppe; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, large research efforts have been devoted to image captioning, i.e., describing images with syntactically and semantically meaningful sentences. Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoder and a language model for text generation. During these years, both components have evolved considerably through the exploitation of object regions, attributes, the introduction of multi-modal connections, fully-attentive approaches, and BERT-like early-fusion strategies. However, despite the impressive results, research in image captioning has not reached a conclusive answer yet. This work aims at providing a comprehensive overview of image captioning approaches, from visual encoding and text generation to training strategies, datasets, and evaluation metrics. In this respect, we quantitatively compare many relevant state-of-the-art approaches to identify the most impactful technical innovations in architectures and training strategies. Moreover, many variants of the problem and its open challenges are discussed. The final goal of this work is to serve as a tool for understanding the existing literature and highlighting the future directions for a research area where Computer Vision and Natural Language Processing can find an optimal synergy.
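
The encoder-plus-language-model pipeline surveyed here can be summarized in a few lines of toy code: a visual encoder produces features and a decoder greedily emits tokens until an end-of-sequence symbol. Both components below are stand-ins for illustration, not any surveyed model.

```python
# Toy encoder-decoder captioning loop illustrating the visual-encoder +
# language-model pipeline described in the survey (all components are stand-ins).
import numpy as np

rng = np.random.default_rng(1)
vocab = ["<bos>", "a", "dog", "on", "grass", "<eos>"]

def visual_encoder(image):
    """Stand-in for region/grid/ViT features: a single global feature vector."""
    return image.mean(axis=(0, 1))

def language_model(feature, prev_tokens):
    """Stand-in decoder: scores the vocabulary given the history (feature unused here)."""
    logits = rng.normal(size=len(vocab))
    logits[min(len(prev_tokens), len(vocab) - 1)] += 10.0  # bias toward a fixed caption
    return logits

image = rng.random((224, 224, 3))
feature, tokens = visual_encoder(image), ["<bos>"]
while tokens[-1] != "<eos>" and len(tokens) < 10:          # greedy decoding
    tokens.append(vocab[int(np.argmax(language_model(feature, tokens)))])
print(" ".join(tokens[1:-1]))  # -> a dog on grass
```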

2023 Journal article

Page 1 of 6 • Total publications: 55