Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

A Workflow for Cost- and Time-Aware Refueling Itinerary Optimization

Authors: Savarese, Marco; Zaccagnino, Carmine; De Blasi, Antonio; Salici, Giacomo; Cascianelli, Silvia; Vezzani, Roberto; Grazia, Carlo Augusto

This work presents the complete workflow of the RI-PIENO framework, a system for refueling itinerary optimization that extends the original PIENO design. While prior work introduced the conceptual modules of RI-PIENO, it did not describe their operational pipeline in detail. This study makes the workflow explicit, covering the end-to-end process from CAN Bus data acquisition and stop detection to the construction of daily trip graphs, refueling optimization, and mileage prediction. By clarifying the sequence of operations, it provides a reproducible and extensible foundation for future research and development.

2026 Conference Proceedings Paper
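To make the stop-detection step more concrete, here is a minimal sketch that segments a vehicle-speed time series, such as one decoded from CAN Bus frames, into stops. The function name `detect_stops`, the `(timestamp, speed)` input format, and both thresholds are hypothetical assumptions for illustration. They are not taken from the RI-PIENO paper.

```python
from typing import List, Optional, Tuple


def detect_stops(samples: List[Tuple[float, float]],
                 speed_eps: float = 0.5,
                 min_duration: float = 120.0) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of runs where speed stays <= speed_eps
    for at least min_duration seconds.

    samples: (timestamp_s, speed_kmh) pairs sorted by timestamp.
    Hypothetical helper: thresholds are illustrative, not from RI-PIENO.
    """
    stops: List[Tuple[float, float]] = []
    start: Optional[float] = None  # first timestamp of the current stationary run
    last: Optional[float] = None   # latest stationary timestamp in that run
    for t, v in samples:
        if v <= speed_eps:
            if start is None:
                start = t
            last = t
        else:
            # The vehicle moved: keep the run only if it was long enough.
            if start is not None and last - start >= min_duration:
                stops.append((start, last))
            start = None
    # A stationary run may extend to the end of the trace.
    if start is not None and last - start >= min_duration:
        stops.append((start, last))
    return stops
```

With the default thresholds, a trace that is stationary from t = 60 s to t = 240 s yields the single stop `(60, 240)`, while a 30-second halt is discarded. Detected stops could then serve as nodes of a daily trip graph, though how RI-PIENO builds that graph is not specified here.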

An Investigation on Incremental Learning from Unbalanced Streamed Data

Authors: Borghi, Guido; Graffieti, Gabriele; Vezzani, Roberto

Published in: Lecture Notes in Computer Science

2026 Conference Proceedings Paper

CAMNet: Leveraging Cooperative Awareness Messages for Vehicle Trajectory Prediction

Authors: Grasselli, Mattia; Porrello, Angelo; Grazia, Carlo Augusto

2026 Conference Proceedings Paper

DOLFIN: Balancing Stability and Plasticity in Federated Continual Learning

Authors: Moussadek, Omayma; Salami, Riccardo; Calderara, Simone

Published in: Lecture Notes in Computer Science

Federated continual learning (FCL) enables models to learn new tasks across multiple distributed clients while preserving privacy and without forgetting previously acquired knowledge. However, current methods struggle to balance performance, privacy preservation, and communication efficiency. We introduce DOLFIN (Distributed Online LoRA for Federated INcremental learning), a novel method that combines Vision Transformers with low-rank adapters to learn new tasks efficiently and stably in federated environments. Our method leverages LoRA for minimal communication overhead and incorporates Dual Gradient Projection Memory (DualGPM) to prevent forgetting. Evaluated on CIFAR-100, ImageNet-R, ImageNet-A, and CUB-200 under two Dirichlet heterogeneity settings, DOLFIN consistently surpasses six strong baselines in final average accuracy while matching their memory footprint. Orthogonal low-rank adapters thus offer an effective and scalable solution for privacy-preserving continual learning in federated settings.

2026 Conference Proceedings Paper
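The two ingredients named in the DOLFIN abstract can be illustrated generically. First, a LoRA update ΔW = BA with B of shape d_out × r and A of shape r × d_in requires communicating only r(d_in + d_out) values instead of d_in · d_out. Second, gradient projection memory methods remove from an update every component that lies in a subspace deemed important for earlier tasks, so outputs on those inputs are unchanged to first order. The sketch below assumes an orthonormal basis of that subspace is already stored. It is a simplified, generic illustration and not DOLFIN's actual update rule. The function names are hypothetical.

```python
from typing import List

Vector = List[float]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Values exchanged for one LoRA-adapted layer: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_out(grad: Vector, basis: List[Vector]) -> Vector:
    """Remove the components of `grad` along each stored basis vector.

    Assumes `basis` is orthonormal and spans the input subspace that matters
    for previous tasks (the 'memory' in gradient projection methods). The
    result is orthogonal to every basis vector.
    """
    out = list(grad)
    for u in basis:
        coeff = dot(out, u)
        out = [g - coeff * ui for g, ui in zip(out, u)]
    return out
```

For a 768 × 768 projection with rank 8, `lora_param_count(768, 768, 8)` gives 12,288 values, about 2% of the 589,824 in the full matrix. A weight row updated with `project_out(grad, basis)` produces the same response as before for any old input lying in the span of `basis`, because their dot product with the update is zero.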

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation

Authors: Saporita, Alessia; Pipoli, Vittorio; Bolelli, Federico; Baraldi, Lorenzo; Acquaviva, Andrea; Ficarra, Elisa

Multimodal Large Language Models (MLLMs) have achieved impressive performance across a variety of vision–language tasks. However, their internal working mechanisms remain largely underexplored. In this work, we introduce FG-TRACER, a framework designed to analyze the information flow between visual and textual modalities in MLLMs during free-form generation. Notably, our numerically stabilized computational method enables the first systematic analysis of multimodal information flow in underexplored domains such as image captioning and chain-of-thought (CoT) reasoning. We apply FG-TRACER to two state-of-the-art MLLMs (LLaMA 3.2-Vision and LLaVA 1.5) across three vision–language benchmarks (TextVQA, COCO 2014, and ChartQA), and we conduct a word-level analysis of multimodal integration. Our findings uncover distinct patterns of multimodal fusion across models and tasks, demonstrating that fusion dynamics are both model- and task-dependent. Overall, FG-TRACER offers a robust methodology for probing the internal mechanisms of MLLMs in free-form settings, providing new insights into their multimodal reasoning strategies. Our source code is publicly available at https://anonymous.4open.science/r/FG-TRACER-CB5A/.

2026
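Why numerical stability matters in this kind of analysis can be seen with a generic example. One simple proxy for visual-to-textual information flow is the share of attention mass that a generated token assigns to image positions rather than text positions. The sketch below computes it from raw attention logits using a max-subtracted softmax, which avoids overflow when logits are large. This is only an illustration under stated assumptions: FG-TRACER's actual stabilized computation is not described in the abstract, and the function names and the boolean image/text token layout are hypothetical.

```python
import math
from typing import List, Sequence


def stable_softmax(logits: Sequence[float]) -> List[float]:
    """Softmax with the maximum subtracted first, so math.exp never overflows."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def image_attention_share(logits: Sequence[float], is_image: Sequence[bool]) -> float:
    """Fraction of one query token's attention that lands on image tokens.

    logits: pre-softmax attention scores from the query to every key position.
    is_image: True where the key position holds a visual token (assumed layout).
    """
    probs = stable_softmax(logits)
    return sum(p for p, img in zip(probs, is_image) if img)
```

With four equal logits of 1000.0, a naive `math.exp(1000.0)` raises `OverflowError`, whereas `image_attention_share([1000.0] * 4, [True, True, False, False])` returns 0.5.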

Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval

Authors: Caffagni, Davide; Cocchi, Federico; Mambelli, Anna; Tutrone, Fabio; Zanella, Marco; Cornia, Marcella; Cucchiara, Rita

Published in: Lecture Notes in Computer Science

Sentence similarity search is a fundamental task in information retrieval, enabling applications such as search engines, question answering, and textual analysis. However, retrieval systems often struggle when training data are scarce, as is the case for low-resource languages or specialized domains such as ancient texts. To address this challenge, we propose a novel paradigm for domain-specific sentence similarity search, where the embedding space is shaped by a combination of limited real data and a large amount of synthetic data generated by Large Language Models (LLMs). Specifically, we employ LLMs to generate domain-specific sentence pairs and fine-tune a sentence embedding model, effectively distilling knowledge from the LLM to the retrieval model. We validate our method through a case study on biblical intertextuality in Latin, demonstrating that synthetic data augmentation significantly improves retrieval effectiveness in a domain with scarce annotated resources. More broadly, our approach offers a scalable and adaptable framework for enhancing retrieval in domain-specific contexts. Source code and trained models are available at https://github.com/aimagelab/biblical-retrieval-synthesis.

2026 Conference Proceedings Paper
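The retrieval side of this setup can be sketched in a few lines: sentences are embedded, and candidates are ranked by cosine similarity to the query. For fine-tuning on (real or LLM-generated) sentence pairs, one common choice is an in-batch contrastive loss in which each pair's partner is the positive and every other sentence in the batch is a negative. Whether the paper uses this exact loss is an assumption, as are the function names and the default scale of 20. Embeddings here are plain lists standing in for the output of a sentence embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def top_k(query: Sequence[float], corpus: List[Sequence[float]],
          k: int) -> List[Tuple[int, float]]:
    """Rank corpus embeddings by cosine similarity to the query; return (index, score)."""
    scored = [(i, cosine(query, emb)) for i, emb in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def in_batch_contrastive_loss(anchors: List[Sequence[float]],
                              positives: List[Sequence[float]],
                              scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] matches anchors[i] and every other
    positives[j] in the batch acts as a negative (assumed training objective)."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        total += log_norm - logits[i]
    return total / len(anchors)
```

For the corpus `[[1, 0], [0, 1], [1, 1]]` and the query `[1, 0.1]`, `top_k` ranks index 0 first and index 2 second. The loss is near zero when each anchor matches its own positive and close to 20 when the positives are swapped.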

Gradient-sign Masking for Task Vector Transport Across Pre-Trained Models

Authors: Rinaldi, Filippo; Panariello, Aniello; Salici, Giacomo; Liu, Fengyuan; Ciccone, Marco; Porrello, Angelo; Calderara, Simone

When a new release of a foundation model is published, practitioners typically need to repeat fine-tuning, even if the same task was already tackled in the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, these vectors often fail to transfer across different pre-trained models because their parameter spaces are misaligned. In this work, we show that successful transfer depends strongly on the gradient-sign structure of the new model. Based on this insight, we propose GradFix, which approximates the ideal sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: we only compute a few target-model gradients without parameter updates and mask the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasing the task vector onto the new pre-training. We provide a theoretical guarantee that our method ensures first-order descent. Empirically, we demonstrate significant performance gains on vision and language benchmarks, consistently outperforming naive task vector addition and few-shot fine-tuning. We further show that transporting task vectors improves multi-task and multi-source model merging. Code is available at https://github.com/fillo-rinaldi/GradFix.

2026 Conference Proceedings Paper
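The masking step described in the abstract can be sketched on flattened parameter vectors. Suppose, as an assumption, that the "ideal sign structure" means agreement with the target model's local descent direction, that is, the negative gradient. Then one keeps only the task-vector entries whose sign matches that of −g and zeroes the rest. Every kept entry contributes a negative term to the linearized loss change g · Δθ, which gives the intuition behind a first-order descent guarantee. This is a simplified reading of the abstract, not the GradFix implementation. The function names and the averaging over a few batches are hypothetical.

```python
from typing import List, Sequence


def average_gradients(grads: Sequence[Sequence[float]]) -> List[float]:
    """Average per-batch target-model gradients (computed without any parameter update)."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]


def gradient_sign_mask(task_vector: Sequence[float], grad: Sequence[float]) -> List[float]:
    """Keep task-vector entries whose sign agrees with the descent direction -grad;
    zero out the rest (entries where either value is 0 are also dropped)."""
    return [t if t * g < 0 else 0.0 for t, g in zip(task_vector, grad)]


def first_order_change(grad: Sequence[float], delta: Sequence[float]) -> float:
    """Linearized loss change g . delta; a negative value means a descent step."""
    return sum(g * d for g, d in zip(grad, delta))
```

In a toy case with task vector `[0.3, -0.2, 0.9, -0.1]` and averaged gradient `[-0.8, -0.4, 0.3, 0.3]`, adding the raw task vector increases the linearized loss (about +0.08), while the masked vector `[0.3, 0.0, 0.0, -0.1]` decreases it (about -0.27).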

Histological Brain Imaging Super-resolution with Frequency-guided Diffusion Models

Authors: Casari, Giovanni; Bolelli, Federico; Grana, Costantino

High-resolution histological imaging provides essential detail for quantitative brain modeling, yet acquiring whole-brain data at micrometer scale remains technically and economically challenging. This work introduces Brain-SR, a diffusion-based super-resolution framework designed to reconstruct high-resolution cortical sections from low-resolution BigBrain data. Building upon the InvSR paradigm, our method performs resolution enhancement in the latent space of a pretrained variational autoencoder, guided by a task-specific noise-predictor network. A key contribution is a frequency-domain supervision term that compares the magnitude spectra of predicted and target patches, enforcing spectral consistency while remaining robust to local misalignments. Quantitative evaluations demonstrate that Brain-SR achieves substantial improvements in LPIPS (-27%) and FID (-58%) compared to a baseline diffusion super-resolution model, while spectral analysis confirms accurate recovery of the frequency distribution. The resulting reconstructions preserve neuronal structures consistent with high-resolution references, offering a practical step toward large-scale, morphologically faithful brain histology reconstruction. The code is publicly available to support reproducibility: https://github.com/AImageLab-zip/Brain-SR.

2026 Conference Proceedings Paper
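Why comparing magnitude spectra tolerates misalignment can be shown with a small sketch. The DFT magnitude of a patch is unchanged by a circular shift, so a loss on magnitudes ignores such shifts while a pixel-wise loss does not. The mean absolute (L1) difference, the function names, and the naive DFT are assumptions for illustration. The paper's exact formulation (norm, scaling, windowing) may differ, and in practice `numpy.fft.fft2` or `torch.fft.fft2` would replace the O(N²M²) loop.

```python
import cmath
from typing import List

Image = List[List[float]]


def dft2_magnitude(img: Image) -> List[List[float]]:
    """Magnitude of the 2-D discrete Fourier transform (naive, for small patches)."""
    h, w = len(img), len(img[0])
    mag = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
            row.append(abs(acc))
        mag.append(row)
    return mag


def spectral_l1_loss(pred: Image, target: Image) -> float:
    """Mean absolute difference between the magnitude spectra of two equally sized patches."""
    mp, mt = dft2_magnitude(pred), dft2_magnitude(target)
    n = len(mp) * len(mp[0])
    return sum(abs(a - b) for ra, rb in zip(mp, mt) for a, b in zip(ra, rb)) / n
```

Shifting a 4 × 4 patch circularly by one column changes every pixel, yet `spectral_l1_loss` between the shifted and original patches is zero up to floating-point error. Real misalignments are not exactly circular, so this invariance is only approximate in practice.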

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

Authors: Baldrati, Alberto; Morelli, Davide; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita

Published in: ACM Transactions on Multimedia Computing, Communications, and Applications

Fashion illustration is a crucial medium for designers to convey their creative vision and transform design concepts into tangible representations that showcase the interplay between clothing and the human body. In the context of fashion design, computer vision techniques have the potential to enhance and streamline the design process. Departing from prior research primarily focused on virtual try-on, this paper tackles the task of multimodal-conditioned fashion image editing. Our approach aims to generate human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures. To address this problem, we propose extending latent diffusion models to incorporate these modalities, modifying the structure of the denoising network so that it takes multimodal prompts as input. To condition the proposed architecture on fabric textures, we employ textual inversion techniques and let different cross-attention layers of the denoising network attend to textual and texture information, thus incorporating conditioning details at different granularities. Given the lack of datasets for the task, we extend two existing fashion datasets, Dress Code and VITON-HD, with multimodal annotations. Experimental evaluations demonstrate the effectiveness of the proposed approach in terms of realism and coherence with the provided multimodal inputs.

2026 Journal Article
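The conditioning mechanism relies on cross-attention, where spatial latents of the denoising network act as queries and conditioning tokens act as keys and values. The minimal single-head sketch below omits the learned query, key, and value projections used in real latent diffusion models (treated here as identity, an assumption). The conditioning tokens could be text-prompt embeddings or the pseudo-word embeddings that textual inversion learns for a fabric texture. Because the abstract says that different cross-attention layers attend to textual and texture information, in this sketch that would amount to calling `cross_attention` with a different keys and values set per layer. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention without learned projections.

    queries: one row per spatial latent position of the denoising network.
    keys, values: one row per conditioning token (e.g. text or texture tokens).
    Each output row is a softmax-weighted average of the value rows.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

With a single conditioning token, every latent simply receives that token's value vector. With two identical keys, it receives the average of the two value vectors.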

PopEYE - Infrared Ocular Image Dataset for Eye State and Gaze-Direction Classification

Authors: Gibertoni, Giovanni; Borghi, Guido; Rovati, Luigi

The PopEYE dataset is a specialized collection of 14,976 near-infrared (NIR) images of the human eye region, specifically designed to support the development and benchmarking of computer vision algorithms for eye-state detection and coarse gaze-direction classification. Each image is provided at a fixed resolution of 772 × 520 pixels in 8-bit grayscale PNG format. The images were acquired frontally using a custom-developed Maxwellian-view optical configuration, consisting of a board-level CMOS camera and a specialized lens system in which the subject's eye is precisely positioned at the focal point. This setup ensures a high-contrast representation of the anterior segment, making the pupil, iris, limbus, and portions of the sclera and eyelids clearly distinguishable under stable 850 nm infrared illumination. The images are divided into six mutually exclusive classes, identified through manual annotation supported by fixed visual aids and an expert system algorithm. The classes are: correct positioning, for eyes that are open and properly aligned for clinical measurements (8,160 images); closed, representing full eye closures such as blinks or sustained lid closure (1,790 images); and four directional classes representing gaze shifts relative to the central optical axis, namely up (1,379 images), down (1,015 images), left (1,296 images), and right (1,336 images). The data capture the natural anatomical variability of 22 subjects and include common real-world artifacts such as specular reflections from the NIR sources and partial pupil occlusions by eyelashes or eyelids. By providing standardized labels and high-resolution NIR imagery, PopEYE serves as a robust resource for training machine learning models intended for real-time patient monitoring during ophthalmic examinations.

2026 Database
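The class distribution reported for PopEYE is imbalanced: the correct positioning class holds 8,160 of the 14,976 images (about 54%). A common mitigation, not prescribed by the dataset description, is inverse-frequency class weighting. The sketch below uses the reported per-class counts. The class identifiers are illustrative names, not the dataset's official label strings.

```python
from typing import Dict

# Per-class image counts as reported in the PopEYE description.
# Key names are illustrative, not the dataset's official labels.
POPEYE_COUNTS: Dict[str, int] = {
    "correct_positioning": 8160,
    "closed": 1790,
    "up": 1379,
    "down": 1015,
    "left": 1296,
    "right": 1336,
}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by total / (num_classes * count); a perfectly balanced
    dataset would give every class a weight of 1.0."""
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}
```

The counts sum to the reported 14,976 images. With these counts, the correct positioning class receives a weight of about 0.31 and the down class about 2.46. Such weights could, for example, be passed to a weighted cross-entropy loss during training.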

Page 1 of 106 • Total publications: 1054