Publications by Rita Cucchiara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation

Authors: Barsellotti, Luca; Amoroso, Roberto; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

Open-vocabulary semantic segmentation aims to segment arbitrary categories expressed in textual form. Previous works have trained over large amounts of image-caption pairs to enforce pixel-level multimodal alignments. However, captions provide global information about the semantics of a given image but lack direct localization of individual concepts. Moreover, further training on large-scale datasets inevitably brings significant computational costs. In this paper, we propose FreeDA, a training-free diffusion-augmented method for open-vocabulary semantic segmentation, which leverages the ability of diffusion models to visually localize generated concepts and local-global similarities to match class-agnostic regions with semantic classes. Our approach involves an offline stage in which textual-visual reference embeddings are collected starting from a large set of captions and leveraging visual and semantic contexts. At test time, these are queried to support the visual matching process, which is carried out by jointly considering class-agnostic regions and global semantic similarities. Extensive analyses demonstrate that FreeDA achieves state-of-the-art performance on five datasets, surpassing previous methods by more than 7.0 average mIoU points without requiring any training. Our source code is available at https://aimagelab.github.io/freeda/.

2024 Paper in conference proceedings
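
The FreeDA abstract describes matching class-agnostic regions against offline reference embeddings by jointly using local and global similarities, but gives no formulas. As an illustration only, the sketch below assumes that regions and prototypes are plain embedding vectors, that local evidence for a class is the best cosine similarity to any of its prototypes, and that this is blended with an image-level score through a hypothetical weight `alpha`. `assign_regions` and its parameters are invented names, not FreeDA's released code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def assign_regions(
    regions: List[Vector],
    prototypes: List[Vector],
    prototype_labels: List[int],
    global_sim: List[float],
    alpha: float = 0.5,
) -> List[int]:
    """Assign a class index to each class-agnostic region (hypothetical sketch).

    regions:          embeddings of class-agnostic regions
    prototypes:       offline reference embeddings
    prototype_labels: class index of each prototype
    global_sim:       image-level similarity to each class (one entry per class)
    alpha:            assumed weight in (0, 1] between local and global evidence
    """
    labels = []
    for region in regions:
        # Local evidence: best cosine match among each class's prototypes.
        local = [-math.inf] * len(global_sim)
        for proto, c in zip(prototypes, prototype_labels):
            local[c] = max(local[c], cosine(region, proto))
        # Blend local and global evidence, then pick the highest-scoring class.
        scores = [alpha * l + (1 - alpha) * g for l, g in zip(local, global_sim)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels
```

With a balanced global score, each region simply takes the class of its closest prototype; a strongly skewed `global_sim` can override weak local matches.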

Trends, Applications, and Challenges in Human Attention Modelling

Authors: Cartella, Giuseppe; Cornia, Marcella; Cuculo, Vittorio; D'Amelio, Alessandro; Zanca, Dario; Boccignone, Giuseppe; Cucchiara, Rita

Published in: IJCAI

Human attention modelling has proven, in recent years, to be particularly useful not only for understanding the cognitive processes underlying visual exploration, but also for providing support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling. This survey offers a reasoned overview of recent efforts to integrate human attention mechanisms into contemporary deep learning models and discusses future research directions and challenges.

2024 Paper in conference proceedings

Unlearning Vision Transformers without Retaining Data via Low-Rank Decompositions

Authors: Poppi, Samuele; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

The implementation of data protection regulations such as the GDPR and the California Consumer Privacy Act has sparked a growing interest in removing sensitive information from pre-trained models without retraining them from scratch, all while maintaining predictive performance on the remaining data. Recent studies on machine unlearning for deep neural networks have produced approaches that constrain the training procedure, are limited to small-scale architectures, and adapt poorly to real-world requirements. In this paper, we develop an approach to delete information about a class from a pre-trained model by injecting a trainable low-rank decomposition into the network parameters, without requiring access to the original training set. Our approach greatly reduces the number of parameters to train, as well as time and memory requirements. This allows straightforward application to real-life settings where the entire training set is unavailable, and compliance with time-bound deletion requirements. We conduct experiments on various Vision Transformer architectures for class forgetting. Extensive empirical analyses demonstrate that our proposed method is efficient, safe to apply, and effective in removing learned information while maintaining accuracy.

2024 Paper in conference proceedings
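
The unlearning abstract states that a trainable low-rank decomposition is injected into the network parameters, which is why so few parameters need training. A minimal sketch of the general idea, assuming a LoRA-style parameterisation W + BA with B initialised to zero (an assumption; the paper's exact factorisation and its forgetting objective are not given here), is shown below. `LowRankAdapter` is a hypothetical name and the loss that actually removes a class is omitted.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of x (n x m) and y (m x p)."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


class LowRankAdapter:
    """Frozen weight W (d_out x d_in) plus a trainable update B @ A of rank r.

    Hypothetical sketch: only A (r x d_in) and B (d_out x r) would be optimised
    during unlearning, while the pre-trained W stays untouched.
    """

    def __init__(self, weight: Matrix, rank: int, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained parameters
        # A starts small and random, B starts at zero, so B @ A is initially zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self) -> Matrix:
        """Return W + B @ A, the weight the network would actually use."""
        delta = matmul(self.B, self.A)
        return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(self.weight, delta)]

    def trainable_parameters(self) -> int:
        """Count only the entries of A and B."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 768×768 projection with rank 8, only 8·768 + 768·8 = 12,288 parameters are trainable, about 2% of the 589,824 entries in W.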

Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images

Authors: Cartella, Giuseppe; Cuculo, Vittorio; Cornia, Marcella; Cucchiara, Rita

Published in: IEEE SIGNAL PROCESSING LETTERS

Creating high-quality and realistic images is now possible thanks to impressive advancements in image generation: a natural-language description of the desired output is all that is needed to obtain breathtaking results. However, as the use of generative models grows, so do concerns about the propagation of malicious content and misinformation. Consequently, the research community is actively developing novel fake detection techniques, primarily focusing on low-level features and possible fingerprints left by generative models during the image generation process. In a different vein, our work investigates whether human semantic knowledge could be incorporated into fake image detection frameworks. To this end, we collect a novel dataset of partially manipulated images using diffusion models and conduct an eye-tracking experiment to record the eye movements of different observers while they view real and fake stimuli. A preliminary statistical analysis explores the distinctive patterns in how humans perceive genuine and altered images. The findings reveal that, when perceiving counterfeit samples, humans tend to focus on more confined regions of the image, in contrast to the more dispersed viewing pattern observed for genuine images. Our dataset is publicly available at https://github.com/aimagelab/unveiling-the-truth.

2024 Journal article
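
The gaze study reports that fixations are more confined on counterfeit images and more dispersed on genuine ones, without naming the statistic in the abstract. One simple way to quantify dispersion, assumed here purely for illustration, is the root-mean-square distance of fixation points from their centroid; `fixation_dispersion` is a hypothetical helper, not the authors' analysis code.

```python
import math
from typing import List, Tuple

Fixation = Tuple[float, float]  # (x, y) in pixels


def fixation_dispersion(fixations: List[Fixation]) -> float:
    """Root-mean-square distance of fixations from their centroid (pixels).

    Hypothetical dispersion measure: smaller values mean gaze stayed on a
    more confined region of the image.
    """
    n = len(fixations)
    cx = sum(x for x, _ in fixations) / n
    cy = sum(y for _, y in fixations) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in fixations) / n)


# Usage: a tight cluster of fixations versus fixations spread over the image.
confined = [(100.0, 100.0), (102.0, 101.0), (101.0, 99.0)]
dispersed = [(10.0, 10.0), (300.0, 50.0), (150.0, 280.0)]
print(fixation_dispersion(confined) < fixation_dispersion(dispersed))
```

Per-observer values of such a measure could then be compared between real and fake stimuli with a standard statistical test.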

Video Surveillance and Privacy: A Solvable Paradox?

Authors: Cucchiara, Rita; Baraldi, Lorenzo; Cornia, Marcella; Sarto, Sara

Published in: COMPUTER

Video Surveillance started decades ago to remotely monitor specific areas and allow control by human inspectors. Later, Computer Vision gradually replaced human monitoring, first through motion alerts and now with Deep Learning techniques. From the beginning of this journey, people have worried about the risk of privacy violations. This article surveys the main steps of Computer Vision in Video Surveillance, from early approaches for people detection and tracking to action analysis and language description, outlining the most relevant directions for addressing privacy concerns. We show that the relationship between Video Surveillance and privacy is a biased paradox, since surveillance provides increased safety but does not necessarily require identifying people. Through experiments on action recognition and natural language description, we show that the paradox of surveillance and privacy can be solved by Artificial Intelligence and that respect for human rights is not an impossible chimera.

2024 Journal article

What’s Outside the Intersection? Fine-grained Error Analysis for Semantic Segmentation Beyond IoU

Authors: Bernhard, Maximilian; Amoroso, Roberto; Kindermann, Yannic; Baraldi, Lorenzo; Cucchiara, Rita; Tresp, Volker; Schubert, Matthias

2024 Paper in conference proceedings

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

Authors: Caffagni, Davide; Cocchi, Federico; Moratelli, Nicholas; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Multimodal LLMs are the natural evolution of LLMs and extend their capabilities beyond the purely textual modality. While research is being carried out to design novel architectures and vision-and-language adapters, in this paper we concentrate on endowing such models with the capability of answering questions that require external knowledge. Our approach, termed Wiki-LLaVA, aims to integrate an external knowledge source of multimodal documents, which is accessed through a hierarchical retrieval pipeline. Using this approach, relevant passages are retrieved from the external knowledge source and employed as additional context for the LLM, increasing the effectiveness and precision of generated dialogues. We conduct extensive experiments on datasets tailored for visual question answering with external data and demonstrate the appropriateness of our approach.

2024 Paper in conference proceedings
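
The Wiki-LLaVA abstract mentions a hierarchical retrieval pipeline over multimodal documents but not its internals. The sketch below assumes two stages: documents are first ranked by the similarity between an image embedding and a per-document embedding, then passages from the selected documents are ranked against the question embedding. The knowledge-base layout, `hierarchical_retrieve`, and the `k_docs`/`k_passages` parameters are all hypothetical, and real embeddings would come from vision and text encoders rather than hand-written vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hierarchical_retrieve(
    image_emb: Sequence[float],
    question_emb: Sequence[float],
    knowledge_base: List[Dict],
    k_docs: int = 1,
    k_passages: int = 2,
) -> List[str]:
    """Two-stage retrieval (hypothetical sketch).

    knowledge_base: list of {"title": str, "doc_emb": vector,
                             "passages": [(text, vector), ...]}
    Stage 1 keeps the k_docs documents closest to the image embedding.
    Stage 2 ranks their passages against the question embedding.
    """
    docs = sorted(
        knowledge_base, key=lambda d: cosine(image_emb, d["doc_emb"]), reverse=True
    )[:k_docs]
    candidates = [p for d in docs for p in d["passages"]]
    ranked = sorted(candidates, key=lambda p: cosine(question_emb, p[1]), reverse=True)
    return [text for text, _ in ranked[:k_passages]]
```

The returned passages would then be placed in the prompt as additional context before the question is passed to the LLM.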

CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components

Authors: Di Nucci, D.; Simoni, A.; Tomei, M.; Ciuffreda, L.; Vezzani, R.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Neural Radiance Fields (NeRFs) have gained widespread recognition as a highly effective technique for representing 3D reconstructions of objects and scenes derived from sets of images. Despite their efficiency, NeRF models can struggle in certain scenarios such as vehicle inspection, where insufficient data or challenging elements (e.g., reflections) strongly impact the accuracy of the reconstruction. To this end, we introduce CarPatch, a novel synthetic benchmark of vehicles. In addition to a set of images annotated with their intrinsic and extrinsic camera parameters, the corresponding depth maps and semantic segmentation masks have been generated for each view. Global and part-based metrics have been defined and used to evaluate, compare, and better characterize some state-of-the-art techniques. The dataset is publicly released at https://aimagelab.ing.unimore.it/go/carpatch and can be used as an evaluation guide and as a baseline for future work on this challenging topic.

2023 Paper in conference proceedings
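
CarPatch defines global and part-based metrics, but the abstract does not list them. As a hypothetical example of the distinction, the sketch computes depth RMSE over a whole view and separately for each semantic part, using the per-view segmentation mask to group pixels. The metric choice, the function name `depth_rmse_by_part`, and the list-of-lists image layout are assumptions, not the benchmark's official evaluation code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def depth_rmse_by_part(
    pred: List[List[float]],
    gt: List[List[float]],
    mask: List[List[str]],
) -> Tuple[float, Dict[str, float]]:
    """Global and per-part depth RMSE (hypothetical sketch).

    pred, gt: predicted and ground-truth depth maps of the same shape
    mask:     semantic part label for every pixel (e.g. "door", "wheel")
    Returns (global_rmse, {part: rmse}).
    """
    sq_sum: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    total_sq, total_n = 0.0, 0
    for p_row, g_row, m_row in zip(pred, gt, mask):
        for p, g, part in zip(p_row, g_row, m_row):
            err = (p - g) ** 2
            # Accumulate the squared error both per part and over the whole view.
            sq_sum[part] += err
            count[part] += 1
            total_sq += err
            total_n += 1
    per_part = {part: math.sqrt(sq_sum[part] / count[part]) for part in count}
    return math.sqrt(total_sq / total_n), per_part
```

Part-level scores can expose errors concentrated on small or reflective components that a single global average would dilute.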

Consistency-Based Self-supervised Learning for Temporal Anomaly Localization

Authors: Panariello, A.; Porrello, A.; Calderara, S.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2023 Paper in conference proceedings

Deep Learning and Large Scale Models for Bank Transactions

Authors: Garuti, Fabrizio; Luetto, Simone; Cucchiara, Rita; Sangineto, Enver

Published in: CEUR WORKSHOP PROCEEDINGS

The success of Artificial Intelligence (AI) in different research and application areas has increased the interest in adopting Deep Learning techniques in the financial field as well. Particularly interesting is the case of financial transactional data, which represent one of the most valuable sources of information for banks and other financial institutions. However, the heterogeneity of the data, composed of both numerical and categorical attributes, makes the use of standard Deep Learning methods difficult. In this paper, we present UniTTAB, a Transformer network for transactional time series, which can uniformly represent heterogeneous time-dependent data and is trained on a very large amount of real transactional data. As far as we know, the dataset we used for training is the largest real bank transaction dataset used for Deep Learning methods in this field, all other common datasets being either much smaller or synthetically generated. The use of this very large real training dataset makes UniTTAB the first foundation model for transactional data.

2023 Paper in conference proceedings

Total publications: 504