Publications by Vittorio Cuculo

Explore our research publications: papers, articles, and conference proceedings from AImageLab.



Decoding Facial Expressions in Video: A Multiple Instance Learning Perspective on Action Units

Authors: Del Gaudio, Livia; Cuculo, Vittorio; Cucchiara, Rita

Facial expression recognition (FER) in video sequences is a longstanding challenge in affective computing and computer vision, particularly due to the temporal complexity and subtlety of emotional expressions. In this paper, we propose a novel pipeline that leverages facial Action Units (AUs) as structured time-series descriptors of facial muscle activity, enabling emotion classification in videos through a Multiple Instance Learning (MIL) framework. Our approach models each video as a bag of AU-based instances, capturing localized temporal patterns, and allows for robust learning even when only coarse video-level emotion labels are available. Crucially, the approach incorporates interpretability mechanisms that highlight the temporal segments most influential to the final prediction, supporting informed decision-making and facilitating downstream analysis. Experimental results on benchmark FER video datasets demonstrate that our method achieves competitive performance using only visual data, without requiring multimodal signals or frame-level supervision. This highlights its potential as an interpretable and efficient solution for weakly supervised emotion recognition in real-world scenarios.
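
The bag-of-instances idea in the abstract above can be illustrated with a minimal attention-based MIL sketch (a hypothetical numpy formulation, not the authors' implementation; all names and shapes are assumptions): a video becomes a bag of AU descriptors, an attention module scores the temporal segments, and those same weights serve as the interpretability signal.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mil_predict(bag, w_att, w_clf):
    """Attention-based MIL over one video's bag of AU time-series descriptors.

    bag   : (n_instances, d) array, one row per temporal segment
    w_att : (d,) attention parameters
    w_clf : (d,) classifier parameters
    Returns a video-level logit plus the per-instance attention weights,
    which indicate the temporal segments most influential to the prediction.
    """
    alpha = softmax(bag @ w_att)   # per-segment importance, sums to 1
    z = alpha @ bag                # attention-pooled bag embedding
    return float(z @ w_clf), alpha

rng = np.random.default_rng(0)
bag = rng.normal(size=(10, 4))    # 10 temporal segments, 4 AU features each
logit, alpha = mil_predict(bag, rng.normal(size=4), rng.normal(size=4))
```

Only the coarse video-level label supervises the logit during training; the attention weights fall out for free, which is what makes the segment-level explanation possible without frame-level annotation.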

2025 Relazione in Atti di Convegno

ECoGNet: an EEG-based Effective Connectivity Graph Neural Network for Brain Disorder Detection

Authors: Burger, Jacopo; Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella

Alzheimer’s Disease (AD) and Frontotemporal Dementia (FTD), among the most prevalent neurodegenerative disorders, disrupt brain activity and connectivity, highlighting the need for tools that can effectively capture these alterations. Effective Connectivity Networks (ECNs), which model causal interactions between brain regions, offer a promising approach to characterizing AD- and FTD-related neural changes. In this study, we estimate ECNs from EEG traces using a state-of-the-art causal discovery method specifically designed for time-series data, to recover the causal structure of the interactions between brain areas. The recovered ECNs are integrated into a novel Graph Neural Network architecture (ECoGNet), where nodes represent brain regions and edge features encode causal relationships. Our method combines ECNs with features summarizing local brain dynamics to improve AD and FTD detection. Evaluated on a publicly available EEG dataset, the proposed approach demonstrates superior performance compared to models that either use non-causal connectivity networks or omit connectivity information entirely.
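
A toy message-passing step over an effective-connectivity graph can make the node/edge roles described above concrete (a hypothetical sketch, not the ECoGNet architecture; function names, shapes, and the tanh update are assumptions):

```python
import numpy as np

def causal_gnn_layer(node_feats, causal_adj, W):
    """One message-passing step on an Effective Connectivity Network.

    node_feats : (n, d) features summarizing local dynamics per brain region
    causal_adj : (n, n) matrix, causal_adj[i, j] = strength of causal edge i -> j
    W          : (d, d) learnable weight matrix
    Messages flow along causal edges: each region aggregates the features
    of its causal parents, then applies a shared nonlinear transform.
    """
    msgs = causal_adj.T @ node_feats          # sum of parent features per node
    return np.tanh((node_feats + msgs) @ W)

n_regions, d = 19, 8                          # e.g. 19 EEG electrode regions
rng = np.random.default_rng(1)
h = causal_gnn_layer(rng.normal(size=(n_regions, d)),
                     rng.uniform(size=(n_regions, n_regions)),
                     rng.normal(size=(d, d)))
```

The directed adjacency is the key design choice: unlike a symmetric correlation matrix, `causal_adj[i, j]` and `causal_adj[j, i]` can differ, so the aggregation respects the direction of the recovered causal interactions.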

2025 Relazione in Atti di Convegno

Enhancing rPPG Pulse-Signal Recovery by Facial Sampling and PSD Clustering

Authors: Grossi, Giuliano; Boccignone, Giuseppe; Conte, Donatello; Cuculo, Vittorio; D'Amelio, Alessandro; Lanzarotti, Raffaella

Published in: BIOMEDICAL SIGNAL PROCESSING AND CONTROL

2025 Articolo su rivista

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction

Authors: Cartella, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Cornia, Marcella; Boccignone, Giuseppe; Cucchiara, Rita

Predicting human gaze scanpaths is crucial for understanding visual attention, with applications in human-computer interaction, autonomous systems, and cognitive robotics. While deep learning models have advanced scanpath prediction, most existing approaches generate averaged behaviors, failing to capture the variability of human visual exploration. In this work, we present ScanDiff, a novel architecture that combines diffusion models with Vision Transformers to generate diverse and realistic scanpaths. Our method explicitly models scanpath variability by leveraging the stochastic nature of diffusion models, producing a wide range of plausible gaze trajectories. Additionally, we introduce textual conditioning to enable task-driven scanpath generation, allowing the model to adapt to different visual search objectives. Experiments on benchmark datasets show that ScanDiff surpasses state-of-the-art methods in both free-viewing and task-driven scenarios, producing more diverse and accurate scanpaths. These results highlight its ability to better capture the complexity of human visual behavior, pushing forward gaze prediction research.
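
The stochasticity the abstract attributes to diffusion models comes from the noising/denoising formulation; as a minimal illustration (a generic DDPM-style forward step on fixation coordinates, not the ScanDiff architecture; names and the schedule are assumptions):

```python
import numpy as np

def forward_diffuse(scanpath, t, betas, rng):
    """DDPM-style forward step: q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I).

    scanpath : (n_fixations, 2) array of normalized (x, y) fixation coordinates
    t        : diffusion timestep index
    betas    : noise schedule (array of per-step variances)
    A trained model learns to invert this process; sampling many reverse
    trajectories from pure noise is what yields *diverse* plausible
    scanpaths rather than a single averaged behavior.
    """
    abar = float(np.prod(1.0 - betas[: t + 1]))   # cumulative signal retention
    noise = rng.normal(size=scanpath.shape)
    return np.sqrt(abar) * scanpath + np.sqrt(1.0 - abar) * noise, abar

rng = np.random.default_rng(2)
path = rng.uniform(size=(12, 2))                  # a 12-fixation scanpath
betas = np.linspace(1e-4, 0.02, 100)
x_t, abar = forward_diffuse(path, 99, betas, rng)
```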

2025 Relazione in Atti di Convegno

Pixels of Faith: Exploiting Visual Saliency to Detect Religious Image Manipulation

Authors: Cartella, G.; Cuculo, V.; Cornia, M.; Papasidero, M.; Ruozzi, F.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2025 Relazione in Atti di Convegno

Remote Respiration Measurement with RGB Cameras: A Review and Benchmark

Authors: Boccignone, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella; Patania, Sabrina

Published in: ACM COMPUTING SURVEYS

2025 Articolo su rivista

Sanctuaria-Gaze: A Multimodal Egocentric Dataset for Human Attention Analysis in Religious Sites

Authors: Cartella, Giuseppe; Cuculo, Vittorio; Cornia, Marcella; Papasidero, Marco; Ruozzi, Federico; Cucchiara, Rita

Published in: ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE

We introduce Sanctuaria-Gaze, a multimodal dataset featuring egocentric recordings from 40 visits to four architecturally and culturally significant sanctuaries in Northern Italy. Collected using wearable devices with integrated eye trackers, the dataset offers RGB videos synchronized with streams of gaze coordinates, head motion, and environmental point clouds, resulting in over four hours of recordings. Along with the dataset, we provide a framework for automatic detection and analysis of Areas of Interest (AOIs). This framework fills a critical gap by offering an open-source, flexible tool for gaze-based research that adapts to dynamic settings without requiring manual intervention. Our study analyzes human visual attention to sacred, architectural, and cultural objects, providing insights into how visitors engage with these elements and how their background influences their interactions. By releasing both the dataset and the analysis framework, Sanctuaria-Gaze aims to advance interdisciplinary research on gaze behavior, human-computer interaction, and visual attention in real-world environments. Code and dataset are available at https://github.com/aimagelab/Sanctuaria-Gaze.
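
A basic AOI analysis on such synchronized gaze streams reduces to hit-testing and dwell-time accumulation; a toy sketch (static rectangular AOIs and invented names, whereas the released framework detects AOIs automatically in dynamic scenes):

```python
def aoi_dwell(gaze, aois, dt):
    """Accumulate dwell time per Area of Interest from gaze samples.

    gaze : iterable of (x, y) gaze coordinates, one per video frame
    aois : mapping name -> (x0, y0, x1, y1) bounding box in the same frame
    dt   : seconds per sample
    """
    dwell = {name: 0.0 for name in aois}
    for x, y in gaze:
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                dwell[name] += dt
    return dwell

samples = [(0.2, 0.3), (0.25, 0.35), (0.9, 0.9)]          # 3 gaze samples
boxes = {"altar": (0.0, 0.0, 0.5, 0.5), "fresco": (0.8, 0.8, 1.0, 1.0)}
dwell = aoi_dwell(samples, boxes, dt=0.1)                 # seconds per AOI
```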

2025 Articolo su rivista

TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes

Authors: D'Amelio, Alessandro; Cartella, Giuseppe; Cuculo, Vittorio; Lucchi, Manuele; Cornia, Marcella; Cucchiara, Rita; Boccignone, Giuseppe

Attention guides our gaze to fixate the proper location of the scene and holds it there for the required amount of time, given current processing demands, before shifting to the next location. As such, gaze deployment is crucially a temporal process. Existing computational models have made significant strides in predicting the spatial aspects of observers' visual scanpaths (where to look), while often leaving the temporal facet of attention dynamics (when) in the background. In this paper we present TPP-Gaze, a novel and principled approach to modelling scanpath dynamics based on Neural Temporal Point Processes (TPPs), which jointly learns the temporal dynamics of fixation positions and durations, integrating deep learning methodologies with point process theory. We conduct extensive experiments across five publicly available datasets. Our results show the overall superior performance of the proposed model compared to state-of-the-art approaches.
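
To see what a temporal point process scores, consider the simplest special case, a homogeneous Poisson process over fixation onset times (a toy constant intensity, not TPP-Gaze's learned neural intensity; names and numbers are illustrative):

```python
import math

def poisson_tpp_loglik(onsets, horizon, rate):
    """Log-likelihood of fixation onset times under a constant-intensity TPP.

    A temporal point process is scored as
        log L = sum_i log lambda(t_i) - integral_0^T lambda(t) dt,
    which for a constant intensity `rate` over [0, horizon] reduces to the
    closed form below. A neural TPP replaces `rate` with a history-dependent
    intensity, letting it model *when* the next fixation occurs, not just where.
    """
    return len(onsets) * math.log(rate) - rate * horizon

ll = poisson_tpp_loglik([0.3, 0.9, 1.6], horizon=2.0, rate=1.0)  # -> -2.0
```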

2025 Relazione in Atti di Convegno

Unravelling Neurodivergent Gaze Behaviour through Visual Attention Causal Graphs

Authors: Cartella, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Cucchiara, Rita; Boccignone, Giuseppe

Can the very fabric of how we visually explore the world hold the key to distinguishing individuals with Autism Spectrum Disorder (ASD)? While eye tracking has long promised quantifiable insights into neurodevelopmental conditions, the causal underpinnings of gaze behaviour remain largely uncharted territory. Moving beyond traditional descriptive metrics of gaze, this study employs cutting-edge causal discovery methods to reconstruct the directed networks that govern the flow of attention across natural scenes. Given the well-documented atypical patterns of visual attention in ASD, particularly regarding socially relevant cues, our central hypothesis is that individuals with ASD exhibit distinct causal signatures in their gaze patterns, significantly different from those of typically developing controls. To our knowledge, this is the first study to explore the diagnostic potential of causal modeling of eye movements for uncovering the cognitive phenotypes of ASD, offering a novel window into the neurocognitive alterations characteristic of the disorder.
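
One simple way to recover a directed edge between two gaze-derived time series is a Granger-style predictability gain (a generic illustration of causal-network recovery, not necessarily the discovery method used in the study; the series names are invented):

```python
import numpy as np

def lagged_r2(target, predictors, lag=1):
    """R^2 of predicting target[t] from the lag-shifted predictor series."""
    Y = target[lag:]
    X = np.column_stack([np.ones(len(Y))] + [p[:-lag] for p in predictors])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return 1.0 - resid.var() / Y.var()

def granger_edge(x, y, lag=1):
    """Directed edge score x -> y: gain from adding x's past to y's own past."""
    return lagged_r2(y, [y, x], lag) - lagged_r2(y, [y], lag)

# Synthetic example: y is driven by the past of x, so the x -> y edge
# should score higher than the reverse y -> x edge.
rng = np.random.default_rng(3)
x = rng.normal(size=300)                       # e.g. a saccade-amplitude series
y = np.empty(300)                              # e.g. a fixation-duration series
y[0] = 0.0
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=299)
```

Repeating such pairwise (or properly multivariate) scores over all gaze features yields the directed network whose structure can then be compared between ASD and typically developing groups.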

2025 Relazione in Atti di Convegno

Pain and Fear in the Eyes: Gaze Dynamics Predicts Social Anxiety from Fear Generalisation

Authors: Patania, Sabrina; D’Amelio, Alessandro; Cuculo, Vittorio; Limoncini, Matteo; Ghezzi, Marco; Conversano, Vincenzo; Boccignone, Giuseppe

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2024 Relazione in Atti di Convegno

Page 1 of 4 • Total publications: 39