Publications by Rita Cucchiara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Tip: type @ to pick an author and # to pick a keyword.

Active filters (Clear): Author: Rita Cucchiara

A Novel Attention-based Aggregation Function to Combine Vision and Language

Authors: Stefanini, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

The joint understanding of vision and language has been recently gaining a lot of attention in both the Computer Vision … (Read full abstract)

The joint understanding of vision and language has been recently gaining a lot of attention in both the Computer Vision and Natural Language Processing communities, with the emergence of tasks such as image captioning, image-text matching, and visual question answering. As both images and text can be encoded as sets or sequences of elements - like regions and words - proper reduction functions are needed to transform a set of encoded elements into a single response, like a classification or similarity score. In this paper, we propose a novel fully-attentive reduction method for vision and language. Specifically, our approach computes a set of scores for each element of each modality employing a novel variant of cross-attention, and performs a learnable and cross-modal reduction, which can be used for both classification and ranking. We test our approach on image-text matching and visual question answering, building fair comparisons with other reduction choices, on both COCO and VQA 2.0 datasets. Experimentally, we demonstrate that our approach leads to a performance increase on both tasks. Further, we conduct ablation studies to validate the role of each component of the approach.

2021 Relazione in Atti di Convegno

A Systematic Comparison of Depth Map Representations for Face Recognition

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Maltoni, Davide; Cucchiara, Rita

Published in: SENSORS

2021 Articolo su rivista

AC-VRNN: Attentive Conditional-VRNN for multi-future trajectory prediction

Authors: Bertugli, A.; Calderara, S.; Coscia, P.; Ballan, L.; Cucchiara, R.

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

Anticipating human motion in crowded scenarios is essential for developing intelligent transportation systems, social-aware robots and advanced video surveillance applications. … (Read full abstract)

Anticipating human motion in crowded scenarios is essential for developing intelligent transportation systems, social-aware robots and advanced video surveillance applications. A key component of this task is represented by the inherently multi-modal nature of human paths which makes socially acceptable multiple futures when human interactions are involved. To this end, we propose a generative architecture for multi-future trajectory predictions based on Conditional Variational Recurrent Neural Networks (C-VRNNs). Conditioning mainly relies on prior belief maps, representing most likely moving directions and forcing the model to consider past observed dynamics in generating future positions. Human interactions are modelled with a graph-based attention mechanism enabling an online attentive hidden state refinement of the recurrent estimation. To corroborate our model, we perform extensive experiments on publicly-available datasets (e.g., ETH/UCY, Stanford Drone Dataset, STATS SportVU NBA, Intersection Drone Dataset and TrajNet++) and demonstrate its effectiveness in crowded scenes compared to several state-of-the-art methods.

2021 Articolo su rivista

Assessing the Role of Boundary-level Objectives in Indoor Semantic Segmentation

Authors: Amoroso, Roberto; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Providing fine-grained and accurate segmentation maps of indoor scenes is a challenging task with relevant applications in the fields of … (Read full abstract)

Providing fine-grained and accurate segmentation maps of indoor scenes is a challenging task with relevant applications in the fields of augmented reality, image retrieval, and personalized robotics. While most of the recent literature on semantic segmentation has focused on outdoor scenarios, the generation of accurate indoor segmentation maps has been partially under-investigated. With the goal of increasing the accuracy of semantic segmentation in indoor scenarios, we focus on the analysis of boundary-level objectives, which foster the generation of fine-grained boundaries between different semantic classes and which have never been explored in the case of indoor segmentation. In particular, we test and devise variants of both the Boundary and Active Boundary losses, two recent proposals which deal with the prediction of semantic boundaries. Through experiments on the NYUDv2 dataset, we quantify the role of such losses in terms of accuracy and quality of boundary prediction and demonstrate the accuracy gain of the proposed variants.

2021 Relazione in Atti di Convegno

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Authors: Monti, Alessio; Bertugli, Alessia; Calderara, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Understanding human motion behaviour is a critical task for several possible applications like self-driving cars or social robots, and in … (Read full abstract)

Understanding human motion behaviour is a critical task for several possible applications like self-driving cars or social robots, and in general for all those settings where an autonomous agent has to navigate inside a human-centric environment. This is non-trivial because human motion is inherently multi-modal: given a history of human motion paths, there are many plausible ways by which people could move in the future. Additionally, people activities are often driven by goals, e.g. reaching particular locations or interacting with the environment. We address the aforementioned aspects by proposing a new recurrent generative model that considers both single agents' future goals and interactions between different agents. The model exploits a double attention-based graph neural network to collect information about the mutual influences among different agents and to integrate it with data about agents' possible future objectives. Our proposal is general enough to be applied to different scenarios: the model achieves state-of-the-art results in both urban environments and also in sports applications.

2021 Relazione in Atti di Convegno

Estimating (and fixing) the Effect of Face Obfuscation in Video Recognition

Authors: Tomei, Matteo; Baraldi, Lorenzo; Bronzin, Simone; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

2021 Relazione in Atti di Convegno

Explore and Explain: Self-supervised Navigation and Recounting

Authors: Bigazzi, Roberto; Landi, Federico; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Embodied AI has been recently gaining attention as it aims to foster the development of autonomous and intelligent agents. In … (Read full abstract)

Embodied AI has been recently gaining attention as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees during the path. In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes. Our model integrates a novel self-supervised exploration module with penalty, and a fully-attentive captioning model for explanation. Also, we investigate different policies for selecting proper moments for explanation, driven by information coming from both the environment and the navigation. Experiments are conducted on photorealistic environments from the Matterport3D dataset and investigate the navigation and explanation capabilities of the agent as well as the role of their interactions.

2021 Relazione in Atti di Convegno

FashionSearch++: Improving Consumer-to-Shop Clothes Retrieval with Hard Negatives

Authors: Morelli, Davide; Cornia, Marcella; Cucchiara, Rita

Published in: CEUR WORKSHOP PROCEEDINGS

Consumer-to-shop clothes retrieval has recently emerged in computer vision and multimedia communities with the development of architectures that can find … (Read full abstract)

Consumer-to-shop clothes retrieval has recently emerged in computer vision and multimedia communities with the development of architectures that can find similar in-shop clothing images given a query photo. Due to its nature, the main challenge lies in the domain gap between user-acquired and in-shop images. In this paper, we follow the most recent successful research in this area employing convolutional neural networks as feature extractors and propose to enhance the training supervision through a modified triplet loss that takes into account hard negative examples. We test the proposed approach on the Street2Shop dataset, achieving results comparable to state-of-the-art solutions and demonstrating good generalization properties when dealing with different settings and clothing categories.

2021 Relazione in Atti di Convegno

Foreword by general chairs

Authors: Cucchiara, R.; Del Bimbo, A.; Sclaroff, S.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2021 Relazione in Atti di Convegno

Foreword by general chairs

Authors: Cucchiara, R.; Del Bimbo, A.; Sclaroff, S.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2021 Relazione in Atti di Convegno

Page 14 of 51 • Total publications: 504