Publications by Rita Cucchiara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?

Authors: Fabbri, Matteo; Braso, Guillem; Maugeri, Gianluca; Cetintas, Orcun; Gasparini, Riccardo; Osep, Aljosa; Calderara, Simone; Leal-Taixe, Laura; Cucchiara, Rita

Published in: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION

2021 Paper in conference proceedings

Multi-Category Mesh Reconstruction From Image Collections

Authors: Simoni, Alessandro; Pini, Stefano; Vezzani, Roberto; Cucchiara, Rita

Recently, learning frameworks have shown the ability to infer the accurate shape, pose, and texture of an object from a single RGB image. However, current methods are trained on image collections of a single category in order to exploit specific priors, and they often make use of category-specific 3D templates. In this paper, we present an alternative approach that infers the textured mesh of objects by combining a series of deformable 3D models with a set of instance-specific deformation, pose, and texture parameters. Unlike previous works, our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision. Without specific 3D templates, the framework learns category-level models which are deformed to recover the 3D shape of the depicted object. The instance-specific deformations are predicted independently for each vertex of the learned 3D mesh, enabling the dynamic subdivision of the mesh during the training process. Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner. Predicted shapes are smooth and can benefit from multiple steps of subdivision during the training process, obtaining comparable or state-of-the-art results on two public datasets. Models and code are publicly released.

2021 Paper in conference proceedings
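The Multi-Category Mesh Reconstruction abstract above mentions dynamically subdividing the learned mesh during training. The abstract does not spell out the subdivision scheme, so the following is only a generic sketch, assuming a triangle mesh stored as a vertex list plus a list of vertex-index triples: one step of 1-to-4 midpoint subdivision. `subdivide` is a hypothetical helper, not code from the released models.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def subdivide(vertices: List[Vertex], faces: List[Face]) -> Tuple[List[Vertex], List[Face]]:
    """Split every triangle into four by inserting one vertex at each edge midpoint.

    Midpoints are cached per undirected edge, so neighbouring triangles share
    the new vertex and the mesh stays connected.
    """
    new_vertices = list(vertices)
    midpoint_index: Dict[Tuple[int, int], int] = {}

    def midpoint(a: int, b: int) -> int:
        key = (min(a, b), max(a, b))  # undirected edge key
        if key not in midpoint_index:
            va, vb = vertices[a], vertices[b]
            new_vertices.append(tuple((x + y) / 2.0 for x, y in zip(va, vb)))
            midpoint_index[key] = len(new_vertices) - 1
        return midpoint_index[key]

    new_faces: List[Face] = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central one, keeping the original winding.
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return new_vertices, new_faces
```

For a tetrahedron (4 vertices, 4 faces, 6 edges), one step yields 10 vertices and 16 faces, since each edge contributes exactly one shared midpoint.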

Multimodal Attention Networks for Low-Level Vision-and-Language Navigation

Authors: Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Corsini, Massimiliano; Cucchiara, Rita

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination. The task becomes even harder as the actions available to the agent become simpler and move towards low-level, atomic interactions with the environment. This setting is known as low-level VLN. In this paper, we strive to create an agent able to tackle three key issues: multi-modality, long-term dependencies, and adaptability to different locomotive settings. To that end, we devise "Perceive, Transform, and Act" (PTA): a fully-attentive VLN architecture that leaves the recurrent approach behind and is the first Transformer-like architecture to incorporate three different modalities (natural language, images, and low-level actions for agent control). In particular, we adopt an early fusion strategy to merge linguistic and visual information efficiently in our encoder. We then propose to refine the decoding phase with a late fusion extension between the agent's history of actions and the perceptual modalities. We experimentally validate our model on two datasets: PTA achieves promising results in low-level VLN on R2R and good performance on the recently proposed R4R benchmark. Our code is publicly available at https://github.com/aimagelab/perceive-transform-and-act.

2021 Journal article

Out of the Box: Embodied Navigation in the Real World

Authors: Bigazzi, Roberto; Landi, Federico; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The research field of Embodied AI has witnessed substantial progress in visual navigation and exploration thanks to powerful simulation platforms and the availability of 3D data of photorealistic indoor environments. These two factors have opened the door to a new generation of intelligent agents capable of achieving nearly perfect PointGoal Navigation. However, such architectures are commonly trained with millions, if not billions, of frames and tested in simulation. Together with great enthusiasm, these results raise a question: how many researchers will effectively benefit from these advances? In this work, we detail how to transfer the knowledge acquired in simulation to the real world. To that end, we describe the architectural discrepancies that damage the Sim2Real adaptation ability of models trained on the Habitat simulator and propose a novel solution tailored towards deployment in real-world scenarios. We then deploy our models on a LoCoBot, a Low-Cost Robot equipped with a single Intel RealSense camera. Unlike previous work, our testing scene is unavailable to the agent in simulation. The environment is also inaccessible to the agent beforehand, so it cannot count on scene-specific semantic priors. In this way, we reproduce a setting in which a research group (potentially from other fields) needs to employ the agent's visual navigation capabilities as a service. Our experiments indicate that it is possible to achieve satisfying results when deploying the obtained model in the real world.

2021 Paper in conference proceedings

RefiNet: 3D Human Pose Refinement with Depth Maps

Authors: D’Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Human Pose Estimation is a fundamental task for many applications in the Computer Vision community, and it has been widely investigated in the 2D domain, i.e. on intensity images. Consequently, most of the available methods for this task are based on 2D Convolutional Neural Networks and large manually annotated RGB datasets, achieving stunning results. In this paper, we propose RefiNet, a multi-stage framework that regresses an extremely precise 3D human pose from a given 2D pose and a depth map. The framework consists of three different modules, each specialized in a particular refinement and data representation, i.e. depth patches, 3D skeleton and point clouds. Moreover, we present a new dataset, called Baracca, acquired with RGB, depth and thermal cameras and specifically created for the automotive context. Experimental results confirm the quality of the refinement procedure, which substantially improves the human pose estimations of off-the-shelf 2D methods.

2021 Paper in conference proceedings
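RefiNet starts from a 2D pose and a depth map and refines it through depth patches, 3D skeletons and point clouds. A typical preliminary step for inputs of this kind, sketched here generically and not as RefiNet's published pipeline, is lifting each 2D joint to 3D camera coordinates with the pinhole camera model. The sketch assumes that the depth map is aligned with the image and stores metric depth per pixel, and that the intrinsics `fx`, `fy`, `cx`, `cy` are known; `backproject_joints` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def backproject_joints(
    joints_2d: Sequence[Tuple[float, float]],
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Tuple[float, float, float]]:
    """Lift 2D joints (u, v) in pixels to 3D camera coordinates (X, Y, Z).

    Uses the pinhole model X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy,
    where Z is read from the depth map at the nearest pixel.
    """
    h, w = len(depth), len(depth[0])
    points = []
    for u, v in joints_2d:
        # Nearest-pixel lookup, clamped to the image bounds.
        col = min(max(int(round(u)), 0), w - 1)
        row = min(max(int(round(v)), 0), h - 1)
        z = depth[row][col]
        points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

A single-pixel depth lookup is sensitive to sensor noise and to joints that land on the background near body contours, which is one reason to refine such estimates rather than use them directly.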

Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis

Authors: Poppi, Samuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

As the demand for deep learning solutions increases, the need for explainability becomes even more fundamental. In this setting, particular attention has been given to visualization techniques, which try to attribute the right relevance to each input pixel with respect to the output of the network. In this paper, we focus on Class Activation Mapping (CAM) approaches, which provide an effective visualization by taking weighted averages of the activation maps. To enhance the evaluation and the reproducibility of such approaches, we propose a novel set of metrics to quantify explanation maps, which prove more effective and simplify comparisons between approaches. To evaluate the appropriateness of the proposal, we compare different CAM-based visualization methods on the entire ImageNet validation set, fostering proper comparisons and reproducibility.

2021 Paper in conference proceedings
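CAM-based methods build an explanation map as a weighted combination of the activation maps of a convolutional layer. The following minimal NumPy sketch shows only that combination step, assuming the activations (shape K×H×W) and one weight per channel are already available: in the original CAM the weights are the classifier weights that follow global average pooling, while Grad-CAM derives them from averaged gradients. The ReLU and min-max normalization are common post-processing choices, not necessarily those used in the paper, and `class_activation_map` is a hypothetical helper. The paper's proposed metrics are not reproduced here.

```python
import numpy as np


def class_activation_map(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Combine K activation maps (K, H, W) with K class weights into one (H, W) map.

    The map is rectified (negative evidence discarded) and min-max normalized
    to [0, 1]; upsampling to the input resolution is left out for brevity.
    """
    if activations.ndim != 3 or weights.shape != (activations.shape[0],):
        raise ValueError("expected activations of shape (K, H, W) and weights of shape (K,)")
    cam = np.tensordot(weights, activations, axes=1)  # sum_k w_k * A_k -> (H, W)
    cam = np.maximum(cam, 0.0)
    span = cam.max() - cam.min()
    return (cam - cam.min()) / span if span > 0 else np.zeros_like(cam)
```

CAM variants differ mainly in how `weights` is computed; the weighted combination itself stays the same.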

RMS-Net: Regression and Masking for Soccer Event Spotting

Authors: Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

2021 Paper in conference proceedings

SHREC 2021: Skeleton-based hand gesture recognition in the wild

Authors: Caputo, Ariel; Giacchetti, Andrea; Soso, Simone; Pintani, Deborah; D'Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Simoni, Alessandro; Vezzani, Roberto; Cucchiara, Rita; Ranieri, Andrea; Giannini, Franca; Lupinetti, Katia; Monti, Marina; Maghoumi, Mehran; Laviola Jr, Joseph; Le, Minh-Quan; Nguyen, Hai-Dang; Tran, Minh-Triet

Published in: COMPUTERS & GRAPHICS

This paper presents the results of the Eurographics 2019 SHape Retrieval Contest track on online gesture recognition. The goal of this contest was to test state-of-the-art methods that can be used to detect command gestures online from tracked hand movements, on a basic benchmark where simple gestures are interleaved with other actions. Unlike previous contests and benchmarks on trajectory-based gesture recognition, we proposed an online gesture recognition task: instead of providing pre-segmented gestures, we asked the participants to find gestures within recorded trajectories. The results submitted by the participants show that online detection and recognition of sets of very simple gestures from 3D trajectories captured with a cheap sensor can be performed effectively. The best proposed methods could therefore be directly exploited to design effective gesture-based interfaces for different contexts, from Virtual and Mixed Reality applications to the remote control of home devices.

2021 Journal article

Unifying tensor factorization and tensor nuclear norm approaches for low-rank tensor completion

Authors: Du, S.; Xiao, Q.; Shi, Y.; Cucchiara, R.; Ma, Y.

Published in: NEUROCOMPUTING

Low-rank tensor completion (LRTC) has gained significant attention due to its powerful capability of recovering missing entries. However, it has to repeatedly calculate the time-consuming singular value decomposition (SVD). To address this drawback, we propose, based on the tensor-tensor product (t-product), a new LRTC method, the unified tensor factorization (UTF), for 3-way tensor completion. We first integrate tensor factorization (TF) and tensor nuclear norm (TNN) regularization into a framework that inherits the benefits of both: fast calculation and convex optimization. The conditions under which TF and TNN are equivalent are analyzed. Then, UTF for tensor completion is presented, and an efficient iterative update algorithm based on the alternating direction method of multipliers (ADMM) is used for its optimization; the solution of the proposed alternating minimization algorithm is also proven to converge to a Karush–Kuhn–Tucker (KKT) point. Finally, numerical experiments on synthetic data completion and image/video inpainting tasks demonstrate the effectiveness of our method over other state-of-the-art tensor completion methods.

2021 Journal article
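The UTF method above is built on the tensor-tensor product (t-product) and the tensor nuclear norm (TNN). As a background sketch of these two standard operations (not the UTF algorithm itself), the t-product can be computed by transforming both tensors with an FFT along the third mode, multiplying matching frontal slices, and transforming back; the TNN sums the nuclear norms of the Fourier-domain frontal slices. The 1/n3 normalization of the TNN follows one common convention and is an assumption here; `t_product` and `tensor_nuclear_norm` are hypothetical names.

```python
import numpy as np


def t_product(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """t-product of a (n1, n2, n3) and b (n2, n4, n3), returning (n1, n4, n3).

    Equivalent to a block-circulant matrix product; computed here with an FFT
    along the third mode, one matrix product per frontal slice in the Fourier
    domain, and an inverse FFT.
    """
    if a.shape[1] != b.shape[0] or a.shape[2] != b.shape[2]:
        raise ValueError("incompatible shapes for the t-product")
    a_hat = np.fft.fft(a, axis=2)
    b_hat = np.fft.fft(b, axis=2)
    c_hat = np.einsum("ijk,jlk->ilk", a_hat, b_hat)  # slice-wise matrix products
    return np.real(np.fft.ifft(c_hat, axis=2))


def tensor_nuclear_norm(x: np.ndarray) -> float:
    """TNN as the average nuclear norm of the Fourier-domain frontal slices.

    Some papers omit the 1/n3 factor; the convention used here is an assumption.
    """
    x_hat = np.fft.fft(x, axis=2)
    n3 = x.shape[2]
    return float(sum(np.linalg.norm(x_hat[:, :, k], ord="nuc") for k in range(n3)) / n3)
```

Every TNN evaluation needs one SVD per frontal slice, which is the repeated cost the abstract attributes to SVD-based LRTC methods.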

Vehicle and method for inspecting a railway line

Authors: Avizzano, Carlo Alberto; Borghi, Guido; Calderara, Simone; Cucchiara, Rita; Fedeli, Eugenio; Ermini, Mirko; Gonnelli, Mirco; Labanca, Giacomo; Frisoli, Antonio; Gasparini, Riccardo; Solazzi, Massimiliano; Tiseni, Luca; Leonardis, Daniele; Satler, Massimo

2021 Patent

Page 16 of 51 • Total publications: 505