Publications by Roberto Vezzani

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Learn to See by Events: Color Frame Synthesis from Event and RGB Cameras

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto

Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages over traditional cameras, their use is partially hindered by the limited applicability of traditional data processing and vision algorithms. To this end, we present a framework that exploits the output stream of event cameras to synthesize RGB frames, relying on an initial or periodic set of color key-frames and the sequence of intermediate events. Unlike existing work, we propose a deep learning-based frame synthesis method, consisting of an adversarial architecture combined with a recurrent module. Qualitative results and quantitative per-pixel, perceptual, and semantic evaluation on four public datasets confirm the quality of the synthesized images.

2020 Conference proceedings paper

Mercury: a vision-based framework for Driver Monitoring

Authors: Borghi, Guido; Pini, Stefano; Vezzani, Roberto; Cucchiara, Rita

In this paper, we propose a complete framework, namely Mercury, that combines Computer Vision and Deep Learning algorithms to continuously monitor the driver during the driving activity. The proposed solution complies with the requirements imposed by the challenging automotive context. First, light invariance is required, so that the system works regardless of the time of day and the weather conditions. Therefore, infrared-based images, i.e. depth maps (in which each pixel corresponds to the distance between the sensor and that point in the scene), have been exploited in conjunction with traditional intensity images. Second, the system must be non-invasive, since the driver’s movements must not be impeded during the driving activity: in this context, the use of cameras and vision-based algorithms is one of the best solutions. Finally, real-time performance is needed, since a monitoring system must react immediately as soon as a situation of potential danger is detected.

2020 Conference proceedings paper

Multimodal Hand Gesture Classification for the Human-Car Interaction

Authors: D’Eusanio, Andrea; Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Published in: INFORMATICS

2020 Journal article

Driver Face Verification with Depth Maps

Authors: Borghi, Guido; Pini, Stefano; Vezzani, Roberto; Cucchiara, Rita

Published in: SENSORS

Face verification is the task of checking if two provided images contain the face of the same person or not. In this work, we propose a fully-convolutional Siamese architecture to tackle this task, achieving state-of-the-art results on three publicly-released datasets, namely Pandora, High-Resolution Range-based Face Database (HRRFaceD), and CurtinFaces. The proposed method takes depth maps as the input, since depth cameras have been proven to be more reliable in different illumination conditions. Thus, the system is able to work even in the case of the total or partial absence of external light sources, which is a key feature for automotive applications. From the algorithmic point of view, we propose a fully-convolutional architecture with a limited number of parameters, capable of dealing with the small amount of depth data available for training and able to run in real time even on a CPU and embedded boards. The experimental results show acceptable accuracy to allow exploitation in real-world applications with in-board cameras. Finally, exploiting the presence of faces occluded by various head garments and extreme head poses available in the Pandora dataset, we successfully test the proposed system also during strong visual occlusions. The excellent results obtained confirm the efficacy of the proposed method.

2019 Journal article

Face Verification from Depth using Privileged Information

Authors: Borghi, Guido; Pini, Stefano; Grazioli, Filippo; Vezzani, Roberto; Cucchiara, Rita

In this paper, a deep Siamese architecture for depth-based face verification is presented. The proposed approach efficiently verifies if two face images belong to the same person while handling a great variety of head poses and occlusions. The architecture, namely JanusNet, consists of a combination of a depth, an RGB, and a hybrid Siamese network. During the training phase, the hybrid network learns to extract complementary mid-level convolutional features which mimic the features of the RGB network, while leveraging the light invariance of depth images. At testing time, the model, relying only on depth data, achieves state-of-the-art results and real-time performance, despite the lack of deep-oriented depth-based datasets.

2019 Conference proceedings paper

Hand Gestures for the Human-Car Interaction: the Briareo dataset

Authors: Manganaro, Fabio; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Natural User Interfaces can be an effective way to reduce driver's inattention during the driving activity. To this end, in this paper we propose a new dataset, called Briareo, specifically collected for the hand gesture recognition task in the automotive context. The dataset is acquired from an innovative point of view, exploiting different kinds of cameras, i.e. RGB, infrared stereo, and depth, that provide various types of images and 3D hand joints. Moreover, the dataset contains a significant amount of hand gesture samples, performed by several subjects, allowing the use of deep learning-based approaches. Finally, a framework for hand gesture segmentation and classification is presented, exploiting a method introduced to assess the quality of the proposed dataset.

2019 Conference proceedings paper

Manual Annotations on Depth Maps for Human Pose Estimation

Authors: D'Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Few works tackle Human Pose Estimation on depth maps. Moreover, these methods usually rely on automatically annotated datasets, whose annotations are often imprecise and unreliable, limiting the accuracy achievable when this data is used as ground truth. For this reason, in this paper we propose an annotation refinement tool for human poses, expressed as body joints, and a novel set of fine joint annotations for the Watch-n-Patch dataset, which has been collected with the proposed tool. Furthermore, we present a fully-convolutional architecture that performs body pose estimation directly on depth maps. The extensive evaluation shows that the proposed architecture outperforms the competitors in different training scenarios and is able to run in real time.

2019 Conference proceedings paper

SHREC 2019 Track: Online Gesture Recognition

Authors: Caputo, F. M.; Burato, S.; Pavan, G.; Voillemin, T.; Wannous, H.; Vandeborre, J. P.; Maghoumi, M.; Taranta, E. M.; Razmjoo, A.; LaViola Jr., J. J.; Manganaro, Fabio; Pini, S.; Borghi, G.; Vezzani, R.; Cucchiara, R.; Nguyen, H.; Tran, M. T.; Giachetti, A.

This paper presents the results of the Eurographics 2019 SHape Retrieval Contest track on online gesture recognition. The goal of this contest was to test state-of-the-art methods that can be used to detect command gestures online from hand movement tracking, on a basic benchmark where simple gestures are interleaved with other actions. Unlike previous contests and benchmarks on trajectory-based gesture recognition, we proposed an online gesture recognition task, not providing pre-segmented gestures, but asking the participants to find gestures within recorded trajectories. The results submitted by the participants show that online detection and recognition of sets of very simple gestures from 3D trajectories captured with a cheap sensor can be effectively performed. The best methods proposed could therefore be directly exploited to design effective gesture-based interfaces for use in different contexts, from Virtual and Mixed Reality applications to the remote control of home devices.

2019 Conference proceedings paper

Video synthesis from Intensity and Event Frames

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Event cameras, neuromorphic devices that naturally respond to brightness changes, have multiple advantages with respect to traditional cameras. However, the difficulty of applying traditional computer vision algorithms on event data limits their usability. Therefore, in this paper we investigate the use of a deep learning-based architecture that combines an initial grayscale frame and a series of event data to estimate the following intensity frames. In particular, a fully-convolutional encoder-decoder network is employed and evaluated for the frame synthesis task on an automotive event-based dataset. Performance obtained with pixel-wise metrics confirms the quality of the images synthesized by the proposed architecture.

2019 Conference proceedings paper

Deep Head Pose Estimation from Depth Data for In-car Automotive Applications

Authors: Venturelli, Marco; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Published in: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE

Recently, deep learning approaches have achieved promising results in various fields of computer vision. In this paper, we tackle the problem of head pose estimation through a Convolutional Neural Network (CNN). Unlike other proposals in the literature, the described system works directly on raw depth data alone. Moreover, head pose estimation is solved as a regression problem and does not rely on visual facial features like facial landmarks. We tested our system on a well-known public dataset, Biwi Kinect Head Pose, showing that our approach achieves state-of-the-art results and is able to meet real-time performance requirements.

2018 Conference proceedings paper
