Publications by Roberto Vezzani

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

From Depth Data to Head Pose Estimation: a Siamese approach

Authors: Venturelli, Marco; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

The correct estimation of the head pose is a problem of great importance for many applications. For instance, it is an enabling technology in automotive for driver attention monitoring. In this paper, we tackle the pose estimation problem through a deep learning network working in a regression manner. Traditional methods usually rely on visual facial features, such as facial landmarks or nose tip position. In contrast, we exploit a Convolutional Neural Network (CNN) to perform head pose estimation directly from depth data. We exploit a Siamese architecture and we propose a novel loss function to improve the learning of the regression network layer. The system has been tested on two public datasets, Biwi Kinect Head Pose and the ICT-3DHP database. The reported results demonstrate the improvement in accuracy with respect to current state-of-the-art approaches and the real-time capabilities of the overall framework.

2017 Conference proceedings paper

POSEidon: Face-from-Depth for Driver Pose Estimation

Authors: Borghi, Guido; Venturelli, Marco; Vezzani, Roberto; Cucchiara, Rita

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

Fast and accurate upper-body and head pose estimation is a key task for automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regression neural network, called POSEidon, which is composed of three independent convolutional nets followed by a fusion layer, specially conceived for understanding the pose by depth. In addition, to recover the intrinsic value of face appearance for understanding head position and orientation, we propose a new Face-from-Depth approach for learning image faces from depth. Results in face reconstruction are qualitatively impressive. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Results show that our method outperforms all recent state-of-the-art works, running in real time at more than 30 frames per second.

2017 Conference proceedings paper

Fast gesture recognition with Multiple Stream Discrete HMMs on 3D Skeletons

Authors: Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

HMMs are widely used in action and gesture recognition due to their implementation simplicity, low computational requirements, scalability and high parallelism. They achieve good performance even with a limited training set. All these characteristics are hard to find together in other, even more accurate, methods. In this paper, we propose a novel double-stage classification approach, based on Multiple Stream Discrete Hidden Markov Models (MSD-HMM) and 3D skeleton joint data, able to reach high performance while maintaining all the advantages listed above. The approach allows both quick classification of pre-segmented gestures (offline classification) and temporal segmentation of streams of gestures (online classification) faster than real time. We test our system on three public datasets, MSRAction3D, UTKinect-Action and MSRDailyAction, and on a new dataset, Kinteract Dataset, explicitly created for Human Computer Interaction (HCI). We obtain state-of-the-art performance on all of them.

2016 Conference proceedings paper

Shot, scene and keyframe ordering for interactive video re-use

Authors: Baraldi, L.; Grana, C.; Borghi, G.; Vezzani, R.; Cucchiara, R.

This paper presents a complete system for shot and scene detection in broadcast videos, as well as a method to select the best representative key-frames, which could be used in new interactive interfaces for accessing large collections of edited videos. The final goal is to enable improved access to video footage and the re-use of video content through the direct management of user-selected video clips.

2016 Conference proceedings paper

YACCLAB - Yet Another Connected Components Labeling Benchmark

Authors: Grana, Costantino; Bolelli, Federico; Baraldi, Lorenzo; Vezzani, Roberto

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

The problem of labeling the connected components (CCL) of a binary image is well-defined and several proposals have been presented in the past. Since an exact solution to the problem exists and must be provided as output, algorithms mainly differ in their execution speed. In this paper, we propose and describe YACCLAB, Yet Another Connected Components Labeling Benchmark. Together with a rich and varied dataset, YACCLAB contains an open source platform to test new proposals and to compare them with publicly available competitors. Textual and graphical outputs are automatically generated for three kinds of tests, which analyze the methods from different perspectives. The fairness of the comparisons is guaranteed by running on the same system and over the same datasets. Examples of usage and the corresponding comparisons among state-of-the-art techniques are reported to confirm the potential of the benchmark.

2016 Conference proceedings paper

A General-Purpose Sensing Floor Architecture for Human-Environment Interaction

Authors: Vezzani, Roberto; Lombardi, Martino; Pieracci, Augusto; Santinelli, Paolo; Cucchiara, Rita

Published in: ACM TRANSACTIONS ON INTERACTIVE INTELLIGENT SYSTEMS

Smart environments are now designed as natural interfaces to capture and understand human behavior without a need for explicit human-computer interaction. In this paper, we present a general-purpose architecture that acquires and understands human behaviors through a sensing floor. The pressure field generated by moving people is captured and analyzed. Specific actions and events are then detected by a low-level processing engine and sent to high-level interfaces providing different functions. The proposed architecture and sensors are modular, general-purpose, cheap, and suitable for both small- and large-area coverage. Some sample entertainment and virtual reality applications that we developed to test the platform are presented.

2015 Journal article

Automatic configuration and calibration of modular sensing floors

Authors: Vezzani, Roberto; Lombardi, Martino; Cucchiara, Rita

Sensing floors are becoming an emerging solution for many privacy-compliant and large-area surveillance systems. Many research and even commercial technologies have been proposed in recent years. As with distributed camera networks, the problem of calibration is crucial, especially when the floors are installed in wide areas. This paper addresses the general problem of automatic calibration and configuration of modular and scalable sensing floors. Working on training data only, the system automatically finds the spatial placement of each sensor module and estimates the threshold parameters needed for people detection. Tests on several training sequences captured with a commercial sensing floor are provided to validate the method.

2015 Conference proceedings paper

Detection of Human Movements with Pressure Floor Sensors

Authors: Lombardi, Martino; Vezzani, Roberto; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Following the recent Internet of Everything (IoE) trend, several general-purpose devices have been proposed to acquire as much information as possible from the environment and from people interacting with it. Among these, sensing floors have recently been attracting the interest of the research community. In this paper, we propose a new model to store and process floor data. The model does not assume a regular grid distribution of the sensing elements and is based on the ground reaction force (GRF) concept, widely used in biomechanics. It allows the correct detection and tracking of people, outperforming the common background subtraction scheme adopted in the past. Several tests on a real sensing floor prototype are reported and discussed.

2015 Conference proceedings paper

Mapping Appearance Descriptors on 3D Body Models for People Re-identification

Authors: Baltieri, Davide; Vezzani, Roberto; Cucchiara, Rita

Published in: INTERNATIONAL JOURNAL OF COMPUTER VISION

People Re-identification aims at associating multiple instances of a person’s appearance acquired from different points of view, different cameras, or after a spatial or a limited temporal gap to the same identifier. The basic hypothesis is that the person’s appearance is mostly constant. Many appearance descriptors have been adopted in the past, but they are often subject to severe perspective and view-point issues. In this paper, we propose a complete re-identification framework which exploits non-articulated 3D body models to spatially map appearance descriptors (color and gradient histograms) into the vertices of a regularly sampled 3D body surface. The matching and the shot integration steps are directly handled in the 3D body model, reducing the effects of occlusions, partial views or pose changes, which normally afflict 2D descriptors. A fast and effective model-to-image alignment is also proposed. It allows operation on common surveillance cameras or image collections. A comprehensive experimental evaluation is presented using the benchmark suite 3DPeS.

2015 Journal article

3D Hough transform for sphere recognition on point clouds

Authors: Camurri, Marco; Vezzani, Roberto; Cucchiara, Rita

Published in: MACHINE VISION AND APPLICATIONS

Three-dimensional object recognition on range data and 3D point clouds is becoming more important nowadays. Since many real objects have a shape that could be approximated by simple primitives, robust pattern recognition can be used to search for primitive models. For example, the Hough transform is a well-known technique which is largely adopted in 2D image space. In this paper, we systematically analyze different probabilistic/randomized Hough transform algorithms for spherical object detection in dense point clouds. In particular, we study and compare four variants which are characterized by the number of points drawn together for surface computation into the parametric space and we formally discuss their models. We also propose a new method that combines the advantages of both single-point and multi-point approaches for a faster and more accurate detection. The methods are tested on synthetic and real datasets.

2014 Journal article
