Publications by Roberto Vezzani

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Deep Learning-Based Method for Vision-Guided Robotic Grasping of Unknown Objects

Authors: Bergamini, Luca; Sposato, Mario; Peruzzini, Margherita; Vezzani, Roberto; Pellicciari, Marcello

Published in: ADVANCES IN TRANSDISCIPLINARY ENGINEERING

Collaborative robots must operate safely and efficiently in ever-changing unstructured environments, grasping and manipulating many different objects. Artificial vision has proved to be the ideal sensing technology for collaborative robots and is widely used for identifying the objects to manipulate and for detecting their optimal grasps. One of the main drawbacks of state-of-the-art robotic vision systems is the long training needed to teach the identification and optimal grasps of each object, which strongly reduces the robot's productivity and overall operating flexibility. To overcome this limit, we propose an engineering method, based on deep learning techniques, for detecting robotic grasps of unknown objects in an unstructured environment, which should enable collaborative robots to autonomously generate grasping strategies without the need for training and programming. A novel loss function for training the grasp prediction network has been developed and proved to work well even with low-resolution 2-D images, thus allowing the use of a single, smaller, low-cost camera that can be better integrated into robotic end-effectors. Despite the availability of less information (resolution and depth), an accuracy of 75% has been achieved on the Cornell dataset, and we show that our implementation of the loss function does not suffer from the common problems reported in the literature. The system has been implemented using the ROS framework and tested on a Baxter collaborative robot.

2018 Conference paper

Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face

Authors: Fabbri, Matteo; Borghi, Guido; Lanzi, Fabio; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita

Can faces acquired by low-cost depth sensors be used to see characteristic details of the face? Typically, the answer is no. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper we propose a new Deterministic Conditional GAN, trained on annotated RGB-D face datasets, that is effective for face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features of unknown individual faces, it is capable of reconstructing plausible faces whose appearance is accurate enough to be used in many pattern recognition tasks. In fact, we test the network's ability to hallucinate with several perceptual probes, such as face aspect classification and landmark detection. Depth faces can thus be used in place of the corresponding RGB images, which are often unavailable due to darkness or difficult lighting conditions. Experimental results are very promising and far better than previously proposed approaches: this domain translation can constitute a new way to exploit depth data in future applications.

2018 Conference paper

Fully Convolutional Network for Head Detection with Depth Images

Authors: Ballotta, Diego; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Head detection and localization are among the most investigated and demanding tasks in the Computer Vision community. They are also a key element in many disciplines, such as Human Computer Interaction, Human Behavior Understanding, Face Analysis, and Video Surveillance. In recent decades, many efforts have been devoted to developing accurate and reliable head or face detectors on standard RGB images, but only a few solutions concern other types of images, such as depth maps. In this paper, we propose a novel method for head detection on depth images, based on a deep learning approach. In particular, the presented system overcomes the classic sliding-window approach, which is often the main computational bottleneck of many object detectors, through a Fully Convolutional Network. Two public datasets, namely Pandora and Watch-n-Patch, are exploited to train and test the proposed network. Experimental results confirm the effectiveness of the method, which outperforms all state-of-the-art works based on depth images and runs in real time.

2018 Conference paper

Hands on the wheel: a Dataset for Driver Hand Detection and Tracking

Authors: Borghi, Guido; Frigieri, Elia; Vezzani, Roberto; Cucchiara, Rita

The ability to detect, localize, and track the hands is crucial in many applications requiring an understanding of a person's behavior, attitude, and interactions. This is particularly true in the automotive context, where hand analysis allows the prediction of preparatory movements for maneuvers or the investigation of the driver's attention level. Moreover, thanks to the recent diffusion of cameras inside new car cockpits, it is feasible to use hand gestures to develop new Human-Car Interaction systems that are more user-friendly and safe. In this paper, we propose a new dataset, called Turms, consisting of infrared images of the driver's hands collected from behind the steering wheel, an innovative point of view. The Leap Motion device was selected for the recordings, thanks to its stereo capabilities and wide viewing angle. In addition, we introduce a method to detect the presence and location of the driver's hands on the steering wheel during driving tasks.

2018 Conference paper

Head Detection with Depth Images in the Wild

Authors: Ballotta, Diego; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Head detection and localization is a demanding task and a key element in many computer vision applications, such as video surveillance, Human Computer Interaction, and face analysis. The stunning amount of work done on face detection in RGB images, together with the availability of huge face datasets, has enabled very effective systems in that domain. However, due to illumination issues, infrared or depth cameras may be required in real applications. In this paper, we introduce a novel method for head detection on depth images that exploits the classification ability of deep learning approaches. In addition to reducing the dependence on external illumination, depth images implicitly embed useful information for dealing with the scale of the target objects. Two public datasets have been exploited: the first, called Pandora, is used to train a deep binary classifier with face and non-face images; the second, collected by Cornell University, is used to perform a cross-dataset test on daily activities in unconstrained environments. Experimental results show that the proposed method outperforms state-of-the-art methods working on depth images.

2018 Conference paper

Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World

Authors: Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Palazzi, Andrea; Vezzani, Roberto; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Multi-People Tracking in an open-world setting requires a special effort in precise detection. Moreover, temporal continuity in the detection phase becomes more important when scene clutter introduces the challenging problem of occluded targets. To this end, we propose a deep network architecture that jointly extracts people's body parts and associates them across short temporal spans. Our model explicitly deals with occluded body parts by hallucinating plausible solutions for non-visible joints. We propose a new end-to-end architecture composed of four branches (visible heatmaps, occluded heatmaps, part affinity fields, and temporal affinity fields) fed by a time-linker feature extractor. To overcome the lack of surveillance data with tracking, body-part, and occlusion annotations, we created what is, to date, the largest computer-graphics dataset of human body parts for people tracking in urban scenarios (about 500,000 frames, almost 10 million body poses) by exploiting a photorealistic videogame. Our architecture, trained on virtual data, also exhibits good generalization capabilities on public real tracking benchmarks, when image resolution and sharpness are high enough, producing reliable tracklets useful for further batch data-association or re-id modules.

2018 Conference paper

Learning to Generate Facial Depth Maps

Authors: Pini, Stefano; Grazioli, Filippo; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

In this paper, an adversarial architecture for facial depth map estimation from monocular intensity images is presented. Following an image-to-image approach, we combine the advantages of supervised learning and adversarial training, proposing a conditional Generative Adversarial Network that effectively learns to translate intensity face images into the corresponding depth maps. Two public datasets, namely the Biwi database and the Pandora dataset, are exploited to demonstrate that the proposed model generates high-quality synthetic depth images, both in terms of visual appearance and informative content. Furthermore, we show that the model is capable of predicting distinctive facial details by testing the generated depth maps with a deep model trained on authentic depth maps for the face verification task.

2018 Conference paper

Sistema e metodo di autenticazione di persone in ambienti a limitata visibilità (System and method for authenticating people in low-visibility environments)

Authors: Borghi, Guido; Grazioli, Filippo; Vezzani, Roberto; Pini, Stefano; Cucchiara, Rita

2018 Patent

Embedded Recurrent Network for Head Pose Estimation in Car

Authors: Borghi, Guido; Gasparini, Riccardo; Vezzani, Roberto; Cucchiara, Rita

An accurate and fast estimation of the driver's head pose is a rich source of information, particularly in the automotive context. Head pose is a key element for investigating driver behavior, pose analysis, and attention monitoring, and is also a useful component for improving the efficacy of Human-Car Interaction systems. In this paper, a Recurrent Neural Network is exploited to tackle the problem of driver head pose estimation, working directly and exclusively on depth images to be more reliable in the presence of varying or insufficient illumination. Experimental results, obtained on two public datasets, namely Biwi Kinect Head Pose and the ICT-3DHP Database, prove the efficacy of the proposed method, which outperforms state-of-the-art works. In addition, the entire system has been implemented and tested on two embedded boards with real-time performance.

2017 Conference paper

Fast and Accurate Facial Landmark Localization in Depth Images for In-car Applications

Authors: Frigieri, Elia; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

A correct and reliable localization of facial landmarks enables several applications in many fields, ranging from Human Computer Interaction to video surveillance. For instance, it can provide valuable input for monitoring the driver's physical state and attention level in the automotive context. In this paper, we tackle the problem of facial landmark localization through a deep approach. The developed system runs in real time and, in particular, is more reliable than state-of-the-art competitors, especially in the presence of light changes and poor illumination, thanks to the use of depth images as input. We also collected and shared a new realistic dataset recorded inside a car, called MotorMark, to train and test the system. In addition, we exploited the public Eurecom Kinect Face Dataset for the evaluation phase, achieving promising results both in terms of accuracy and computational speed.

2017 Conference paper

Page 5 of 13 • Total publications: 124