Sistema e metodo di autenticazione di persone in ambienti a limitata visibilità [System and method for authenticating people in limited-visibility environments]
Authors: Borghi, Guido; Grazioli, Filippo; Vezzani, Roberto; Pini, Stefano; Cucchiara, Rita
Authors: Bolelli, Federico; Borghi, Guido; Grana, Costantino
Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE
Dematerialization and digitization of historical documents are key to their availability, preservation and diffusion. Unfortunately, converting handwritten documents to digital form presents several technical challenges. The XDOCS project was created with the main goal of making historical documents available and extending their usability for a wide variety of audiences, such as scholars, institutions and libraries. In this paper, the core elements of XDOCS, i.e. the page dewarping and word spotting techniques, are described, and two new applications, i.e. an annotation/indexing tool and a search tool, are presented.
Authors: Balducci, Fabrizio; Borghi, Guido
Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE
Melanoma is one of the deadliest forms of skin cancer, so it is crucial to develop automated systems that analyze epidermal images to identify it early, while also reducing unnecessary medical exams. A key element is the availability of user-friendly annotation tools that can be used by non-IT experts to produce well-annotated, high-quality medical data. In this work, we present an annotation tool to manually create and annotate digital epidermal images, with the aim of extracting metadata (annotations, contour patterns and intersections, color information) that is stored and organized in an integrated digital library. The tool was designed following strict usability principles, informed by interviews with doctors and their opinions. A preliminary but functional evaluation has been conducted with non-medical subjects using questionnaires, in order to check the general usability and efficacy of the proposed tool.
Authors: Borghi, Guido; Gasparini, Riccardo; Vezzani, Roberto; Cucchiara, Rita
Accurate and fast driver head pose estimation is a rich source of information, in particular in the automotive context. Head pose is a key element for investigating driver behavior, pose analysis and attention monitoring, and is also a useful component for improving the efficacy of Human-Car Interaction systems. In this paper, a Recurrent Neural Network is exploited to tackle the problem of driver head pose estimation, working directly and exclusively on depth images to be more reliable in the presence of varying or insufficient illumination. Experimental results, obtained on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP Database, prove the efficacy of the proposed method, which outperforms state-of-the-art works. In addition, the entire system is implemented and tested on two embedded boards with real-time performance.
Authors: Frigieri, Elia; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita
Correct and reliable localization of facial landmarks enables several applications in many fields, ranging from Human Computer Interaction to video surveillance. For instance, it can provide valuable input for monitoring the driver's physical state and attention level in the automotive context. In this paper, we tackle the problem of facial landmark localization through a deep learning approach. The developed system runs in real time and, in particular, is more reliable than state-of-the-art competitors, especially in the presence of light changes and poor illumination, thanks to the use of depth images as input. We also collected and shared a new realistic dataset acquired inside a car, called MotorMark, to train and test the system. In addition, we exploited the public Eurecom Kinect Face Dataset for the evaluation phase, achieving promising results in terms of both accuracy and computational speed.
Authors: Venturelli, Marco; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita
The correct estimation of the head pose is a problem of great importance for many applications. For instance, it is an enabling technology in the automotive field for driver attention monitoring. In this paper, we tackle the pose estimation problem through a deep learning network working in a regression manner. Traditional methods usually rely on visual facial features, such as facial landmarks or nose tip position. In contrast, we exploit a Convolutional Neural Network (CNN) to perform head pose estimation directly from depth data. We exploit a Siamese architecture and propose a novel loss function to improve the learning of the regression network layer. The system has been tested on two public datasets, Biwi Kinect Head Pose and ICT-3DHP database. The reported results demonstrate the improvement in accuracy with respect to current state-of-the-art approaches and the real-time capabilities of the overall framework.
Authors: Bolelli, Federico; Borghi, Guido; Grana, Costantino
Published in: LECTURE NOTES IN COMPUTER SCIENCE
In this paper, we present an innovative technique to semi-automatically index handwritten word images. The proposed method is based on HOG descriptors and exploits the Dynamic Time Warping technique to compare feature vectors computed from single handwritten words. Our strategy is applied to a new challenging dataset extracted from Italian civil registries of the XIX century. Experimental results, compared with some previously developed word spotting strategies, confirm that our method outperforms competitors.
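As an illustration of the comparison step mentioned in the abstract above, the following is a minimal sketch of classic Dynamic Time Warping between two sequences of feature vectors. It assumes that each word image has already been converted into a sequence of per-column descriptors (for example HOG vectors); the Euclidean local cost, the unconstrained warping path, and the names `dtw_distance` and `rank_candidates` are assumptions made for this example and are not taken from the paper.

```python
from math import sqrt
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Sequence[Vector], seq_b: Sequence[Vector]) -> float:
    """Classic DTW cost between two non-empty sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def rank_candidates(query: Sequence[Vector],
                    candidates: Sequence[Sequence[Vector]]) -> List[int]:
    """Return candidate indices sorted from most to least similar to the query."""
    return sorted(range(len(candidates)),
                  key=lambda k: dtw_distance(query, candidates[k]))


if __name__ == "__main__":
    # Toy one-dimensional "descriptors" standing in for real HOG vectors.
    query = [[0.0], [1.0], [2.0]]
    candidates = [[[5.0], [5.0], [5.0]], [[0.0], [1.0], [1.0], [2.0]]]
    print(rank_candidates(query, candidates))
```

Because DTW tolerates local stretching along the sequence axis, two renderings of the same word written with different widths can still align at low cost, which is the usual motivation for pairing it with column-wise descriptors in word spotting.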
Authors: Palazzi, Andrea; Borghi, Guido; Abati, Davide; Calderara, Simone; Cucchiara, Rita
Awareness of the road scene is an essential component for both autonomous vehicles and Advanced Driver Assistance Systems, and is gaining importance for both academia and car companies. This paper presents a way to learn a semantic-aware transformation which maps detections from a dashboard camera view onto a broader bird's eye occupancy map of the scene. To this end, a huge synthetic dataset featuring 1M couples of frames, taken from both the car dashboard and the bird's eye view, has been collected and automatically annotated. A deep network is then trained to warp detections from the first to the second view. We demonstrate the effectiveness of our model against several baselines and observe that it is able to generalize to real-world data despite having been trained solely on synthetic data.
Authors: Borghi, Guido; Venturelli, Marco; Vezzani, Roberto; Cucchiara, Rita
Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION
Fast and accurate upper-body and head pose estimation is a key task for automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regression neural network, called POSEidon, which is composed of three independent convolutional nets followed by a fusion layer, specifically conceived for understanding the pose from depth. In addition, to recover the intrinsic value of face appearance for understanding head position and orientation, we propose a new Face-from-Depth approach for learning image faces from depth. Results in face reconstruction are qualitatively impressive. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Results show that our method outperforms all recent state-of-the-art works, running in real time at more than 30 frames per second.
Authors: Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita
Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION
HMMs are widely used in action and gesture recognition due to their implementation simplicity, low computational requirements, scalability and high parallelism. They perform well even with a limited training set. All these characteristics are hard to find together in other, even more accurate, methods. In this paper, we propose a novel double-stage classification approach, based on Multiple Stream Discrete Hidden Markov Models (MSD-HMM) and 3D skeleton joint data, able to reach high performance while maintaining all the advantages listed above. The approach allows both quick classification of pre-segmented gestures (offline classification) and temporal segmentation of streams of gestures (online classification) faster than real time. We test our system on three public datasets, MSRAction3D, UTKinect-Action and MSRDailyAction, and on a new dataset, Kinteract Dataset, explicitly created for Human Computer Interaction (HCI). We obtain state-of-the-art performance on all of them.
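To make the offline classification idea in the abstract above more concrete, here is a minimal sketch of how a pre-segmented gesture could be assigned to the class whose discrete HMM gives it the highest likelihood, using the standard scaled forward algorithm. It covers a single observation stream only and does not reproduce the paper's multiple-stream, double-stage design; the function names, the model layout (start, transition and emission probability lists) and the toy models are assumptions made for this example.

```python
from math import log
from typing import Dict, List, Sequence, Tuple

# A model is (start probabilities, transition matrix, emission matrix).
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def forward_log_likelihood(obs: Sequence[int],
                           start: List[float],
                           trans: List[List[float]],
                           emit: List[List[float]]) -> float:
    """Log-likelihood of a discrete observation sequence under one HMM.

    Uses the forward algorithm with per-step rescaling to avoid underflow.
    """
    if len(obs) == 0:
        raise ValueError("observation sequence must be non-empty")
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]
        log_lik += log(scale)
    return log_lik


def classify(obs: Sequence[int], models: Dict[str, Model]) -> str:
    """Return the label of the model with the highest log-likelihood."""
    return max(models,
               key=lambda name: forward_log_likelihood(obs, *models[name]))


if __name__ == "__main__":
    # Two hypothetical one-state models over a two-symbol alphabet.
    models = {
        "A": ([1.0], [[1.0]], [[0.9, 0.1]]),
        "B": ([1.0], [[1.0]], [[0.1, 0.9]]),
    }
    print(classify([0, 0, 1], models))
```

Each per-class likelihood is computed independently of the others, so the models can be evaluated in parallel, which is consistent with the parallelism and low computational cost that the abstract attributes to HMMs.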