Publications - AImageLab

Hands on the wheel: a Dataset for Driver Hand Detection and Tracking

Authors: Borghi, Guido; Frigieri, Elia; Vezzani, Roberto; Cucchiara, Rita

The ability to detect, localize and track the hands is crucial in many applications that require an understanding of a person's behavior, attitude and interactions. This is particularly true in the automotive context, where hand analysis makes it possible to predict preparatory movements for maneuvers or to investigate the driver’s attention level. Moreover, thanks to the recent spread of cameras inside new car cockpits, it is feasible to use hand gestures to develop new Human-Car Interaction systems that are more user-friendly and safer. In this paper, we propose a new dataset, called Turms, consisting of infrared images of the driver’s hands collected from behind the steering wheel, an innovative point of view. The Leap Motion device was selected for the recordings thanks to its stereo capabilities and wide view angle. In addition, we introduce a method to detect the presence and location of the driver’s hands on the steering wheel during driving activities.

2018 Conference Proceedings Paper

DOI IRIS
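
The abstract does not say how hand detections are scored. A common way to decide whether a predicted hand bounding box matches an annotation is Intersection over Union (IoU). The sketch below assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format and a 0.5 acceptance threshold; both are illustrative assumptions, not details taken from the paper.

```python
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_hand_detected(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Count a predicted hand box as correct if its IoU with the annotation reaches the threshold."""
    return box_iou(pred, gt) >= threshold
```

For example, two 10×10 boxes offset by 5 pixels in each direction overlap on 25 pixels out of a union of 175, giving an IoU of about 0.14, which would not count as a detection at the 0.5 threshold.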

Head Detection with Depth Images in the Wild

Authors: Ballotta, Diego; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Head detection and localization is a demanding task and a key element of many computer vision applications, such as video surveillance, Human-Computer Interaction and face analysis. The large body of work on face detection in RGB images, together with the availability of huge face datasets, has made it possible to build very effective systems in that domain. However, because of illumination issues, infrared or depth cameras may be required in real applications. In this paper, we introduce a novel method for head detection on depth images that exploits the classification ability of deep learning approaches. Besides reducing the dependency on external illumination, depth images implicitly embed useful information for dealing with the scale of the target objects. Two public datasets have been used: the first, called Pandora, is used to train a deep binary classifier on face and non-face images; the second, collected by Cornell University, is used to perform a cross-dataset test during daily activities in unconstrained environments. Experimental results show that the proposed method outperforms state-of-the-art methods working on depth images.

2018 Conference Proceedings Paper

DOI IRIS
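
The head detection abstract notes that depth images implicitly carry scale information. Under a pinhole-camera assumption, the apparent size of a head in pixels is inversely proportional to its distance from the camera, so a classification window can be sized from the depth value alone. The focal length used in the example and the 20 cm head width are illustrative assumptions, not values from the paper.

```python
from typing import Tuple

# Illustrative assumption: average head width of about 20 cm.
HEAD_WIDTH_M = 0.20


def expected_head_size_px(depth_m: float, focal_px: float, head_width_m: float = HEAD_WIDTH_M) -> float:
    """Pinhole projection: size in pixels = focal length (px) * real size (m) / distance (m)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive (0 usually marks a missing depth reading)")
    return focal_px * head_width_m / depth_m


def head_window(cx: int, cy: int, depth_m: float, focal_px: float) -> Tuple[int, int, int, int]:
    """Square crop centred on (cx, cy) whose side matches the expected head size at that depth."""
    half = int(round(expected_head_size_px(depth_m, focal_px) / 2))
    return (cx - half, cy - half, cx + half, cy + half)
```

With a hypothetical focal length of 365 px, a head 2 m away should span about 36.5 px, and halving the distance doubles that size. This is one reason depth input can reduce the need for an exhaustive multi-scale search.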

Improving Skin Lesion Segmentation with Generative Adversarial Networks

Authors: Pollastri, Federico; Bolelli, Federico; Paredes, Roberto; Grana, Costantino

Published in: PROCEEDINGS IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS

This paper proposes a novel strategy that employs Generative Adversarial Networks (GANs) to augment data in the image segmentation field, and a Convolutional-Deconvolutional Neural Network (CDNN) to automatically generate lesion segmentation masks from dermoscopic images. Training the CDNN with our GAN-generated data effectively improves on the state of the art.

2018 Conference Proceedings Paper

DOI IRIS
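
Segmentation quality for dermoscopic lesion masks is commonly reported with the Jaccard index, the intersection over union of the predicted and ground-truth binary masks. The abstract does not name the metric behind its state-of-the-art claim, so the sketch below is only an assumption about how a generated mask might be compared with an annotation.

```python
import numpy as np


def jaccard_index(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(np.logical_and(pred, target).sum() / union)
```

Unlike pixel accuracy, this score is not inflated by the large background area that typically surrounds a lesion.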

LAMV: Learning to align and match videos with kernelized temporal layers

Authors: Baraldi, Lorenzo; Douze, Matthijs; Cucchiara, Rita; Jégou, Hervé

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate. We evaluate our approach on video alignment, copy detection and event retrieval. Our approach outperforms the state of the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos.

2018 Conference Proceedings Paper

DOI IRIS
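
The LAMV abstract describes finding a temporal alignment by maximizing a similarity score between two sequences of frame descriptors, with the metric parametrized in the Fourier domain. The sketch below is a deliberately simplified, non-learned version of that idea. It scores every temporal offset with a plain dot product, computes all offsets at once through FFT-based cross-correlation, and returns the best offset. It does not implement the paper's learned time-sensitive kernel or its triplet-loss training.

```python
import numpy as np


def best_temporal_offset(x: np.ndarray, y: np.ndarray) -> int:
    """Return the shift delta maximizing sum_t <x[t], y[t + delta]>.

    x: (T1, d) and y: (T2, d) arrays of per-frame descriptors.
    Zero-padding to T1 + T2 - 1 makes the FFT correlation linear, not circular.
    """
    t1, t2 = x.shape[0], y.shape[0]
    n = t1 + t2 - 1
    X = np.fft.rfft(x, n=n, axis=0)
    Y = np.fft.rfft(y, n=n, axis=0)
    # Cross-correlation of each descriptor dimension, then summed over dimensions.
    scores = np.fft.irfft(np.conj(X) * Y, n=n, axis=0).sum(axis=1)
    # Index k holds shift k for k < T2; larger indices hold negative shifts k - n.
    shifts = np.arange(n)
    shifts[shifts >= t2] -= n
    return int(shifts[np.argmax(scores)])
```

Scoring all offsets this way costs O(n log n) per descriptor dimension instead of the O(n²) of an explicit loop over shifts, which is the practical appeal of working in the Fourier domain.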

Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World

Authors: Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Palazzi, Andrea; Vezzani, Roberto; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Multi-People Tracking in an open-world setting requires particular effort in precise detection. Moreover, temporal continuity in the detection phase becomes more important when scene clutter introduces the challenging problem of occluded targets. To this end, we propose a deep network architecture that jointly extracts people's body parts and associates them across short temporal spans. Our model explicitly deals with occluded body parts by hallucinating plausible locations for non-visible joints. We propose a new end-to-end architecture composed of four branches (visible heatmaps, occluded heatmaps, part affinity fields and temporal affinity fields) fed by a time-linker feature extractor. To overcome the lack of surveillance data with tracking, body part and occlusion annotations, we exploited a photorealistic videogame to create the largest Computer Graphics dataset to date of human body parts for people tracking in urban scenarios (about 500,000 frames and almost 10 million body poses). Our architecture, trained on virtual data, also generalizes well to public real-world tracking benchmarks when image resolution and sharpness are high enough, producing reliable tracklets useful for further batch data association or re-id modules.

2018 Conference Proceedings Paper

DOI IRIS
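
To make the separate visible and occluded heatmap branches concrete, here is a hypothetical single-person decoder. For each joint type, it takes the strongest peak across the visible and occluded heatmaps and reports whether the joint was predicted as occluded. The `(J, H, W)` array layout and the 0.3 threshold are assumptions. The part and temporal affinity fields that the paper uses to group joints across people and frames are not reproduced.

```python
from typing import List, Optional, Tuple

import numpy as np

# (row, col, occluded) for a found joint, or None if no peak passes the threshold.
Joint = Optional[Tuple[int, int, bool]]


def decode_joints(visible: np.ndarray, occluded: np.ndarray, threshold: float = 0.3) -> List[Joint]:
    """visible, occluded: (J, H, W) heatmaps, one channel per joint type."""
    joints: List[Joint] = []
    for vis_map, occ_map in zip(visible, occluded):
        v_idx = np.unravel_index(np.argmax(vis_map), vis_map.shape)
        o_idx = np.unravel_index(np.argmax(occ_map), occ_map.shape)
        v_peak, o_peak = vis_map[v_idx], occ_map[o_idx]
        if max(v_peak, o_peak) < threshold:
            joints.append(None)  # neither branch is confident about this joint
        elif v_peak >= o_peak:
            joints.append((int(v_idx[0]), int(v_idx[1]), False))
        else:
            joints.append((int(o_idx[0]), int(o_idx[1]), True))  # hallucinated, occluded joint
    return joints
```

Keeping the occlusion flag makes it possible for a downstream tracker to weight hallucinated joints differently from observed ones.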

Learning to Generate Facial Depth Maps

Authors: Pini, Stefano; Grazioli, Filippo; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

In this paper, an adversarial architecture for facial depth map estimation from monocular intensity images is presented. Following an image-to-image approach, we combine the advantages of supervised learning and adversarial training, proposing a conditional Generative Adversarial Network that effectively learns to translate intensity face images into the corresponding depth maps. Two public datasets, namely the Biwi database and the Pandora dataset, are used to demonstrate that the proposed model generates high-quality synthetic depth images, both in terms of visual appearance and informative content. Furthermore, we show that the model is capable of predicting distinctive facial details by testing the generated depth maps with a deep model trained on authentic depth maps for the face verification task.

2018 Conference Proceedings Paper

DOI IRIS
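
The facial depth map abstract says the model combines supervised learning with adversarial training in a conditional GAN. One common way to do this, popularized by pix2pix, is to add an L1 reconstruction term to the generator's adversarial loss. The sketch below assumes that formulation and pix2pix's default weight of 100. The paper's actual losses and weights may differ.

```python
import numpy as np


def generator_loss(d_fake_prob: np.ndarray, fake_depth: np.ndarray, real_depth: np.ndarray,
                   lam: float = 100.0, eps: float = 1e-7) -> float:
    """Adversarial term plus lam times a supervised L1 reconstruction term.

    d_fake_prob: discriminator probabilities that (intensity image, generated depth map)
    pairs are real; the generator wants these close to 1.
    """
    p = np.clip(d_fake_prob, eps, 1.0 - eps)  # avoid log(0)
    adversarial = float(np.mean(-np.log(p)))  # binary cross-entropy against the "real" label
    l1 = float(np.mean(np.abs(fake_depth - real_depth)))
    return adversarial + lam * l1
```

The L1 term anchors the output to the ground-truth depth, while the adversarial term pushes it toward realistic local detail. The weight `lam` controls the balance between the two.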

Low-cost pupillometry for human-computer interface

Authors: Goddi, A; Ponzio, F; Ficarra, E; Di Cataldo, S; Roatta, S.

Changes in pupil size are governed by the autonomic nervous system but may also be systematically driven by voluntarily shifting the gaze in depth. Thus, the pupil accommodative response (PAR) that accompanies voluntary gaze shifts from a far (3 m) to a near (30 cm) visual target might be exploited as a simple human-computer interface (HCI) that bypasses the somato-motor system.

2018 Poster

IRIS
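
Looking from a far target to a near one makes the pupil constrict, so a PAR-based interface ultimately needs to turn a pupil-diameter trace into discrete commands. The abstract gives no detection rule, so the following is a hypothetical threshold detector. It flags a sample when the diameter falls more than 15% below the median of the previous second, then waits for the signal to stop looking constricted before it can fire again. The 15% ratio, the 1 s baseline window and the 10 Hz sampling rate used in the example are illustrative assumptions.

```python
from typing import List

import numpy as np


def detect_par_events(diameter_mm: np.ndarray, fs_hz: float,
                      baseline_s: float = 1.0, drop_ratio: float = 0.15) -> List[int]:
    """Return sample indices where the pupil diameter first falls more than
    `drop_ratio` below the median of the preceding `baseline_s` seconds."""
    d = np.asarray(diameter_mm, dtype=float)
    window = max(1, int(baseline_s * fs_hz))
    events: List[int] = []
    armed = True  # re-armed only once the signal no longer looks constricted
    for i in range(window, len(d)):
        baseline = float(np.median(d[i - window:i]))
        constricted = d[i] < (1.0 - drop_ratio) * baseline
        if constricted and armed:
            events.append(i)
            armed = False
        elif not constricted:
            armed = True
    return events
```

Using a moving median as the baseline lets the detector follow slow, autonomically driven drifts in pupil size, so that only comparatively abrupt constrictions register as commands.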

MDM2 and Aurora Kinase A Contribute to SETD2 Loss of Function in Advanced Systemic Mastocytosis: Implications for Pathogenesis and Treatment

Authors: Mancini, Manuela; Monaldi, Cecilia; De Santis, Sara; Papayannidis, Cristina; Rondoni, Michela; Bavaro, Luana; Martelli, Margherita; Abbenante, Maria Chiara; Curti, Antonio; Ficarra, Elisa; Paciello, Giulia; Fontana, Maria Chiara; Zanotti, Roberta; Bonifacio, Massimiliano; Scaffidi, Luigi; Pagano, Livio; Criscuolo, Marianna; Albano, Francesco; Ciceri, Fabio; Elena, Chiara; Tosi, Patrizia; Delledonne, Massimo; Avanzato, Carla; Xumerle, Luciano; Valent, Peter; Martinelli, Giovanni; Cavo, Michele; Soverini, Simona

Published in: BLOOD

2018 Journal Abstract

DOI IRIS

Method and system for the unique biometric recognition of an animal, based on the use of deep learning techniques

Authors: Calderara, Simone; Bergamini, Luca; Capobianco Dondona, Andrea; Del Negro, Ercole; Di Tondo, Francesco

The present invention describes a method and system for the unique biometric recognition of an animal, based on the use of deep learning techniques. The method is characterized by the following steps: a. a training phase on a human domain and an animal domain to obtain animal embeddings in a latent space homologous to the human one, by means of convolutional neural networks; b. storage of the obtained animal embeddings in a database; c. recognition of an animal identity by means of convolutional neural networks. The present invention also comprises a system for the unique biometric recognition of an animal that uses the method described above.

2018 Patent

IRIS
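
Steps b and c of the claimed method, storing the animal embeddings and then recognizing an identity, can be illustrated with a nearest-neighbour lookup. In this hypothetical sketch, the `EmbeddingGallery` class stands in for the database, the embeddings are assumed to come from an already trained CNN, identity is decided by cosine similarity, and the 0.7 rejection threshold is arbitrary. None of these choices is specified in the patent abstract.

```python
from typing import List, Optional

import numpy as np


class EmbeddingGallery:
    """Hypothetical in-memory stand-in for the database of animal embeddings."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vecs: List[np.ndarray] = []

    def enroll(self, animal_id: str, embedding: np.ndarray) -> None:
        """Store an L2-normalized embedding under the given identity."""
        v = np.asarray(embedding, dtype=float)
        self._ids.append(animal_id)
        self._vecs.append(v / np.linalg.norm(v))

    def identify(self, embedding: np.ndarray, min_similarity: float = 0.7) -> Optional[str]:
        """Return the most similar enrolled identity, or None if nothing is similar enough."""
        if not self._vecs:
            return None
        q = np.asarray(embedding, dtype=float)
        q = q / np.linalg.norm(q)
        sims = np.stack(self._vecs) @ q  # cosine similarities, since all vectors are unit length
        best = int(np.argmax(sims))
        return self._ids[best] if sims[best] >= min_similarity else None
```

Returning `None` below the threshold lets the system report an unknown animal instead of forcing a match to the closest enrolled identity.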

Multi-views Embedding for Cattle Re-identification

Authors: Bergamini, Luca; Porrello, Angelo; Capobianco Dondona, Andrea; Del Negro, Ercole; Mattioli, Mauro; D’Alterio, Nicola; Calderara, Simone

The people re-identification task has seen enormous improvements in recent years, mainly due to better image feature extraction with deep Convolutional Neural Networks (CNNs) and the availability of large datasets. However, little research has been conducted on animal identification and re-identification, even though this knowledge may be useful in a wide variety of scenarios. Here, we tackle cattle re-identification with deep CNNs and show that this task is only loosely related to the human one, presenting unique challenges that make it far from solved. We present various baselines, based both on deep architectures and on standard machine learning algorithms, and compare them with our solution. Finally, we conduct an extensive ablation study to further investigate the peculiarities of this task.

2018 Conference Proceedings Paper

DOI IRIS
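
Re-identification results are commonly reported as Cumulative Matching Characteristic (CMC) rank-k accuracy: the fraction of query images whose correct identity appears among the k closest gallery images. The cattle re-identification abstract does not state which metrics or distances were used, so this sketch, with Euclidean distance between embeddings, is an assumption.

```python
from typing import Sequence

import numpy as np


def cmc_rank_k(query_emb: np.ndarray, query_ids: Sequence[str],
               gallery_emb: np.ndarray, gallery_ids: Sequence[str], k: int = 1) -> float:
    """Fraction of queries whose identity appears among the k nearest gallery embeddings."""
    q = np.asarray(query_emb, dtype=float)
    g = np.asarray(gallery_emb, dtype=float)
    dists = np.linalg.norm(q[:, None, :] - g[None, :, :], axis=2)  # (num_queries, num_gallery)
    top_k = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest gallery items per query
    g_ids = np.asarray(gallery_ids)
    hits = [np.any(g_ids[row] == qid) for qid, row in zip(query_ids, top_k)]
    return float(np.mean(hits))
```

Reporting several values of k (for example rank-1 and rank-5) shows both how often the top match is correct and how often the correct animal at least ranks near the top.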