Publications by Guido Borghi

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

SHREC 2022 track on online detection of heterogeneous gestures

Authors: Emporio, M.; Caputo, A.; Giachetti, A.; Cristani, M.; Borghi, G.; D'Eusanio, A.; Le, M. -Q.; Nguyen, H. -D.; Tran, M. -T.; Ambellan, F.; Hanik, M.; Nava-Yazdani, E.; Von Tycowicz, C.

Published in: COMPUTERS & GRAPHICS

This paper presents the outcomes of a contest organized to evaluate methods for the online recognition of heterogeneous gestures from sequences of 3D hand poses. The task is the detection of gestures belonging to a dictionary of 16 classes characterized by different pose and motion features. The dataset features continuous sequences of hand tracking data where the gestures are interleaved with non-significant motions. The data have been captured using the Hololens 2 finger tracking system in a realistic use-case of mixed reality interaction. The evaluation is based not only on the detection performances but also on the latency and the false positives, making it possible to understand the feasibility of practical interaction tools based on the algorithms proposed. The outcomes of the contest's evaluation demonstrate the necessity of further research to reduce recognition errors, while the computational cost of the algorithms proposed is sufficiently low.
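
As a concrete illustration of the evaluation criteria mentioned in the abstract (detection rate, latency, and false positives), the following is a minimal Python sketch; the interval format, matching rule, tolerance, and function names are illustrative assumptions, not the official contest protocol.

```python
# Hypothetical sketch of an online-detection evaluation in the spirit described
# above. Interval format and matching rule are assumptions, not the SHREC 2022
# protocol: a detection counts if it fires inside (or shortly after) a
# ground-truth gesture of the same class; everything else is a false positive.

def evaluate_online_detection(predictions, ground_truth, tolerance=0.5):
    """predictions / ground_truth: lists of (label, start_s, end_s) tuples."""
    matched, latencies, false_positives = set(), [], 0
    for p_label, p_start, p_end in predictions:
        hit = None
        for i, (g_label, g_start, g_end) in enumerate(ground_truth):
            if i in matched or p_label != g_label:
                continue
            if g_start <= p_end <= g_end + tolerance:
                hit = i
                break
        if hit is None:
            false_positives += 1
        else:
            matched.add(hit)
            latencies.append(p_end - ground_truth[hit][1])  # delay w.r.t. gesture start
    detection_rate = len(matched) / len(ground_truth) if ground_truth else 0.0
    mean_latency = sum(latencies) / len(latencies) if latencies else float("nan")
    return detection_rate, mean_latency, false_positives


if __name__ == "__main__":
    gt = [("swipe", 1.0, 2.0), ("pinch", 4.0, 5.0)]
    pred = [("swipe", 1.2, 1.9), ("tap", 6.0, 6.3)]
    print(evaluate_online_detection(pred, gt))  # (0.5, 0.9, 1)
```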

2022 Journal article

Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data

Authors: D'Eusanio, A.; Pini, S.; Borghi, G.; Simoni, A.; Vezzani, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The effective and reliable detection and classification of dynamic hand gestures is a key element for building Natural User Interfaces, systems that allow the users to interact using free movements of their body instead of traditional mechanical tools. However, methods that temporally segment and classify dynamic gestures usually rely on a great amount of labeled data, including annotations regarding the class and the temporal segmentation of each gesture. In this paper, we propose an unsupervised approach to train a Transformer-based architecture that learns to detect dynamic hand gestures in a continuous temporal sequence. The input data is represented by the 3D position of the hand joints, along with their speed and acceleration, collected through a Leap Motion device. Experimental results show a promising accuracy on both the detection and the classification task and that only limited computational power is required, confirming that the proposed method can be applied in real-world applications.
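
To make the input representation described above concrete, here is a hedged PyTorch sketch that builds per-frame features from 3D joint positions with finite-difference speed and acceleration and feeds them to a small Transformer encoder; the joint count, model sizes, and per-frame scoring head are assumptions for illustration, not the authors' architecture or their unsupervised training scheme.

```python
# Illustrative sketch only: positions + first/second-order differences as
# features, scored per frame by a small Transformer encoder.
import torch
import torch.nn as nn

NUM_JOINTS = 20          # assumed number of tracked hand joints


def build_features(positions: torch.Tensor) -> torch.Tensor:
    """positions: (T, NUM_JOINTS, 3) -> features: (T, NUM_JOINTS * 9)."""
    velocity = torch.zeros_like(positions)
    velocity[1:] = positions[1:] - positions[:-1]          # speed (first difference)
    acceleration = torch.zeros_like(positions)
    acceleration[1:] = velocity[1:] - velocity[:-1]        # acceleration (second difference)
    feats = torch.cat([positions, velocity, acceleration], dim=-1)
    return feats.flatten(start_dim=1)


class FrameGestureScorer(nn.Module):
    def __init__(self, in_dim=NUM_JOINTS * 9, d_model=128, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)                  # per-frame gesture score

    def forward(self, feats):                              # feats: (B, T, in_dim)
        return self.head(self.encoder(self.embed(feats))).squeeze(-1)


positions = torch.randn(200, NUM_JOINTS, 3)                # a fake tracked sequence
scores = FrameGestureScorer()(build_features(positions).unsqueeze(0))
print(scores.shape)                                        # torch.Size([1, 200])
```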

2022 Conference proceedings paper

A Double Siamese Framework for Differential Morphing Attack Detection

Authors: Borghi, Guido; Pancisi, Emanuele; Ferrara, Matteo; Maltoni, Davide

Published in: SENSORS

Face morphing and related morphing attacks have emerged as a serious security threat for automatic face recognition systems and a challenging research field. Therefore, the availability of effective and reliable morphing attack detectors is strongly needed. In this paper, we proposed a framework based on a double Siamese architecture to tackle the morphing attack detection task in the differential scenario, in which two images, a trusted live acquired image and a probe image (morphed or bona fide), are given as input to the system. In particular, the presented framework aimed to merge the information computed by two different modules to predict the final score. The first one was designed to extract information about the identity of the input faces, while the second module was focused on the detection of artifacts related to the morphing process. Experimental results were obtained through several rigorous cross-dataset tests, exploiting three well-known datasets, namely PMDB, MorphDB, and AMSL, containing automatic and manually refined facial morphed images, showing that the proposed framework was able to achieve satisfactory results.
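
The two-module idea described above can be sketched as follows: one Siamese branch compares the identities of the trusted live image and the probe, the other looks for morphing artifacts, and the two are fused into a single score. The backbones, feature sizes, difference features, and fusion layer below are assumptions for illustration, not the published architecture.

```python
# Illustrative two-stream sketch; not the paper's exact networks or fusion rule.
import torch
import torch.nn as nn
import torchvision.models as models


class DifferentialMADSketch(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # shared-weight (Siamese) encoders, one per stream
        self.identity_net = nn.Sequential(*list(models.resnet18(weights=None).children())[:-1])
        self.artifact_net = nn.Sequential(*list(models.resnet18(weights=None).children())[:-1])
        self.fusion = nn.Sequential(nn.Linear(feat_dim * 2, 128), nn.ReLU(), nn.Linear(128, 1))

    def _embed(self, net, img):
        return net(img).flatten(start_dim=1)

    def forward(self, live, probe):
        # identity stream: how different the two identities look
        id_diff = self._embed(self.identity_net, live) - self._embed(self.identity_net, probe)
        # artifact stream: what the probe shows on top of the trusted live image
        art_diff = self._embed(self.artifact_net, probe) - self._embed(self.artifact_net, live)
        return torch.sigmoid(self.fusion(torch.cat([id_diff, art_diff], dim=1)))


model = DifferentialMADSketch()
score = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(score.item())  # value in (0, 1): higher = more likely a morph, by this sketch's convention
```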

2021 Journal article

A Systematic Comparison of Depth Map Representations for Face Recognition

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Maltoni, Davide; Cucchiara, Rita

Published in: SENSORS

2021 Journal article

Automated Artifact Retouching in Morphed Images with Attention Maps

Authors: Borghi, G.; Franco, A.; Graffieti, G.; Maltoni, D.

Published in: IEEE ACCESS

Morphing attacks are an important security threat for automatic face recognition systems. High-quality morphed images, i.e. images without significant visual artifacts such as ghosts, noise, and blurring, exhibit higher chances of success, being able to fool both human examiners and commercial face verification algorithms. Therefore, the availability of large sets of high-quality morphs is fundamental for training and testing robust morphing attack detection algorithms. However, producing a high-quality morphed image is an expensive and time-consuming task, since manual post-processing is generally required to remove the typical artifacts generated by landmark-based morphing techniques. This work describes an approach based on the Conditional Generative Adversarial Network paradigm for automated morphing artifact retouching and the use of Attention Maps to guide the generation process and limit the retouch to specific areas. In order to work with high-resolution images, the framework is applied to different facial crops, which, once processed and retouched, are accurately blended to reconstruct the whole morphed face. Specifically, we focus on four different squared face regions, i.e. the right and left eyes, the nose, and the mouth, that are frequently affected by artifacts. Several qualitative and quantitative experimental evaluations have been conducted to confirm the effectiveness of the proposal in terms of, among others, pixel-wise metrics, identity preservation, and human observer analysis. Results confirm the feasibility and the accuracy of the proposed framework.
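
The crop-retouch-blend flow described above can be illustrated with a short sketch: four square regions are cut from the aligned face, passed through a retouching model, and pasted back with a soft mask so the seams stay smooth. The crop coordinates, the mask, and the retouch stub are placeholders, not the paper's GAN or its attention maps.

```python
# Illustrative crop -> retouch -> blend loop over four face regions.
import numpy as np

# hypothetical square crops as (y, x, size) for a 512x512 aligned face
REGIONS = {"left_eye": (150, 120, 96), "right_eye": (150, 300, 96),
           "nose": (230, 210, 96), "mouth": (330, 190, 120)}


def retouch(crop: np.ndarray) -> np.ndarray:
    """Placeholder for the learned retouching model (identity here)."""
    return crop


def soft_mask(size: int, margin: int = 12) -> np.ndarray:
    """Mask that fades to zero near the crop border to hide blending seams."""
    ramp = np.minimum(np.arange(size), np.arange(size)[::-1])
    mask = np.clip(np.minimum.outer(ramp, ramp) / margin, 0.0, 1.0)
    return mask[..., None]


def retouch_face(face: np.ndarray) -> np.ndarray:
    out = face.astype(np.float32).copy()
    for y, x, s in REGIONS.values():
        crop = out[y:y + s, x:x + s]
        mask = soft_mask(s)
        out[y:y + s, x:x + s] = mask * retouch(crop) + (1.0 - mask) * crop
    return out.astype(face.dtype)


print(retouch_face(np.zeros((512, 512, 3), dtype=np.uint8)).shape)  # (512, 512, 3)
```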

2021 Journal article

Improving Car Model Classification through Vehicle Keypoint Localization

Authors: Simoni, Alessandro; D'Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Vezzani, Roberto

In this paper, we present a novel multi-task framework which aims to improve the performance of car model classification by leveraging visual features and pose information extracted from single RGB images. In particular, we merge the visual features obtained through an image classification network with the features computed by a model able to predict the pose in terms of 2D car keypoints. We show how this approach considerably improves the performance on the model classification task by testing our framework on a subset of the Pascal3D dataset containing the car classes. Finally, we conduct an ablation study to demonstrate the performance improvement obtained with respect to a single visual classifier network.
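
A hedged sketch of the feature-merging step described above: visual features from an image-classification backbone are concatenated with features encoded from the predicted 2D keypoints before the final car-model classifier. Backbones, dimensions, and the keypoint encoder are illustrative assumptions, not the paper's exact networks.

```python
# Illustrative fusion of a visual backbone with a 2D-keypoint branch.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_KEYPOINTS = 12       # assumed number of 2D car keypoints
NUM_MODELS = 10          # assumed number of car-model classes


class CarModelClassifierSketch(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.visual = nn.Sequential(*list(backbone.children())[:-1])        # -> (B, 512, 1, 1)
        self.keypoint_encoder = nn.Sequential(                              # encodes 2D keypoints
            nn.Linear(NUM_KEYPOINTS * 2, 128), nn.ReLU(), nn.Linear(128, 128))
        self.classifier = nn.Linear(512 + 128, NUM_MODELS)

    def forward(self, image, keypoints):
        visual = self.visual(image).flatten(start_dim=1)                    # (B, 512)
        pose = self.keypoint_encoder(keypoints.flatten(start_dim=1))        # (B, 128)
        return self.classifier(torch.cat([visual, pose], dim=1))            # (B, NUM_MODELS)


logits = CarModelClassifierSketch()(torch.randn(2, 3, 224, 224), torch.randn(2, NUM_KEYPOINTS, 2))
print(logits.shape)  # torch.Size([2, 10])
```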

2021 Conference proceedings paper

RefiNet: 3D Human Pose Refinement with Depth Maps

Authors: D’Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Human Pose Estimation is a fundamental task for many applications in the Computer Vision community and it has been widely investigated in the 2D domain, i.e. intensity images. Therefore, most of the available methods for this task are mainly based on 2D Convolutional Neural Networks and huge manually-annotated RGB datasets, achieving stunning results. In this paper, we propose RefiNet, a multi-stage framework that regresses an extremely precise 3D human pose from a given 2D pose and a depth map. The framework consists of three different modules, each one specialized in a particular refinement and data representation, i.e. depth patches, 3D skeleton and point clouds. Moreover, we present a new dataset, called Baracca, acquired with RGB, depth and thermal cameras and specifically created for the automotive context. Experimental results confirm the quality of the refinement procedure that largely improves the human pose estimations of off-the-shelf 2D methods.
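
The multi-stage refinement idea can be sketched as follows: a coarse 2D pose is lifted to 3D by sampling the depth map at each joint under a pinhole camera model, then passed through a chain of refinement modules. The lifting rule and the placeholder stages are assumptions for illustration; the actual RefiNet modules operate on depth patches, the 3D skeleton, and point clouds.

```python
# Illustrative 2D-to-3D lifting followed by a chain of refinement stages.
import numpy as np


def lift_to_3d(pose_2d: np.ndarray, depth_map: np.ndarray, fx=500.0, fy=500.0,
               cx=320.0, cy=240.0) -> np.ndarray:
    """pose_2d: (J, 2) pixel coordinates -> (J, 3) camera-space points (pinhole model)."""
    joints_3d = []
    for u, v in pose_2d:
        z = float(depth_map[int(round(v)), int(round(u))])        # depth at the joint pixel
        joints_3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
    return np.asarray(joints_3d)


def refine(pose_3d: np.ndarray, modules) -> np.ndarray:
    """Apply the refinement modules in sequence, each returning a corrected pose."""
    for module in modules:
        pose_3d = module(pose_3d)
    return pose_3d


# placeholder stages (identity corrections) standing in for the depth-patch,
# skeleton, and point-cloud modules
stages = [lambda p: p, lambda p: p, lambda p: p]

pose_2d = np.random.uniform([0, 0], [639, 479], size=(15, 2))
depth = np.full((480, 640), 2.0, dtype=np.float32)                # 2 m everywhere
print(refine(lift_to_3d(pose_2d, depth), stages).shape)           # (15, 3)
```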

2021 Conference proceedings paper

SHREC 2021: Skeleton-based hand gesture recognition in the wild

Authors: Caputo, Ariel; Giachetti, Andrea; Soso, Simone; Pintani, Deborah; D'Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Simoni, Alessandro; Vezzani, Roberto; Cucchiara, Rita; Ranieri, Andrea; Giannini, Franca; Lupinetti, Katia; Monti, Marina; Maghoumi, Mehran; Laviola Jr, Joseph; Le, Minh-Quan; Nguyen, Hai-Dang; Tran, Minh-Triet

Published in: COMPUTERS & GRAPHICS

This paper presents the results of the Eurographics 2019 SHape Retrieval Contest track on online gesture recognition. The goal of this contest was to test state-of-the-art methods that can be used to detect command gestures online from hand movement tracking on a basic benchmark where simple gestures are performed interleaved with other actions. Unlike previous contests and benchmarks on trajectory-based gesture recognition, we proposed an online gesture recognition task, not providing pre-segmented gestures but asking the participants to find gestures within recorded trajectories. The results submitted by the participants show that online detection and recognition of sets of very simple gestures from 3D trajectories captured with a cheap sensor can be effectively performed. The best methods proposed could therefore be directly exploited to design effective gesture-based interfaces to be used in different contexts, from Virtual and Mixed Reality applications to the remote control of home devices.
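
A sliding-window loop is one simple way to frame the online setting described above, in which gestures are not pre-segmented: a classifier is run on a moving window of the incoming 3D trajectory and a detection is emitted whenever its confidence exceeds a threshold. The window length, threshold, and classifier stub below are assumptions, not any participant's method.

```python
# Illustrative sliding-window online recognition loop over a trajectory stream.
import numpy as np

WINDOW = 30        # assumed number of frames per window
THRESHOLD = 0.8    # assumed confidence threshold


def classify_window(window: np.ndarray):
    """Placeholder for a trained classifier: returns (label, confidence)."""
    return "no_gesture", 0.0


def online_detect(trajectory: np.ndarray):
    """trajectory: (T, D) stream of hand features; returns [(frame_index, label), ...]."""
    detections = []
    for t in range(WINDOW, len(trajectory) + 1):
        label, confidence = classify_window(trajectory[t - WINDOW:t])
        if label != "no_gesture" and confidence >= THRESHOLD:
            detections.append((t, label))      # gesture fired at frame t
    return detections


print(online_detect(np.random.randn(300, 63)))  # [] with the placeholder classifier
```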

2021 Journal article

Vehicle and method for inspecting a railway line

Authors: Avizzano, Carlo Alberto; Borghi, Guido; Calderara, Simone; Cucchiara, Rita; Fedeli, Eugenio; Ermini, Mirko; Gonnelli, Mirco; Labanca, Giacomo; Frisoli, Antonio; Gasparini, Riccardo; Solazzi, Massimiliano; Tiseni, Luca; Leonardis, Daniele; Satler, Massimo

2021 Patent

Video Frame Synthesis combining Conventional and Event Cameras

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto

Published in: INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE

Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to conventional cameras, their use is limited due to the scarce compatibility of asynchronous event streams with traditional data processing and vision algorithms. In this regard, we present a framework that synthesizes RGB frames from the output stream of an event camera and an initial or a periodic set of color key-frames. The deep learning-based frame synthesis framework consists of an adversarial image-to-image architecture and a recurrent module. Two public event-based datasets, DDD17 and MVSEC, are used to obtain qualitative and quantitative per-pixel and perceptual results. In addition, we converted two additional well-known datasets, namely Kitti and Cityscapes, into event frames in order to present semantic results, in terms of object detection and semantic segmentation accuracy. Extensive experimental evaluations confirm the quality of the proposed approach and its capability of synthesizing frame sequences from color key-frames and sequences of intermediate events.
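
The overall data flow described above can be sketched as follows: starting from a color key-frame, a recurrent generator consumes accumulated event frames and predicts the next RGB frame, which is fed back at the following step. The toy generator below is a stand-in used only to show the loop, not the paper's adversarial image-to-image architecture.

```python
# Illustrative recurrent event-to-frame synthesis loop with a toy generator.
import torch
import torch.nn as nn


class ToyRecurrentGenerator(nn.Module):
    def __init__(self, event_channels=2, hidden_channels=16):
        super().__init__()
        # input: previous RGB estimate + event frame (positive/negative polarities)
        self.encode = nn.Conv2d(3 + event_channels, hidden_channels, 3, padding=1)
        self.state = nn.Conv2d(hidden_channels * 2, hidden_channels, 3, padding=1)
        self.decode = nn.Conv2d(hidden_channels, 3, 3, padding=1)

    def forward(self, prev_rgb, event_frame, hidden):
        feat = torch.relu(self.encode(torch.cat([prev_rgb, event_frame], dim=1)))
        hidden = torch.tanh(self.state(torch.cat([feat, hidden], dim=1)))   # simple recurrence
        return torch.sigmoid(self.decode(hidden)), hidden


generator = ToyRecurrentGenerator()
rgb = torch.rand(1, 3, 64, 64)                       # color key-frame
hidden = torch.zeros(1, 16, 64, 64)
for _ in range(5):                                    # five event frames between key-frames
    events = torch.randn(1, 2, 64, 64)
    rgb, hidden = generator(rgb, events, hidden)
print(rgb.shape)                                      # torch.Size([1, 3, 64, 64])
```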

2021 Journal article

Page 5 of 9 • Total publications: 81