Publications - AImageLab

RI-PIENO - Revised and Improved Petrol-Filling Itinerary Estimation aNd Optimization

Authors: Savarese, Marco; De Blasi, Antonio; Zaccagnino, Carmine; Salici, Giacomo; Cascianelli, Silvia; Vezzani, Roberto; Grazia, Carlo Augusto

Efficient energy provisioning is a fundamental requirement for modern transportation systems, making refueling path optimization a critical challenge. Existing solutions … (Read full abstract)

Efficient energy provisioning is a fundamental requirement for modern transportation systems, making refueling path optimization a critical challenge. Existing solutions often focus either on inter-vehicle communication or intravehicle monitoring, leveraging Intelligent Transportation Systems, Digital Twins, and Software-Defined Internet of Vehicles with Cloud/Fog/Edge infrastructures. However, integrated frameworks that adapt dynamically to driver mobility patterns are still underdeveloped. Building on our previous PIENO framework, we present RI-PIENO (Revised and Improved Petrolfilling Itinerary Estimation aNd Optimization), a system that combines intra-vehicle sensor data with external geospatial and fuel price information, processed via IoT-enabled Cloud/Fog services. RI-PIENO models refueling as a dynamic, time-evolving directed acyclic graph that reflects both habitual daily trips and real-time vehicular inputs, transforming the system from a static recommendation tool into a continuously adaptive decision engine. We validate RI-PIENO in a daily-commute use case through realistic multi-driver, multi-week simulations, showing that it achieves significant cost savings and more efficient routing compared to previous approaches. The framework is designed to leverage emerging roadside infrastructure and V2X communication, supporting scalable deployment within next-generation IoT and vehicular networking ecosystems.

2025 Relazione in Atti di Convegno

DOI IRIS

TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios

Authors: Rossi, Daniel; Borghi, Guido; Vezzani, Roberto

Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial … (Read full abstract)

Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergency responses. In this work, we introduce TakuNet, a novel light-weight architecture which employs techniques such as depth-wise convolutions and an early downsampling stem to reduce computational complexity while maintaining high accuracy. It leverages dense connections for fast convergence during training and uses 16-bit floating-point precision for optimization on embedded hardware accelerators. Experimental evaluation on two public datasets shows that TakuNet achieves near-state-of-the-art accuracy in classifying aerial images of emergency situations, despite its minimal parameter count. Real-world tests on embedded devices, namely Jetson Orin Nano and Raspberry Pi, confirm TakuNet's efficiency, achieving more than 650 fps on the 15W Jetson board, making it suitable for real-time AI processing on resource-constrained platforms and advancing the applicability of drones in emergency scenarios. The code and implementation details are publicly released.

2025 Relazione in Atti di Convegno

DOI IRIS

D-SPDH: Improving 3D Robot Pose Estimation in Sim2Real Scenario via Depth Data

Authors: Simoni, A.; Borghi, G.; Garattoni, L.; Francesca, G.; Vezzani, R.

Published in: IEEE ACCESS

In recent years, there has been a notable surge in the significance attributed to technologies facilitating secure and efficient cohabitation … (Read full abstract)

In recent years, there has been a notable surge in the significance attributed to technologies facilitating secure and efficient cohabitation and collaboration between humans and machines, with a particular interest in robotic systems. A pivotal element in actualizing this novel and challenging collaborative paradigm involves different technical tasks, including the comprehension of 3D poses exhibited by both humans and robots through the utilization of non-intrusive systems, such as cameras. In this scenario, the availability of vision-based systems capable of detecting in real-time the robot's pose is needed as a first step towards a safe and effective interaction to, for instance, avoid collisions. Therefore, in this work, we propose a vision-based system, referred to as D-SPDH, able to estimate the 3D robot pose. The system is based on double-branch architecture and depth data as a single input; any additional information regarding the state of the internal encoders of the robot is not required. The working scenario is the Sim2Real, i.e., the system is trained only with synthetic data and then tested on real sequences, thus eliminating the time-consuming acquisition and annotation procedures of real data, common phases in deep learning algorithms. Moreover, we introduce SimBa++, a dataset featuring both synthetic and real sequences with new real-world double-arm movements, and that represents a challenging setting in which the proposed approach is tested. Experimental results show that our D-SPDH method achieves state-of-the-art and real-time performance, paving the way a possible future non-invasive systems to monitor human-robot interactions.

2024 Articolo su rivista

DOI IRIS

KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction

Authors: Di Nucci, Davide; Simoni, Alessandro; Tomei, Matteo; Ciuffreda, Luca; Vezzani, Roberto; Cucchiara, Rita

2024 Relazione in Atti di Convegno

IRIS

CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components

Authors: Di Nucci, D.; Simoni, A.; Tomei, M.; Ciuffreda, L.; Vezzani, R.; Cucchiara, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Neural Radiance Fields (NeRFs) have gained widespread recognition as a highly effective technique for representing 3D reconstructions of objects and … (Read full abstract)

Neural Radiance Fields (NeRFs) have gained widespread recognition as a highly effective technique for representing 3D reconstructions of objects and scenes derived from sets of images. Despite their efficiency, NeRF models can pose challenges in certain scenarios such as vehicle inspection, where the lack of sufficient data or the presence of challenging elements (e.g. reflections) strongly impact the accuracy of the reconstruction. To this aim, we introduce CarPatch, a novel synthetic benchmark of vehicles. In addition to a set of images annotated with their intrinsic and extrinsic camera parameters, the corresponding depth maps and semantic segmentation masks have been generated for each view. Global and part-based metrics have been defined and used to evaluate, compare, and better characterize some state-of-the-art techniques. The dataset is publicly released at https://aimagelab.ing.unimore.it/go/ carpatch and can be used as an evaluation guide and as a baseline for future work on this challenging topic.

2023 Relazione in Atti di Convegno

DOI IRIS

Computer Vision in Human Analysis: From Face and Body to Clothes

Authors: Daoudi, Mohamed; Vezzani, Roberto; Borghi, Guido; Ferrari, Claudio; Cornia, Marcella; Becattini, Federico; Pilzer, Andrea

Published in: SENSORS

For decades, researchers of different areas, ranging from artificial intelligence to computer vision, have intensively investigated human-centered data, i.e., data … (Read full abstract)

For decades, researchers of different areas, ranging from artificial intelligence to computer vision, have intensively investigated human-centered data, i.e., data in which the human plays a significant role, acquired through a non-invasive approach, such as cameras. This interest has been largely supported by the highly informative nature of this kind of data, which provides a variety of information with which it is possible to understand many aspects including, for instance, the human body or the outward appearance. Some of the main tasks related to human analysis are focused on the body (e.g., human pose estimation and anthropocentric measurement estimation), the hands (e.g., gesture detection and recognition), the head (e.g., head pose estimation), or the face (e.g., emotion and expression recognition). Additional tasks are based on non-corporal elements, such as motion (e.g., action recognition and human behavior understanding) and clothes (e.g., garment-based virtual try-on and attribute recognition). Unfortunately, privacy issues severely limit the usage and the diffusion of this kind of data, making the exploitation of learning approaches challenging. In particular, privacy issues behind the acquisition and the use of human-centered data must be addressed by public and private institutions and companies. Thirteen high-quality papers have been published in this Special Issue and are summarized in the following: four of them are focused on the human face (facial geometry, facial landmark detection, and emotion recognition), two on eye image analysis (eye status classification and 3D gaze estimation), five on the body (pose estimation, conversational gesture analysis, and action recognition), and two on the outward appearance (transferring clothing styles and fashion-oriented image captioning). These numbers confirm the high interest in human-centered data and, in particular, the variety of real-world applications that it is possible to develop.

2023 Articolo su rivista

DOI IRIS

Depth-based 3D human pose refinement: Evaluating the refinet framework

Authors: D'Eusanio, A.; Simoni, A.; Pini, S.; Borghi, G.; Vezzani, R.; Cucchiara, R.

Published in: PATTERN RECOGNITION LETTERS

In recent years, Human Pose Estimation has achieved impressive results on RGB images. The advent of deep learning architectures and … (Read full abstract)

In recent years, Human Pose Estimation has achieved impressive results on RGB images. The advent of deep learning architectures and large annotated datasets have contributed to these achievements. However, little has been done towards estimating the human pose using depth maps, and especially towards obtaining a precise 3D body joint localization. To fill this gap, this paper presents RefiNet, a depth-based 3D human pose refinement framework. Given a depth map and an initial coarse 2D human pose, RefiNet regresses a fine 3D pose. The framework is composed of three modules, based on different data representations, i.e. 2D depth patches, 3D human skeletons, and point clouds. An extensive experimental evaluation is carried out to investigate the impact of the model hyper-parameters and to compare RefiNet with off-the-shelf 2D methods and literature approaches. Results confirm the effectiveness of the proposed framework and its limited computational requirements.

2023 Articolo su rivista

DOI IRIS

Method for generating probabilistic representations and deep neural network

Authors: Garattoni, Lorenzo; Francesca, Gianpiero; Pini, Stefano; Simoni, Alessandro; Vezzani, Roberto; Borghi, Guido

2023 Brevetto

IRIS

Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps

Authors: Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto

Published in: IEEE ROBOTICS AND AUTOMATION LETTERS

Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications, such as the … (Read full abstract)

Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications, such as the detection of unsafe situations or the study of mutual interactions for statistical and social purposes. In this paper, we propose a non-invasive and light-invariant framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera. The method can be applied to any robot without requiring hardware access to the internal states. We introduce a novel representation of the predicted pose, namely Semi-Perspective Decoupled Heatmaps (SPDH), to accurately compute 3D joint locations in world coordinates adapting efficient deep networks designed for the 2D Human Pose Estimation. The proposed approach, which takes as input a depth representation based on XYZ coordinates, can be trained on synthetic depth data and applied to real-world settings without the need for domain adaptation techniques. To this end, we present the SimBa dataset, based on both synthetic and real depth images, and use it for the experimental evaluation. Results show that the proposed approach, made of a specific depth map representation and the SPDH, overcomes the current state of the art.

2022 Articolo su rivista

DOI IRIS

Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data

Authors: D'Eusanio, A.; Pini, S.; Borghi, G.; Simoni, A.; Vezzani, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The effective and reliable detection and classification of dynamic hand gestures is a key element for building Natural User Interfaces, … (Read full abstract)

The effective and reliable detection and classification of dynamic hand gestures is a key element for building Natural User Interfaces, systems that allow the users to interact using free movements of their body instead of traditional mechanical tools. However, methods that temporally segment and classify dynamic gestures usually rely on a great amount of labeled data, including annotations regarding the class and the temporal segmentation of each gesture. In this paper, we propose an unsupervised approach to train a Transformer-based architecture that learns to detect dynamic hand gestures in a continuous temporal sequence. The input data is represented by the 3D position of the hand joints, along with their speed and acceleration, collected through a Leap Motion device. Experimental results show a promising accuracy on both the detection and the classification task and that only limited computational power is required, confirming that the proposed method can be applied in real-world applications.

2022 Relazione in Atti di Convegno

DOI IRIS

Publications by Roberto Vezzani

RI-PIENO - Revised and Improved Petrol-Filling Itinerary Estimation aNd Optimization

TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios

D-SPDH: Improving 3D Robot Pose Estimation in Sim2Real Scenario via Depth Data

KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction

CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components

Computer Vision in Human Analysis: From Face and Body to Clothes

Depth-based 3D human pose refinement: Evaluating the refinet framework

Method for generating probabilistic representations and deep neural network

Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps

Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data