Publications by Guido Borghi

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

LLMs as NAO Robot 3D Motion Planners

Authors: Catalini, Riccardo; Salici, Giacomo; Biagi, Federico; Borghi, Guido; Biagiotti, Luigi; Vezzani, Roberto

In this study, we demonstrate the capabilities of state-of-the-art Large Language Models (LLMs) in teaching social robots to perform specific actions within a 3D environment. Specifically, we introduce the use of LLMs to generate sequences of 3D joint angles - in both zero-shot and one-shot prompting - that a humanoid robot must follow to perform a given action. This work is driven by the growing demand for intuitive interactions with social robots: indeed, LLMs could empower non-expert users to operate and benefit from robotic systems effectively. Additionally, this method leverages the possibility to generate synthetic data without effort, enabling privacy-focused use cases. To evaluate the output quality of seven different LLMs, we conducted a blind user study to compare the pose sequences. Participants were shown videos of the well-known NAO robot performing the generated actions and were asked to identify the intended action and choose the best match with the original instruction from a collection of candidates created by different LLMs. The results highlight that the majority of LLMs are indeed capable of planning correct and complete recognizable actions, showing a novel perspective of how AI can be applied to social robotics.

2025 Conference proceedings paper

San Vitale Challenge: Automatic Reconstruction of Ancient Colored Glass Windows

Authors: Di Domenico, N.; Borghi, G.; Franco, A.; Boschetti, M.; Giacomini, F.; Barzaghi, S.; Ferucci, S.; Zambruno, S.; Mularoni, L.; Gao, Q.; Che, C.; Li, G.; Zu, Y.; Hao, J.; Zhang, J.; Ducz, A.; Gego, L.; Imeri, K.; Nemkin, V.; Rakhmatillaev, A.; Szatmari, S.; Rowan, W.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The sixth-century Basilica of San Vitale in Ravenna, Italy, once featured intricate circular colored glass windows that illuminated its interior. Although these windows are now lost, several fragments were recovered during recent restorations. Unfortunately, reconstructing the original glass windows from these fragments is extremely complex and time-consuming, requiring the use of specialized expertise. Therefore, the development of automatic reconstruction techniques based on Artificial Intelligence is particularly important and challenging, due to, for instance, the presence of uniform color, damaged glass edges, and many fragment outliers. In this direction, the San Vitale Challenge was organized to gather the best methods and algorithms, as described and summarized in this paper. The challenge, split into several sub-tracks of increasing difficulty and realism, received the submission of several solutions, ranging from more classical computer vision algorithms to purely deep learning-based approaches, whose results are quantitatively evaluated and compared. In the last part of the paper, directions for future developments of such systems are discussed.

2025 Conference proceedings paper

TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios

Authors: Rossi, Daniel; Borghi, Guido; Vezzani, Roberto

Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergency responses. In this work, we introduce TakuNet, a novel light-weight architecture which employs techniques such as depth-wise convolutions and an early downsampling stem to reduce computational complexity while maintaining high accuracy. It leverages dense connections for fast convergence during training and uses 16-bit floating-point precision for optimization on embedded hardware accelerators. Experimental evaluation on two public datasets shows that TakuNet achieves near-state-of-the-art accuracy in classifying aerial images of emergency situations, despite its minimal parameter count. Real-world tests on embedded devices, namely Jetson Orin Nano and Raspberry Pi, confirm TakuNet's efficiency, achieving more than 650 fps on the 15W Jetson board, making it suitable for real-time AI processing on resource-constrained platforms and advancing the applicability of drones in emergency scenarios. The code and implementation details are publicly released.

2025 Conference proceedings paper

TONO: A Synthetic Dataset for Face Image Compliance to ISO/ICAO Standard

Authors: Borghi, Guido; Franco, Annalisa; Di Domenico, Nicolò; Maltoni, Davide

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2025 Conference proceedings paper

Towards on-device continual learning with Binary Neural Networks in industrial scenarios

Authors: Vorabbi, L.; Carraggi, A.; Maltoni, D.; Borghi, G.; Santi, S.

Published in: IMAGE AND VISION COMPUTING

This paper addresses the challenges of deploying deep learning models, specifically Binary Neural Networks (BNNs), on resource-constrained embedded devices within the Internet of Things context. As deep learning continues to gain traction in IoT applications, the need for efficient models that can learn continuously from incremental data streams without requiring extensive computational resources has become more pressing. We propose a solution that integrates Continual Learning with BNNs, utilizing replay memory to prevent catastrophic forgetting. Our method focuses on quantized neural networks, introducing quantization also in the backpropagation step, which significantly reduces memory and computational requirements. Furthermore, we enhance the replay memory mechanism by storing intermediate feature maps (i.e., latent replay) with 1-bit precision instead of raw data, enabling efficient memory usage. In addition to well-known benchmarks, we introduce the DL-Hazmat dataset, which consists of over 140k high-resolution grayscale images of 64 hazardous material symbols. Experimental results show a significant improvement in model accuracy and a substantial reduction in memory requirements, demonstrating the effectiveness of our method in enabling deep learning applications on embedded devices in real-world scenarios. Our work expands the application of Continual Learning and BNNs for efficient on-device training, offering a promising solution for IoT and other resource-constrained environments.

2025 Journal article

Adversarial Identity Injection for Semantic Face Image Synthesis

Authors: Tarollo, G.; Fontanini, T.; Ferrari, C.; Borghi, G.; Prati, A.

Published in: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS

Nowadays, deep learning models have reached incredible performance in the task of image generation. Many works in the literature address the task of face generation and editing, with human and automatic systems that struggle to distinguish real images from generated ones. Whereas most systems have reached excellent visual generation quality, they still face difficulties in preserving the identity of the starting input subject. Among all the explored techniques, Semantic Image Synthesis (SIS) methods, whose goal is to generate an image conditioned on a semantic segmentation mask, are the most promising, even though preserving the perceived identity of the input subject is not their main concern. Therefore, in this paper, we investigate the problem of identity preservation in face image generation and present an SIS architecture that exploits a cross-attention mechanism to merge identity, style, and semantic features to generate faces whose identities are as similar as possible to the input ones. Experimental results reveal that the proposed method is not only suitable for preserving the identity but is also effective in the face recognition adversarial attack, i.e., hiding a second identity in the generated faces.

2024 Conference proceedings paper

Compact High-Resolution Multi-Wavelength LED Light Source for Eye Stimulation

Authors: Gibertoni, Giovanni; Borghi, Guido; Rovati, Luigi

Published in: ELECTRONICS

Eye stimulation research plays a critical role in advancing our understanding of visual processing and developing new therapies for visual impairments. Despite its importance, researchers and clinicians still face challenges with the availability of cost-effective, precise, and versatile tools for conducting these studies. Therefore, this study introduces a high-resolution, compact, and budget-friendly multi-wavelength LED light source tailored for precise and versatile eye stimulation, addressing the aforementioned needs in medical research and visual science. Accommodating standard 3 mm or 5 mm package LEDs, the system boasts broad compatibility, while its integration with any microcontroller capable of PWM generation and supporting SPI and UART communication ensures adaptability across diverse applications. Operating at high resolution (18 bits or more) with great linearity, the LED light source offers nuanced control for sophisticated eye stimulation protocols. The simple 3D printable optical design allows the coupling of up to seven different wavelengths while ensuring the cost-effectiveness of the device. The system’s output has been designed to be fiber-coupled with standard SMA connectors to be compatible with most solutions. The proposed implementation significantly undercuts the cost of commercially available solutions, providing a viable, budget-friendly option for advancing eye stimulation research.

2024 Journal article

D-SPDH: Improving 3D Robot Pose Estimation in Sim2Real Scenario via Depth Data

Authors: Simoni, A.; Borghi, G.; Garattoni, L.; Francesca, G.; Vezzani, R.

Published in: IEEE ACCESS

In recent years, there has been a notable surge in the significance attributed to technologies facilitating secure and efficient cohabitation and collaboration between humans and machines, with a particular interest in robotic systems. A pivotal element in actualizing this novel and challenging collaborative paradigm involves different technical tasks, including the comprehension of 3D poses exhibited by both humans and robots through the utilization of non-intrusive systems, such as cameras. In this scenario, the availability of vision-based systems capable of detecting the robot's pose in real time is needed as a first step towards a safe and effective interaction to, for instance, avoid collisions. Therefore, in this work, we propose a vision-based system, referred to as D-SPDH, able to estimate the 3D robot pose. The system is based on a double-branch architecture and takes depth data as its single input; no additional information regarding the state of the internal encoders of the robot is required. The working scenario is Sim2Real, i.e., the system is trained only with synthetic data and then tested on real sequences, thus eliminating the time-consuming acquisition and annotation of real data, common phases in deep learning algorithms. Moreover, we introduce SimBa++, a dataset featuring both synthetic and real sequences with new real-world double-arm movements, which represents a challenging setting in which the proposed approach is tested. Experimental results show that our D-SPDH method achieves state-of-the-art and real-time performance, paving the way for possible future non-invasive systems to monitor human-robot interactions.

2024 Journal article

Differential Morphing Attack Detection via Triplet-Based Metric Learning and Artifact Extraction

Authors: Liu, Chengcheng; Ferrara, Matteo; Franco, Annalisa; Borghi, Guido; Zhong, Dexing

2024 Conference proceedings paper

Enabling On-Device Continual Learning with Binary Neural Networks and Latent Replay

Authors: Vorabbi, Lorenzo; Maltoni, Davide; Borghi, Guido; Santi, Stefano

On-device learning remains a formidable challenge, especially when dealing with resource-constrained devices that have limited computational capabilities. This challenge is primarily rooted in two key issues: first, the memory available on embedded devices is typically insufficient to accommodate the memory-intensive back-propagation algorithm, which often relies on floating-point precision. Second, the development of learning algorithms on models with extreme quantization levels, such as Binary Neural Networks (BNNs), is critical due to the drastic reduction in bit representation. In this study, we propose a solution that combines recent advancements in the field of Continual Learning (CL) and Binary Neural Networks to enable on-device training while maintaining competitive performance. Specifically, our approach leverages binary latent replay (LR) activations and a novel quantization scheme that significantly reduces the number of bits required for gradient computation. The experimental validation demonstrates a significant accuracy improvement in combination with a noticeable reduction in memory requirement, confirming the suitability of our approach in expanding the practical applications of deep learning in real-world scenarios.

2024 Conference proceedings paper

Page 2 of 9 • Total publications: 81