Diffusion Transformers for Tabular Data Time Series Generation
Authors: Garuti, Fabrizio; Sangineto, Enver; Luetto, Simone; Forni, Lorenzo; Cucchiara, Rita
Authors: Cappellino, Chiara; Mancusi, Gianluca; Mosconi, Matteo; Porrello, Angelo; Calderara, Simone; Cucchiara, Rita
Authors: Fincato, M.; Vezzani, R.
Published in: SENSORS
Multi-person pose estimation is the task of detecting and regressing the keypoint coordinates of multiple people in a single image. Significant progress has been achieved in recent years, especially with the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation by leveraging a dual-block transformer decoding architecture. Class prediction and keypoint estimation are split into parallel blocks so that each sub-task can be improved separately and the risk of interference is reduced. This architecture improves both the precision of keypoint localization and the model's capacity to correctly classify individuals. To further boost performance, the Keypoint-Block processes its self-attention layers in parallel, a novel strategy that sharpens keypoint localization. Additionally, DualPose incorporates a contrastive denoising (CDN) mechanism that leverages positive and negative samples to stabilize training and improve robustness: CDN creates varied training samples by injecting controlled noise into the ground truth, strengthening the model's ability to discriminate between valid and incorrect keypoints. DualPose achieves state-of-the-art results, outperforming recent end-to-end methods, as shown by extensive experiments on the MS COCO and CrowdPose datasets. The code and pretrained models are publicly available.
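As a rough sketch of the contrastive denoising step described above, the snippet below jitters ground-truth keypoints to form positive (lightly perturbed) and negative (heavily perturbed) training queries. The function name, the Gaussian noise model, and the scale values are our own illustrative assumptions, not DualPose's actual implementation.

```python
import torch

def build_cdn_queries(gt_keypoints, pos_scale=0.02, neg_scale=0.2):
    """Create contrastive denoising queries from ground-truth keypoints.

    Positive queries get small jitter and should be denoised back to the
    ground truth; negative queries get large jitter and should be rejected
    as invalid. Noise scales are illustrative, not the paper's settings.
    """
    pos = gt_keypoints + pos_scale * torch.randn_like(gt_keypoints)
    neg = gt_keypoints + neg_scale * torch.randn_like(gt_keypoints)
    queries = torch.cat([pos, neg], dim=0)
    labels = torch.cat([torch.ones(pos.shape[0]), torch.zeros(neg.shape[0])])
    return queries, labels
```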
Authors: Burger, Jacopo; Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella
Alzheimer’s Disease (AD) and Frontotemporal Dementia (FTD), among the most prevalent neurodegenerative disorders, disrupt brain activity and connectivity, highlighting the need for tools that can effectively capture these alterations. Effective Connectivity Networks (ECNs), which model causal interactions between brain regions, offer a promising approach to characterizing AD- and FTD-related neural changes. In this study, we estimate ECNs from EEG traces using a state-of-the-art causal discovery method specifically designed for time-series data to recover the causal structure of the interactions between brain areas. The recovered ECNs are integrated into a novel Graph Neural Network architecture (ECoGNet), where nodes represent brain regions and edge features encode causal relationships. Our method combines ECNs with features summarizing local brain dynamics to improve AD and FTD detection. Evaluated on a publicly available EEG dataset, the proposed approach demonstrates superior performance compared to models that either use non-causal connectivity networks or omit connectivity information entirely.
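To make the edge-feature idea concrete, here is a minimal message-passing layer in which directed edges carry causal-strength features, in the spirit of ECoGNet but with hypothetical names and dimensions; it is a sketch, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CausalEdgeLayer(nn.Module):
    """Message passing where each directed edge (source -> target) carries
    a causal-strength feature vector. A sketch, not ECoGNet itself."""
    def __init__(self, node_dim, edge_dim, hidden_dim):
        super().__init__()
        self.msg = nn.Linear(2 * node_dim + edge_dim, hidden_dim)
        self.upd = nn.Linear(node_dim + hidden_dim, node_dim)

    def forward(self, x, edge_index, edge_attr):
        src, dst = edge_index                       # shape (2, num_edges)
        m = torch.relu(self.msg(torch.cat([x[src], x[dst], edge_attr], dim=-1)))
        agg = torch.zeros(x.size(0), m.size(-1), device=x.device)
        agg.index_add_(0, dst, m)                   # sum messages per target node
        return torch.relu(self.upd(torch.cat([x, agg], dim=-1)))

# Example: 19 EEG-derived regions with 8 local-dynamics features each,
# and two directed causal edges (0 -> 1 and 2 -> 3).
x = torch.randn(19, 8)
edge_index = torch.tensor([[0, 2], [1, 3]])
edge_attr = torch.randn(2, 1)
out = CausalEdgeLayer(node_dim=8, edge_dim=1, hidden_dim=16)(x, edge_index, edge_attr)
```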
Authors: Bertoli, Annalisa; Fantuzzi, Cesare
Published in: COMPUTERS
In recent years, the increasing complexity of production systems driven by technological development has created new opportunities in the industrial world but has also brought challenges in the practical use of these systems by operators. One of the most significant changes is the sheer availability of data and its accessibility. This work proposes an IoT architecture specifically designed for real-world industrial environments. The goal is to present a system that can be effectively implemented to monitor operations and production processes in real time. This solution improves fault detection and identification, giving operators the critical information needed to make informed decisions. The IoT architecture is implemented in two different industrial applications, demonstrating its flexibility across various industrial contexts. The case studies show how monitoring reduces downtime when a fault occurs, making both the performance loss and the fault that caused it clearly visible. Additionally, this approach supports human operators in developing a deeper understanding of their working environment, enabling them to make decisions based on real-time data.
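As a toy illustration of the fault-detection step (the rule, names, and thresholds are our own assumptions; the paper does not specify this detector), a sliding-window deviation check over a sensor stream might look like:

```python
from collections import deque
import statistics

class FaultMonitor:
    """Flags a reading as anomalous when it deviates more than k standard
    deviations from a sliding window of recent values (illustrative only)."""
    def __init__(self, window=50, k=3.0):
        self.history = deque(maxlen=window)
        self.k = k

    def check(self, value):
        faulty = False
        if len(self.history) == self.history.maxlen:
            mu = statistics.mean(self.history)
            sigma = statistics.stdev(self.history)
            faulty = sigma > 0 and abs(value - mu) > self.k * sigma
        self.history.append(value)
        return faulty
```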
Authors: Morelli, Nicola; Marchesini, Kevin; Lumetti, Luca; Santi, Daniele; Grana, Costantino; Bolelli, Federico
Testicular ultrasound imaging is vital for assessing male infertility, with testicular inhomogeneity serving as a key biomarker. However, subjective interpretation and the scarcity of publicly available datasets pose challenges to automated classification. In this study, we explore supervised and unsupervised pretraining strategies using a ResNet-based architecture, supplemented by diffusion-based generative models to synthesize realistic ultrasound images. Our results demonstrate that pretraining significantly enhances classification performance compared to training from scratch, and synthetic data can effectively substitute real images in the pretraining process, alleviating data-sharing constraints. These methods offer promising advancements toward robust, clinically valuable automated analysis of male infertility. The source code is publicly available at https://github.com/AImageLab-zip/TesticulUS/.
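The two-stage recipe described above (pretrain on synthetic diffusion-generated images, then fine-tune on real scans) can be sketched as follows; the loaders, learning rates, and two-class head are hypothetical placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 2)  # e.g. homogeneous vs. inhomogeneous

def run_epoch(model, loader, optimizer, loss_fn=nn.CrossEntropyLoss()):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

# Stage 1: pretrain on diffusion-generated synthetic ultrasound images.
# run_epoch(model, synthetic_loader, torch.optim.AdamW(model.parameters(), lr=1e-4))
# Stage 2: fine-tune on the (scarce) real images.
# run_epoch(model, real_loader, torch.optim.AdamW(model.parameters(), lr=1e-5))
```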
Authors: Sanguigni, Fulvio; Morelli, Davide; Cornia, Marcella; Cucchiara, Rita
In recent years, the fashion industry has increasingly adopted AI technologies to enhance customer experience, driven by the proliferation of e-commerce platforms and virtual applications. Among the various tasks, virtual try-on and multimodal fashion image editing, which leverages diverse input modalities such as text, garment sketches, and body poses, have become key areas of research. Diffusion models have emerged as a leading approach for such generative tasks, offering superior image quality and diversity. However, most existing virtual try-on methods rely on a specific garment as input, which is often impractical in real-world scenarios where users may only provide textual specifications. To address this limitation, in this work we introduce Fashion Retrieval-Augmented Generation (Fashion-RAG), a novel method that enables the customization of fashion items based on user preferences provided in textual form. Our approach retrieves multiple garments that match the input specifications and generates a personalized image by incorporating attributes from the retrieved items. To achieve this, we employ textual inversion techniques, in which retrieved garment images are projected into the textual embedding space of the Stable Diffusion text encoder, allowing seamless integration of retrieved elements into the generative process. Experimental results on the Dress Code dataset demonstrate that Fashion-RAG outperforms existing methods both qualitatively and quantitatively, effectively capturing fine-grained visual details from retrieved garments. To the best of our knowledge, this is the first work to introduce a retrieval-augmented generation approach specifically tailored for multimodal fashion image editing.
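A minimal sketch of the textual-inversion projection described above: retrieved garment features are mapped to pseudo-word embeddings living in the text encoder's token space. Dimensions, names, and the number of pseudo-tokens are assumptions for illustration, not Fashion-RAG's actual network.

```python
import torch
import torch.nn as nn

class GarmentProjector(nn.Module):
    """Maps retrieved-garment image features into the text encoder's token
    embedding space as pseudo-word embeddings (illustrative sketch)."""
    def __init__(self, image_dim=768, text_dim=768, tokens_per_garment=4):
        super().__init__()
        self.proj = nn.Linear(image_dim, tokens_per_garment * text_dim)
        self.tokens, self.text_dim = tokens_per_garment, text_dim

    def forward(self, image_features):              # (batch, image_dim)
        pseudo = self.proj(image_features)
        return pseudo.view(-1, self.tokens, self.text_dim)

# The resulting pseudo-word embeddings would be concatenated with the
# embedded text prompt before conditioning the diffusion U-Net.
```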
Authors: Corso, Giulia; Lovino, Marta; Akpinar, Reha; Di Tommaso, Luca; Ficarra, Elisa; Ranzini, Marta
Authors: Betti, Federico; Baraldi, Lorenzo; Baraldi, Lorenzo; Cucchiara, Rita; Sebe, Nicu
Published in: INTERNATIONAL JOURNAL OF COMPUTER VISION