Publications by Lorenzo Baraldi

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

AI4AR: An AI-based mobile application for the automatic generation of AR contents

Authors: Pierdicca, R.; Paolanti, M.; Frontoni, E.; Baraldi, L.

Published in: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE

Augmented reality (AR) is the process of using technology to superimpose images, text or sounds on top of what a person can already see. Art galleries and museums have started to develop AR applications to increase engagement and provide an entirely new kind of exploration experience. However, creating the content is a very time-consuming process, requiring ad-hoc development for each painting to be augmented. In fact, creating an AR experience for any painting requires choosing the points of interest, creating digital content, and then developing the application. While this is affordable for the great masterpieces of an art gallery, it would be impracticable for an entire collection. In this context, the idea of this paper is to develop AR applications based on Artificial Intelligence. In particular, automatic captioning techniques are the core of the implementation of an AR application that improves the user experience in front of a painting or an artwork in general. The study demonstrates feasibility through a proof-of-concept application, implemented for handheld devices, and adds to the body of knowledge on mobile AR applications, as this approach has not been applied in this field before.

2020 Conference proceedings paper

Explaining Digital Humanities by Aligning Images and Textual Descriptions

Authors: Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

Published in: PATTERN RECOGNITION LETTERS

Replicating the human ability to connect Vision and Language has recently been gaining a lot of attention in the Computer Vision and the Natural Language Processing communities. This research effort has resulted in algorithms that can retrieve images from textual descriptions and vice versa, when realistic images and sentences with simple semantics are employed and when paired training data is provided. In this paper, we go beyond these limitations and tackle the design of visual-semantic algorithms in the domain of the Digital Humanities. This setting not only exhibits more complex visual and semantic structures but also features a significant lack of training data, which makes the use of fully-supervised approaches infeasible. With this aim, we propose a joint visual-semantic embedding that can automatically align illustrations and textual elements without paired supervision. This is achieved by transferring the knowledge learned on ordinary visual-semantic datasets to the artistic domain. Experiments, performed on two datasets specifically designed for this domain, validate the proposed strategies and quantify the domain shift between natural images and artworks.

2020 Journal article

Meshed-Memory Transformer for Image Captioning

Authors: Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Cucchiara, Rita

Published in: PROCEEDINGS IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding. Their applicability to multi-modal contexts like image captioning, however, is still largely under-explored. With the aim of filling this gap, we present M² - a Meshed Transformer with Memory for Image Captioning. The architecture improves both the image encoding and the language generation steps: it learns a multi-level representation of the relationships between image regions integrating learned a priori knowledge, and uses a mesh-like connectivity at the decoding stage to exploit low- and high-level features. Experimentally, we investigate the performance of the M² Transformer and different fully-attentive models in comparison with recurrent ones. When tested on COCO, our proposal achieves a new state of the art in single-model and ensemble configurations on the "Karpathy" test split and on the online test server. We also assess its performance when describing objects unseen in the training set. Trained models and code for reproducing the experiments are publicly available at: https://github.com/aimagelab/meshed-memory-transformer.

2020 Conference proceedings paper

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability

Authors: Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION

The ability to generate natural language explanations conditioned on visual perception is a crucial step towards autonomous agents which can explain themselves and communicate with humans. While the research efforts in image and video captioning are giving promising results, this is often done at the expense of the computational requirements of the approaches, limiting their applicability to real contexts. In this paper, we propose a fully-attentive captioning algorithm which can provide state-of-the-art performance on language generation while restricting its computational demands. Our model is inspired by the Transformer model and employs only two Transformer layers in the encoding and decoding stages. Further, it incorporates a novel memory-aware encoding of image regions. Experiments demonstrate that our approach achieves competitive results in terms of caption quality while featuring reduced computational demands. Further, to evaluate its applicability on autonomous agents, we conduct experiments on simulated scenes taken from the perspective of domestic robots.

2020 Conference proceedings paper

Spaghetti Labeling: Directed Acyclic Graphs for Block-Based Connected Components Labeling

Authors: Bolelli, Federico; Allegretti, Stefano; Baraldi, Lorenzo; Grana, Costantino

Published in: IEEE TRANSACTIONS ON IMAGE PROCESSING

Connected Components Labeling is an essential step of many Image Processing and Computer Vision tasks. Since the first proposal of a labeling algorithm, which dates back to the sixties, many approaches have optimized the computational load needed to label an image. In particular, decision forests and state prediction have recently emerged as valuable strategies to improve performance. However, due to the overhead of the manual construction of prediction states and the size of the resulting machine code, the application of these strategies has been restricted to small masks, thus ignoring the benefit of using a block-based approach. In this paper, we combine a block-based mask with state prediction and code compression: the resulting algorithm is modeled as a Directed Rooted Acyclic Graph with multiple entry points, which is automatically generated without manual intervention. When tested on synthetic and real datasets, in comparison with optimized implementations of state-of-the-art algorithms, the proposed approach shows superior performance, surpassing the results obtained by all compared approaches in all settings.

2020 Journal article

Towards Reliable Experiments on the Performance of Connected Components Labeling Algorithms

Authors: Bolelli, Federico; Cancilla, Michele; Baraldi, Lorenzo; Grana, Costantino

Published in: JOURNAL OF REAL-TIME IMAGE PROCESSING

The problem of labeling the connected components of a binary image is well-defined and several proposals have been presented in the past. Since an exact solution to the problem exists, algorithms mainly differ in their execution speed. In this paper, we propose and describe YACCLAB, Yet Another Connected Components Labeling Benchmark. Together with a rich and varied dataset, YACCLAB contains an open source platform to test new proposals and to compare them with publicly available competitors. Textual and graphical outputs are automatically generated for many kinds of tests, which analyze the methods from different perspectives. An extensive set of experiments among state-of-the-art techniques is reported and discussed.

2020 Journal article

A Deep-learning-based approach to VM behavior Identification in Cloud Systems

Authors: Stefanini, M.; Lancellotti, R.; Baraldi, L.; Calderara, S.

2019 Conference proceedings paper

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation

Authors: Tomei, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

The applicability of computer vision to real paintings and artworks has been rarely investigated, even though a vast heritage would greatly benefit from techniques which can understand and process data from the artistic domain. This is partially due to the small amount of annotated artistic data, which is not even comparable to that of natural images captured by cameras. In this paper, we propose a semantic-aware architecture which can translate artworks to photo-realistic visualizations, thus reducing the gap between visual features of artistic and realistic data. Our architecture can generate natural images by retrieving and learning details from real photos through a similarity matching strategy which leverages a weakly-supervised semantic understanding of the scene. Experimental results show that the proposed technique leads to increased realism and to a reduction in domain shift, which improves the performance of pre-trained architectures for classification, detection, and segmentation. Code is publicly available at: https://github.com/aimagelab/art2real.

2019 Conference proceedings paper

Artpedia: A New Visual-Semantic Dataset with Visual and Contextual Sentences in the Artistic Domain

Authors: Stefanini, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

As vision and language techniques are widely applied to realistic images, there is a growing interest in designing visual-semantic models suitable for more complex and challenging scenarios. In this paper, we address the problem of cross-modal retrieval of images and sentences coming from the artistic domain. To this aim, we collect and manually annotate the Artpedia dataset that contains paintings and textual sentences describing both the visual content of the paintings and other contextual information. Thus, the problem is not only to match images and sentences, but also to identify which sentences actually describe the visual content of a given image. To this end, we devise a visual-semantic model that jointly addresses these two challenges by exploiting the latent alignment between visual and textual chunks. Experimental evaluations, obtained by comparing our model to different baselines, demonstrate the effectiveness of our solution and highlight the challenges of the proposed dataset. The Artpedia dataset is publicly available at: http://aimagelab.ing.unimore.it/artpedia.

2019 Conference proceedings paper

Connected Components Labeling on DRAGs: Implementation and Reproducibility Notes

Authors: Bolelli, Federico; Cancilla, Michele; Baraldi, Lorenzo; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

In this paper we describe the algorithmic implementation details of "Connected Components Labeling on DRAGs" (Directed Rooted Acyclic Graphs), studying the influence of parameters on the results. Moreover, a detailed description of how to install, set up, and use YACCLAB (Yet Another Connected Components LAbeling Benchmark) to test DRAG is provided.

2019 Conference proceedings paper
