Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Artpedia: A New Visual-Semantic Dataset with Visual and Contextual Sentences in the Artistic Domain

Authors: Stefanini, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

As vision and language techniques are widely applied to realistic images, there is a growing interest in designing visual-semantic models suitable for more complex and challenging scenarios. In this paper, we address the problem of cross-modal retrieval of images and sentences coming from the artistic domain. To this aim, we collect and manually annotate the Artpedia dataset that contains paintings and textual sentences describing both the visual content of the paintings and other contextual information. Thus, the problem is not only to match images and sentences, but also to identify which sentences actually describe the visual content of a given image. To this end, we devise a visual-semantic model that jointly addresses these two challenges by exploiting the latent alignment between visual and textual chunks. Experimental evaluations, obtained by comparing our model to different baselines, demonstrate the effectiveness of our solution and highlight the challenges of the proposed dataset. The Artpedia dataset is publicly available at: http://aimagelab.ing.unimore.it/artpedia.
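
As a rough illustration of the kind of image-sentence matching objective such visual-semantic models commonly rely on, the sketch below shows a hinge-based triplet ranking loss over a batch of embedded images and sentences. This is not the authors' code; the function names, the margin value, and the use of cosine similarity are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(img_emb, txt_emb, margin=0.2):
    """Hinge-based ranking loss over a batch of image and sentence
    embeddings (shape: [batch, dim]). Hypothetical sketch, not the
    Artpedia paper's implementation."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    scores = img_emb @ txt_emb.t()               # cosine similarity matrix
    pos = scores.diag().unsqueeze(1)             # scores of matching pairs
    # penalise negatives that score within `margin` of the positive pair
    cost_txt = (margin + scores - pos).clamp(min=0)      # wrong sentences per image
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # wrong images per sentence
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)
    return cost_txt.sum() + cost_img.sum()
```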

2019 Conference proceedings paper

Can adversarial networks hallucinate occluded people with a plausible aspect?

Authors: Fulgeri, F.; Fabbri, Matteo; Alletto, Stefano; Calderara, S.; Cucchiara, R.

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

When you see a person in a crowd, occluded by other people, you miss visual information that could be used to recognize, re-identify, or simply classify him or her. You can only imagine their appearance from your experience, nothing more. Similarly, AI solutions can try to hallucinate the missing information with specific deep learning architectures, suitably trained on people with and without occlusions. The goal of this work is to generate a complete image of a person, given an occluded version as input, that should be (a) without occlusion, (b) similar at the pixel level to a completely visible person's shape, and (c) able to preserve the visual attributes (e.g., male/female) of the original one. For this purpose, we propose a new approach that integrates state-of-the-art neural network architectures, namely U-Nets and GANs, as well as discriminative attribute classification nets, in an architecture specifically designed to de-occlude people's shapes. The network is trained to optimize a loss function that takes the aforementioned objectives into account. We also propose two datasets for testing our solution: the first, occluded RAP, is created automatically by occluding real shapes from the RAP dataset of Li et al. (2016), which also collects attributes of people's appearance; the second, AiC, is a large synthetic dataset generated in computer graphics from data extracted from the GTA video game, which by construction contains 3D data of occluded objects. Results are impressive and outperform any previous proposal. This result could be an initial step towards further research on recognizing people and their behavior in an open, crowded world.
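
A minimal sketch of the kind of multi-term generator objective described above, combining a pixel reconstruction term, an adversarial term, and an attribute-preservation term. The `generator`, `discriminator`, and `attribute_net` callables and the loss weights are assumptions for illustration, not the paper's actual training code.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()
bce = nn.BCEWithLogitsLoss()

def generator_loss(generator, discriminator, attribute_net,
                   occluded, target, attributes,
                   w_pix=10.0, w_adv=1.0, w_attr=1.0):
    """Hypothetical combined objective for a de-occlusion generator."""
    fake = generator(occluded)                       # hallucinated de-occluded image
    pixel_term = l1(fake, target)                    # (b) pixel-level similarity
    logits = discriminator(fake)
    adv_term = bce(logits, torch.ones_like(logits))  # (a) fool the discriminator
    attr_logits = attribute_net(fake)
    attr_term = bce(attr_logits, attributes)         # (c) preserve visual attributes
    return w_pix * pixel_term + w_adv * adv_term + w_attr * attr_term
```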

2019 Journal article

Classifying Signals on Irregular Domains via Convolutional Cluster Pooling

Authors: Porrello, Angelo; Abati, Davide; Calderara, Simone; Cucchiara, Rita

Published in: PROCEEDINGS OF MACHINE LEARNING RESEARCH

We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset. To this end, we introduce a Convolutional Cluster Pooling layer that exploits multi-scale clustering to highlight, at different resolutions, locally connected regions of the input graph. Our proposal generalises well-established neural models such as Convolutional Neural Networks (CNNs) to irregular and complex domains by exploiting the weight-sharing property in a graph-oriented architecture. In this work, such property is based on the centrality of each vertex within its soft-assigned cluster. Extensive experiments on NTU RGB+D, CIFAR-10 and 20NEWS demonstrate the effectiveness of the proposed technique in capturing both local and global patterns in graph-structured data from different domains.
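
To make the pooling idea concrete, here is a toy sketch of coarsening a graph signal with a soft cluster assignment. It is illustrative only: the actual layer also weights each vertex by its centrality within its soft-assigned cluster, which this sketch omits, and all shapes are assumptions.

```python
import torch

def soft_cluster_pool(x, assignment):
    """Pool vertex signals into cluster signals with a soft assignment.
    x:          [num_nodes, channels]     signal living on the graph
    assignment: [num_nodes, num_clusters] soft memberships (rows sum to 1)
    Returns a coarsened signal of shape [num_clusters, channels]."""
    # normalise each cluster by its total (soft) membership mass
    cluster_mass = assignment.sum(dim=0, keepdim=True).clamp(min=1e-6)
    return (assignment / cluster_mass).t() @ x
```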

2019 Conference proceedings paper

Connected Components Labeling on DRAGs: Implementation and Reproducibility Notes

Authors: Bolelli, Federico; Cancilla, Michele; Baraldi, Lorenzo; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

In this paper we describe the algorithmic implementation details of "Connected Components Labeling on DRAGs" (Directed Rooted Acyclic Graphs), studying the influence of its parameters on the results. Moreover, we provide a detailed description of how to install, set up, and use YACCLAB (Yet Another Connected Components LAbeling Benchmark) to test DRAG.
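
For readers unfamiliar with the task being benchmarked, the sketch below is a classic two-pass connected components labeling routine with union-find (4-connectivity). It only illustrates what a labeling algorithm computes; it is not the DRAG algorithm nor YACCLAB code.

```python
import numpy as np

def two_pass_ccl(binary):
    """Label 4-connected foreground components of a boolean 2D array.
    Plain two-pass algorithm with union-find, for illustration only."""
    labels = np.zeros(binary.shape, dtype=np.int32)
    parent = [0]                                   # parent[0] is background

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]          # path compression
            i = parent[i]
        return i

    next_label = 1
    h, w = binary.shape
    for y in range(h):                             # first pass: provisional labels
        for x in range(w):
            if not binary[y, x]:
                continue
            up = labels[y - 1, x] if y > 0 else 0
            left = labels[y, x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(next_label)
                labels[y, x] = next_label
                next_label += 1
            else:
                candidates = [l for l in (up, left) if l > 0]
                labels[y, x] = min(candidates)
                if len(candidates) == 2:           # record label equivalence
                    a, b = find(up), find(left)
                    parent[max(a, b)] = min(a, b)
    for y in range(h):                             # second pass: resolve equivalences
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```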

2019 Conference proceedings paper

Data-Based Design of Robust Fault Isolation Residuals Using LASSO optimization

Authors: Cascianelli, Silvia; Crocetti, Francesco; Costante, Gabriele; Valigi, Paolo; Fravolini, Mario Luca

2019 Conference proceedings paper

Dealing with Lack of Training Data for Convolutional Neural Networks: The Case of Digital Pathology

Authors: Ponzio, Francesco; Urgese, Gianvito; Ficarra, Elisa; Di Cataldo, Santa

Published in: ELECTRONICS

Thanks to their capability to learn generalizable descriptors directly from images, deep Convolutional Neural Networks (CNNs) seem the ideal solution to most pattern recognition problems. On the other hand, to learn the image representation, CNNs need huge sets of annotated samples, which are infeasible to obtain in many everyday scenarios. This is the case, for example, of Computer-Aided Diagnosis (CAD) systems for digital pathology, where additional challenges are posed by the high variability of the cancerous tissue characteristics. In our experiments, state-of-the-art CNNs trained from scratch on histological images were less accurate and less robust to variability than a traditional machine learning framework, highlighting all the issues of fully training deep networks with limited data from real patients. To solve this problem, we designed and compared three transfer learning frameworks that leverage CNNs pre-trained on non-medical images. This approach obtained very high accuracy while requiring far fewer computational resources for training. Our findings demonstrate that transfer learning is a solution to the automated classification of histological samples and to the problem of designing accurate and computationally efficient CAD systems with limited training data.
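
As a hedged sketch of one common transfer-learning setup (feature extraction), the snippet below freezes a CNN pre-trained on ImageNet, i.e. non-medical images, and trains only a new classification head on the histological classes. The choice of ResNet-18, the number of classes, and the recent-torchvision weights API are assumptions, not the frameworks compared in the paper.

```python
import torch.nn as nn
from torchvision import models

def build_transfer_model(num_classes=3):
    """Feature-extraction transfer learning: frozen ImageNet backbone,
    trainable classification head (assumes torchvision >= 0.13)."""
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    for p in backbone.parameters():
        p.requires_grad = False                 # keep pre-trained features fixed
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # trainable head
    return backbone
```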

2019 Journal article

Driver Face Verification with Depth Maps

Authors: Borghi, Guido; Pini, Stefano; Vezzani, Roberto; Cucchiara, Rita

Published in: SENSORS

Face verification is the task of checking whether two provided images contain the face of the same person. In this work, we propose a fully-convolutional Siamese architecture to tackle this task, achieving state-of-the-art results on three publicly released datasets, namely Pandora, the High-Resolution Range-based Face Database (HRRFaceD), and CurtinFaces. The proposed method takes depth maps as input, since depth cameras have been proven to be more reliable under different illumination conditions. Thus, the system is able to work even in the total or partial absence of external light sources, which is a key feature for automotive applications. From the algorithmic point of view, we propose a fully-convolutional architecture with a limited number of parameters, capable of dealing with the small amount of depth data available for training and able to run in real time even on a CPU and on embedded boards. The experimental results show accuracy sufficient for exploitation in real-world applications with in-board cameras. Finally, exploiting the faces occluded by various head garments and the extreme head poses available in the Pandora dataset, we also successfully test the proposed system under strong visual occlusions. The excellent results obtained confirm the efficacy of the proposed method.
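
A minimal sketch of a Siamese set-up for depth-map verification: one shared fully-convolutional encoder applied to both inputs, embeddings compared by cosine similarity against a threshold. Layer sizes and the threshold are assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseDepthVerifier(nn.Module):
    """Illustrative Siamese face verifier for single-channel depth maps."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(               # shared weights for both inputs
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                # no fully-connected layers
        )

    def embed(self, depth_map):
        return self.encoder(depth_map).flatten(1)   # [batch, 128] embedding

    def forward(self, depth_a, depth_b, threshold=0.8):
        sim = F.cosine_similarity(self.embed(depth_a), self.embed(depth_b))
        return sim > threshold                      # True if same person
```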

2019 Journal article

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

Authors: Landi, Federico; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

In Vision-and-Language Navigation (VLN), an embodied agent needs to reach a target destination guided only by a natural language instruction. To explore the environment and progress towards the target location, the agent must perform a series of low-level actions, such as rotating, before stepping ahead. In this paper, we propose to exploit dynamic convolutional filters to encode the visual information and the linguistic description in an efficient way. Unlike some previous works that abstract from the agent perspective and use high-level navigation spaces, we design a policy that decodes the information provided by dynamic convolution into a series of low-level, agent-friendly actions. Results show that our model exploiting dynamic filters performs better than other architectures with traditional convolution, setting the new state of the art for embodied VLN in the low-level action space. Additionally, we attempt to categorize recent work on VLN according to its architectural choices and distinguish two main groups, which we call low-level action and high-level action models. To the best of our knowledge, we are the first to propose this analysis and categorization for VLN.
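
A toy illustration of the dynamic-filter idea: 1x1 convolutional kernels are predicted from the instruction embedding and then convolved with the agent's visual feature map, so the same visual encoder responds differently to different instructions. All dimensions and the grouped-convolution trick are assumptions for the sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterBlock(nn.Module):
    """Kernels generated from text, applied to visual features (sketch)."""
    def __init__(self, text_dim=256, vis_channels=512, num_filters=16):
        super().__init__()
        self.vis_channels = vis_channels
        self.num_filters = num_filters
        # predict num_filters 1x1 kernels over the visual channels
        self.filter_gen = nn.Linear(text_dim, num_filters * vis_channels)

    def forward(self, visual_feat, instruction_emb):
        # visual_feat: [B, C, H, W], instruction_emb: [B, text_dim]
        b = visual_feat.size(0)
        kernels = self.filter_gen(instruction_emb)
        kernels = kernels.view(b * self.num_filters, self.vis_channels, 1, 1)
        kernels = F.normalize(kernels, dim=1)
        # grouped conv applies each sample's own filters to its own feature map
        out = F.conv2d(
            visual_feat.reshape(1, b * self.vis_channels, *visual_feat.shape[2:]),
            kernels, groups=b)
        return out.view(b, self.num_filters, *visual_feat.shape[2:])
```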

2019 Conference proceedings paper

End-to-end 6-DoF Object Pose Estimation through Differentiable Rasterization

Authors: Palazzi, Andrea; Bergamini, Luca; Calderara, Simone; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Here we introduce an approximated differentiable renderer to refine a 6-DoF pose prediction using only 2D alignment information. To this end, a two-branched convolutional encoder network is employed to jointly estimate the object class and its 6-DoF pose in the scene. We then propose a new formulation of an approximated differentiable renderer to re-project the 3D object onto the image according to its predicted pose; in this way, the alignment error between the observed and the re-projected object silhouettes can be measured. Since the renderer is differentiable, it is possible to back-propagate through it and correct the estimated pose at test time in an online learning fashion. Finally, we show how to leverage the classification branch to profitably re-project a representative model of the predicted class (i.e. a medoid) instead. Each object in the scene is processed independently, and novel viewpoints in which both the objects' arrangement and mutual pose are preserved can be rendered. The differentiable renderer code is available at: https://github.com/ndrplz/tensorflow-mesh-renderer.
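
The following sketch shows the test-time refinement loop in the abstract: gradient descent on a 2D silhouette alignment error, back-propagated through a differentiable renderer. It assumes `render_silhouette(pose)` is some differentiable function returning the re-projected object mask for a 6-DoF pose tensor; it is not the authors' code (their renderer is at the GitHub link above).

```python
import torch

def refine_pose(render_silhouette, observed_mask, pose_init, steps=50, lr=1e-2):
    """Online 6-DoF pose refinement via a differentiable renderer (sketch)."""
    pose = pose_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rendered = render_silhouette(pose)                   # differentiable re-projection
        loss = torch.mean((rendered - observed_mask) ** 2)   # 2D alignment error
        loss.backward()                                      # gradients flow through renderer
        opt.step()
    return pose.detach()
```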

2019 Conference proceedings paper

Experimental Prediction Intervals for Monitoring Wind Turbines: an Ensemble Approach

Authors: Cascianelli, Silvia; Astolfi, Davide; Costante, Gabriele; Castellani, Francesco; Fravolini, Mario Luca

2019 Conference proceedings paper
