Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Spotting Insects from Satellites: Modeling the Presence of Culicoides Imicola Through Deep CNNs

Authors: Vincenzi, Stefano; Porrello, Angelo; Buzzega, Pietro; Conte, Annamaria; Ippoliti, Carla; Candeloro, Luca; Di Lorenzo, Alessio; Capobianco Dondona, Andrea; Calderara, Simone

Nowadays, Vector-Borne Diseases (VBDs) pose a severe threat to public health, accounting for a considerable amount of human illnesses. Recently, several surveillance plans have been put in place to limit the spread of such diseases, typically involving on-field measurements. However, a systematic and effective plan is still missing, due to the high costs and efforts required to implement one. Ideally, any attempt in this field should consider the vector-host-pathogen triangle, which is strictly linked to environmental and climatic conditions. In this paper, we exploit satellite imagery from the Sentinel-2 mission, as we believe it encodes the environmental factors responsible for the vector's spread. Our analysis - conducted in a data-driven fashion - couples spectral images with ground-truth information on the abundance of Culicoides imicola. In this respect, we frame our task as a binary classification problem, relying on Convolutional Neural Networks (CNNs) and their ability to learn useful representations from multi-band images. Additionally, we provide a multi-instance variant, aimed at extracting temporal patterns from a short sequence of spectral images. Experiments show promising results, providing the foundations for novel supportive tools that could indicate where surveillance and prevention measures should be prioritized.

2019 Conference proceedings paper

Towards Cycle-Consistent Models for Text and Image Retrieval

Authors: Cornia, Marcella; Baraldi, Lorenzo; Rezazadegan Tavakoli, Hamed; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Cross-modal retrieval has recently become a research hot-spot, thanks to the development of deeply-learnable architectures. Such architectures generally learn a joint multi-modal embedding space in which text and images can be projected and compared. Here we investigate a different approach, and reformulate the problem of cross-modal retrieval as that of learning a translation between the textual and visual domains. In particular, we propose an end-to-end trainable model which can translate text into image features and vice versa, and regularizes this mapping with a cycle-consistency criterion. Preliminary experimental evaluations show promising results with respect to ordinary visual-semantic models.
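As a hedged illustration of the cycle-consistency criterion mentioned in this abstract (the function names, the linear toy mappings, and the mean-squared-error form are assumptions for the sketch, not the paper's exact formulation), the regularizer penalizes the round-trip reconstruction error of an embedding translated to the other domain and back:

```python
def cycle_consistency_loss(t, f, g):
    """Mean squared error between an embedding t and its round trip g(f(t)).

    f: text -> image features, g: image features -> text (both illustrative).
    """
    t_cycled = g(f(t))
    return sum((a - b) ** 2 for a, b in zip(t, t_cycled)) / len(t)

# Toy linear mappings: g is the exact inverse of f, so the cycle loss is 0.
f = lambda x: [2.0 * v for v in x]
g = lambda x: [0.5 * v for v in x]
perfect = cycle_consistency_loss([1.0, -2.0, 3.0], f, g)

# An imperfect backward mapping leaves a positive reconstruction error.
g_bad = lambda x: [0.4 * v for v in x]
imperfect = cycle_consistency_loss([1.0, -2.0, 3.0], f, g_bad)
```

In the full model, such a term would be minimized jointly with the retrieval objective, tying the two translation networks together.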

2019 Conference proceedings paper

Training adversarial discriminators for cross-channel abnormal event detection in crowds

Authors: Ravanbakhsh, M.; Sangineto, E.; Nabi, M.; Sebe, N.

Abnormal crowd behaviour detection attracts large interest due to its importance in video surveillance scenarios. However, the ambiguity and the lack of sufficient abnormal ground-truth data make end-to-end training of large deep networks hard in this domain. In this paper we propose to use Generative Adversarial Nets (GANs), which are trained to generate only the normal distribution of the data. During adversarial GAN training, a discriminator (D) is used as a supervisor for the generator network (G) and vice versa. At testing time we use D to solve our discriminative task (abnormality detection), where D has been trained without the need for manually-annotated abnormal data. Moreover, in order to prevent G from learning a trivial identity function, we use a cross-channel approach, forcing G to transform raw-pixel data into motion information and vice versa. Quantitative results on standard benchmarks show that our method outperforms previous state-of-the-art methods in both frame-level and pixel-level evaluation.

2019 Conference proceedings paper

Unsupervised Domain Adaptation Using Feature-Whitening and Consensus Loss

Authors: Roy, Subhankar; Siarohin, Aliaksandr; Sangineto, Enver; Bulo, Samuel Rota; Sebe, Nicu; Ricci, Elisa

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

A classifier trained on a dataset seldom works on other datasets obtained under different conditions, due to domain shift. This problem is commonly addressed by domain adaptation methods. In this work we introduce a novel deep learning framework which unifies different paradigms in unsupervised domain adaptation. Specifically, we propose domain alignment layers which implement feature whitening for the purpose of matching source and target feature distributions. Additionally, we leverage the unlabeled target data by proposing the Min-Entropy Consensus loss, which regularizes training while avoiding the adoption of many user-defined hyper-parameters. We report results on publicly available datasets, considering both digit classification and object recognition tasks. We show that, in most of our experiments, our approach improves upon previous methods, setting a new state of the art.
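A minimal sketch of the Min-Entropy Consensus idea may help: under the assumption (ours, for illustration) that the per-sample loss selects the single class jointly favoured by the predictions on two perturbed views of the same unlabeled target sample, it combines entropy minimization and prediction consistency without introducing extra user-defined hyper-parameters:

```python
import math

def min_entropy_consensus(p1, p2):
    """Illustrative per-sample MEC loss: p1 and p2 are the predicted class
    probabilities for two perturbed views of one unlabeled target sample.
    The loss rewards the class that both predictions jointly favour."""
    return -0.5 * max(math.log(a) + math.log(b) for a, b in zip(p1, p2))

# Agreeing, confident predictions -> low loss.
low = min_entropy_consensus([0.9, 0.1], [0.9, 0.1])

# Disagreeing predictions -> much higher loss.
high = min_entropy_consensus([0.9, 0.1], [0.1, 0.9])
```

Agreement with high confidence yields a low loss; disagreement or low confidence is penalized, so minimizing it pushes the two views toward a confident consensus.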

2019 Conference proceedings paper

Video synthesis from Intensity and Event Frames

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Event cameras, neuromorphic devices that naturally respond to brightness changes, have multiple advantages over traditional cameras. However, the difficulty of applying traditional computer vision algorithms to event data limits their usability. Therefore, in this paper we investigate the use of a deep learning-based architecture that combines an initial grayscale frame and a series of event data to estimate the subsequent intensity frames. In particular, a fully-convolutional encoder-decoder network is employed and evaluated for the frame synthesis task on an automotive event-based dataset. Performance obtained with pixel-wise metrics confirms the quality of the images synthesized by the proposed architecture.

2019 Conference proceedings paper

Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach

Authors: Carraggi, Angelo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Visual-semantic embeddings have been extensively used as a powerful model for cross-modal retrieval of images and sentences. In this setting, data coming from different modalities can be projected into a common embedding space, in which distances can be used to infer the similarity between pairs of images and sentences. While this approach has shown impressive performance in fully supervised settings, its application to semi-supervised scenarios has rarely been investigated. In this paper we propose a domain adaptation model for cross-modal retrieval, in which the knowledge learned from a supervised dataset can be transferred to a target dataset in which the pairing between images and sentences is not known, or not useful for training due to the limited size of the set. Experiments are performed on two unsupervised target scenarios, related to the fashion and cultural heritage domains respectively. Results show that our model is able to effectively transfer the knowledge learned on ordinary visual-semantic datasets, achieving promising results. As an additional contribution, we collect and release the dataset used for the cultural heritage domain.

2019 Conference proceedings paper

What was Monet seeing while painting? Translating artworks to photo-realistic images

Authors: Tomei, Matteo; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

State-of-the-art Computer Vision techniques exploit the availability of large-scale datasets, most of which consist of images captured from the world as it is. This leads to an incompatibility between such methods and digital data from the artistic domain, on which current techniques under-perform. A possible solution is to reduce the domain shift at the pixel level, thus translating artistic images into realistic copies. In this paper, we present a model capable of translating paintings to photo-realistic images, trained without paired examples. The idea is to enforce a patch-level similarity between real and generated images, aiming to reproduce photo-realistic details from a memory bank of real images. This is subsequently adopted in the context of an unpaired image-to-image translation framework, mapping each image from one distribution to a new one belonging to the other distribution. Qualitative and quantitative results are presented on Monet, Cezanne and Van Gogh painting translation tasks, showing that our approach increases the realism of generated images compared to the CycleGAN approach.

2019 Conference proceedings paper

Whitening and coloring batch transform for GANs

Authors: Siarohin, A.; Sangineto, E.; Sebe, N.

Batch Normalization (BN) is a common technique used to speed up and stabilize training. On the other hand, the learnable parameters of BN are commonly used in conditional Generative Adversarial Networks (cGANs) for representing class-specific information using conditional Batch Normalization (cBN). In this paper we propose to generalize both BN and cBN using a Whitening and Coloring based batch normalization. We show that our conditional Coloring can represent categorical conditioning information, which substantially improves the qualitative results of cGANs. Moreover, we show that full-feature whitening is important in a general GAN scenario, in which the training process is known to be highly unstable. We test our approach on different datasets, using different GAN networks and training protocols, and show a consistent improvement in all the tested frameworks. Our conditional CIFAR-10 results surpass those of all previous works on this dataset.
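As a hedged, pure-Python sketch of the Whitening and Coloring transform on toy 2-D features (the actual method whitens high-dimensional convolutional features and learns a class-conditional coloring; the function, the Cholesky-based whitening, and the toy batch below are illustrative assumptions):

```python
import math

def whiten_color_2d(batch, gamma, beta):
    """Illustrative whitening-and-coloring transform for 2-D features.

    Step 1 (whitening): center the batch and multiply by the inverse
    Cholesky factor of its covariance, so the result has identity
    covariance (full decorrelation, unlike per-dimension BN scaling).
    Step 2 (coloring): re-project with a learnable 2x2 matrix `gamma`
    (flattened as g11, g12, g21, g22) and add a learnable shift `beta`.
    """
    n = len(batch)
    mx = sum(p[0] for p in batch) / n
    my = sum(p[1] for p in batch) / n
    cx = [p[0] - mx for p in batch]
    cy = [p[1] - my for p in batch]
    a = sum(v * v for v in cx) / n               # var(x)
    b = sum(u * v for u, v in zip(cx, cy)) / n   # cov(x, y)
    c = sum(v * v for v in cy) / n               # var(y)
    # Cholesky factor L of the covariance; whitening solves L w = x_centered.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    white = []
    for u, v in zip(cx, cy):
        w1 = u / l11
        w2 = (v - l21 * w1) / l22
        white.append((w1, w2))
    g11, g12, g21, g22 = gamma
    return [(g11 * w1 + g12 * w2 + beta[0],
             g21 * w1 + g22 * w2 + beta[1]) for w1, w2 in white]

# With an identity coloring, the output batch is exactly whitened.
out = whiten_color_2d([(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)],
                      (1.0, 0.0, 0.0, 1.0), (0.0, 0.0))
```

With an identity coloring the output is fully decorrelated (identity covariance), which is strictly stronger than the per-dimension rescaling performed by standard BN.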

2019 Conference proceedings paper

Worldly eyes on video: Learnt vs. reactive deployment of attention to dynamic stimuli

Authors: Cuculo, V.; D'Amelio, A.; Grossi, G.; Lanzarotti, R.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Computational visual attention is a hot topic in computer vision. However, most efforts are devoted to modelling saliency, whilst the actual eye guidance problem, which brings into play the sequence of gaze shifts characterising overt attention, is overlooked. Further, in those cases where the generation of gaze behaviour is considered, the stimuli of interest are by and large static (still images) rather than dynamic (videos). Under such circumstances, the work described in this note has a twofold aim: (i) addressing the problem of estimating and generating visual scan paths, that is, the sequences of gaze shifts over videos; (ii) investigating the effectiveness in scan path generation of features dynamically learned on the basis of human observers' attention dynamics, as opposed to bottom-up derived features. To this end a probabilistic model is proposed. Using a publicly available dataset, our approach is compared against a model of scan path simulation that does not rely on a learning step.

2019 Conference proceedings paper

A Hierarchical Quasi-Recurrent approach to Video Captioning

Authors: Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino

Video captioning has attracted considerable attention thanks to the use of Recurrent Neural Networks, since they can be utilized both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme which can discover and exploit the layered structure of the video. Differently from the established encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose to employ Quasi-Recurrent Neural Networks, extending their basic cell with a boundary detector which can recognize discontinuity points between frames or segments and accordingly modify the temporal connections of the encoding layer. We assess our approach on a large-scale dataset, the Montreal Video Annotation dataset. Experiments demonstrate that our approach can discover suitable levels of representation of the input information, while reducing the computational requirements.
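A minimal sketch of the boundary-aware recurrence described above (scalar states, a fixed forget gate, and a hard reset on boundaries are simplifying assumptions of this sketch, not the paper's exact cell):

```python
def boundary_aware_pooling(inputs, boundaries, forget=0.5):
    """Simplified QRNN-style recurrent pooling with a boundary detector.

    h_t = forget * h_{t-1} + (1 - forget) * x_t, with h reset to 0 whenever
    the boundary detector marks a discontinuity between frames/segments.
    """
    h = 0.0
    states = []
    for x, is_boundary in zip(inputs, boundaries):
        if is_boundary:
            h = 0.0  # sever the temporal connection at the detected boundary
        h = forget * h + (1.0 - forget) * x
        states.append(h)
    return states

# A boundary fires at step 3, so later states ignore the first segment.
states = boundary_aware_pooling([1.0, 1.0, 4.0, 4.0],
                                [False, False, True, False])
```

After the detector fires at step 3, the state depends only on the new segment, mimicking how the boundary detector modifies the temporal connections of the encoding layer.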

2018 Conference proceedings paper
