Publications by Enver Sangineto

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation

Authors: Liu, Yahui; Sangineto, Enver; Chen, Yajing; Bao, Linchao; Zhang, Haoxian; Sebe, Nicu; Lepri, Bruno; Wang, Wei; Nadai, Marco De

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

2021 Conference paper

TriGAN: image-to-image translation for multi-source domain adaptation

Authors: Roy, S.; Siarohin, A.; Sangineto, E.; Sebe, N.; Ricci, E.

Published in: MACHINE VISION AND APPLICATIONS

Most domain adaptation methods consider the problem of transferring knowledge to the target domain from a single-source dataset. However, in practical applications we typically have access to multiple sources. In this paper we propose the first approach for multi-source domain adaptation (MSDA) based on generative adversarial networks. Our method is inspired by the observation that the appearance of a given image depends on three factors: the domain, the style (characterized in terms of low-level feature variations) and the content. For this reason, we propose to project the source image features onto a space where only the dependence on the content is kept, and then re-project this invariant representation onto the pixel space using the target domain and style. In this way, new labeled images can be generated and used to train a final target classifier. We test our approach on common MSDA benchmarks, showing that it outperforms state-of-the-art methods.
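
The three-factor idea can be pictured with a minimal PyTorch sketch (not the authors' code): style/domain statistics are stripped from encoded features, and the content-only representation is re-projected with a target-domain embedding. All module names, layer sizes and the embedding-based conditioning are illustrative assumptions.

```python
# Illustrative sketch of projecting to a content-only space and re-projecting
# with target-domain information. Hypothetical modules, not the TriGAN code.
import torch
import torch.nn as nn

class ContentProjector(nn.Module):
    """Encode an image and discard style/domain statistics via instance norm."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU(),
        )
        # InstanceNorm without affine params removes per-image style statistics.
        self.strip_style = nn.InstanceNorm2d(ch * 2, affine=False)

    def forward(self, x):
        return self.strip_style(self.enc(x))

class TargetReprojector(nn.Module):
    """Decode content features to pixels, conditioned on a target-domain embedding."""
    def __init__(self, ch=64, num_domains=3):
        super().__init__()
        self.domain_emb = nn.Embedding(num_domains, ch * 2)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, content, domain_id):
        # Re-inject domain information as a per-channel scale (a crude stand-in
        # for the paper's style/domain re-projection).
        scale = self.domain_emb(domain_id).unsqueeze(-1).unsqueeze(-1)
        return self.dec(content * scale)

x = torch.randn(8, 3, 32, 32)                      # batch of source images
content = ContentProjector()(x)                    # domain/style-invariant features
fake_target = TargetReprojector()(content, torch.zeros(8, dtype=torch.long))
print(fake_target.shape)                           # torch.Size([8, 3, 32, 32])
```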

2021 Journal article

Whitening for Self-Supervised Representation Learning

Authors: Ermolov, A.; Siarohin, A.; Sangineto, E.; Sebe, N.

Published in: PROCEEDINGS OF MACHINE LEARNING RESEARCH

2021 Conference paper

Attention-based Fusion for Multi-source Human Image Generation

Authors: Lathuiliere, Stephane; Sangineto, Enver; Siarohin, Aliaksandr; Sebe, Nicu

We present a generalization of the person-image generation task, in which a human image is generated conditioned on a target pose and a set X of source appearance images. In this way, we can exploit multiple, possibly complementary images of the same person, which are usually available at training and at testing time. The solution we propose is mainly based on a local attention mechanism which selects relevant information from different source image regions, avoiding the need to build a specific generator for each cardinality of X. The empirical evaluation of our method shows the practical interest of addressing the person-image generation problem in a multi-source setting.
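
The fusion step can be sketched in a few lines of PyTorch: per-pixel attention weights are computed over a variable-size set of source feature maps and used to blend them into one representation, so nothing in the generator depends on |X|. The scoring network and all shapes below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of attention-based fusion over a variable number of sources.
import torch
import torch.nn as nn

class MultiSourceAttentionFusion(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        # A 1x1 conv scores each source feature map at every spatial location.
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):
        # feats: (B, K, C, H, W) with K = |X| source images (K may vary).
        b, k, c, h, w = feats.shape
        logits = self.score(feats.reshape(b * k, c, h, w)).reshape(b, k, 1, h, w)
        attn = logits.softmax(dim=1)          # normalize across the K sources
        return (attn * feats).sum(dim=1)      # fused (B, C, H, W) representation

fusion = MultiSourceAttentionFusion()
for k in (2, 5):                              # the same module handles any cardinality
    fused = fusion(torch.randn(4, k, 128, 16, 16))
    print(k, fused.shape)                     # (4, 128, 16, 16) in both cases
```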

2020 Conference paper

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild

Authors: Zhang, Jichao; Chen, Jingjing; Tang, Hao; Wang, Wei; Yan, Yan; Sangineto, Enver; Sebe, Nicu

We address the problem of unsupervised gaze correction in the wild, presenting a solution that works without the need for precise annotations of the gaze angle and the head pose. We created a new dataset called CelebAGaze consisting of two domains, X and Y, where the eyes are either staring at the camera or somewhere else. Our method consists of three novel modules: the Gaze Correction module (GCM), the Gaze Animation module (GAM), and the Pretrained Autoencoder module (PAM). Specifically, GCM and GAM separately train a dual in-painting network using data from domain X for gaze correction and data from domain Y for gaze animation. Additionally, a Synthesis-As-Training method is proposed when training GAM to encourage the features encoded from the eye region to be correlated with the angle information, so that gaze animation can be achieved by interpolation in the latent space. To further preserve identity information (e.g., eye shape, iris color), we propose the PAM with an Autoencoder, which is based on Self-Supervised mirror learning, where the bottleneck features are angle-invariant and serve as an extra input to the dual in-painting models. Extensive experiments validate the effectiveness of the proposed method for gaze correction and gaze animation in the wild and demonstrate the superiority of our approach in producing more compelling results than state-of-the-art baselines. Our code, the pretrained models and supplementary results are available at: https://github.com/zhangqianhui/GazeAnimation.
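
The animation-by-interpolation idea can be sketched briefly in PyTorch: if the latent features correlate with gaze angle, blending two latent codes decodes to intermediate gazes. The toy decoder below is a hypothetical stand-in, not the paper's dual in-painting network.

```python
# Minimal sketch of gaze animation via latent-space interpolation.
import torch

def interpolate_gaze(decode, z_src, z_dst, steps=5):
    """Decode a sequence of frames whose latents move from z_src to z_dst."""
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z_src + t * z_dst   # linear blend in latent space
        frames.append(decode(z))
    return torch.stack(frames)

decode = torch.nn.Sequential(                # toy decoder, for shape-checking only
    torch.nn.Linear(64, 3 * 32 * 32), torch.nn.Tanh()
)
z_a, z_b = torch.randn(64), torch.randn(64)  # latents of two gaze directions
animation = interpolate_gaze(decode, z_a, z_b)
print(animation.shape)                       # torch.Size([5, 3072])
```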

2020 Conference paper

Online Continual Learning under Extreme Memory Constraints

Authors: Fini, Enrico; Lathuilière, Stéphane; Sangineto, Enver; Nabi, Moin; Ricci, Elisa

Published in: LECTURE NOTES IN COMPUTER SCIENCE

2020 Conference paper

Self Paced Deep Learning for Weakly Supervised Object Detection

Authors: Sangineto, E.; Nabi, M.; Culibrk, D.; Sebe, N.

Published in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

In a weakly-supervised scenario, object detectors need to be trained using image-level annotation alone. Since bounding-box-level ground truth is not available, most of the solutions proposed so far are based on an iterative Multiple Instance Learning framework in which the current classifier is used to select the highest-confidence boxes in each image, which are treated as pseudo-ground truth in the next training iteration. However, the errors of an immature classifier can make the process drift, usually introducing many false positives into the training dataset. To alleviate this problem, we propose a training protocol based on the self-paced learning paradigm. The main idea is to iteratively select a subset of images and boxes that are the most reliable, and use them for training. While in the past few years similar strategies have been adopted for SVMs and other classifiers, we are the first to show that a self-paced approach can be used with deep-network-based classifiers in an end-to-end training pipeline. The method we propose is built on the fully-supervised Fast R-CNN architecture and can be applied to similar architectures which represent the input image as a bag of boxes. We show state-of-the-art results on Pascal VOC 2007, Pascal VOC 2010 and ILSVRC 2013. On ILSVRC 2013 our results, based on a low-capacity AlexNet network, outperform even those weakly-supervised approaches which are based on much higher-capacity networks.
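
The self-paced selection loop can be sketched as follows: at each epoch, keep only the fraction of images whose best box score is highest, and relax that fraction over time. The scoring rule and the easy-to-hard schedule below are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch of self-paced subset selection for weakly supervised detection.
import torch

def self_paced_subset(box_scores, keep_frac):
    """box_scores: list of per-image tensors of box confidences.
    Returns the indices of the keep_frac most reliable images."""
    image_conf = torch.tensor([s.max().item() for s in box_scores])
    k = max(1, int(keep_frac * len(box_scores)))
    return image_conf.topk(k).indices

# Toy usage: three training rounds with a growing "pace".
box_scores = [torch.rand(10) for _ in range(100)]   # stand-in detector outputs
for epoch, frac in enumerate([0.3, 0.6, 1.0]):      # easy-to-hard schedule
    selected = self_paced_subset(box_scores, frac)
    # train_detector(images[selected], pseudo_boxes[selected])  # hypothetical step
    print(f"epoch {epoch}: training on {len(selected)} most reliable images")
```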

2019 Journal article

Training adversarial discriminators for cross-channel abnormal event detection in crowds

Authors: Ravanbakhsh, M.; Sangineto, E.; Nabi, M.; Sebe, N.

Abnormal crowd behaviour detection attracts a large interest due to its importance in video surveillance scenarios. However, the ambiguity and the lack of sufficient abnormal ground truth data make end-to-end training of large deep networks hard in this domain. In this paper we propose to use Generative Adversarial Nets (GANs), which are trained to generate only the normal distribution of the data. During the adversarial GAN training, a discriminator (D) is used as a supervisor for the generator network (G) and vice versa. At testing time we use D to solve our discriminative task (abnormality detection), where D has been trained without the need for manually annotated abnormal data. Moreover, in order to prevent G from learning a trivial identity function, we use a cross-channel approach, forcing G to transform raw-pixel data into motion information and vice versa. The quantitative results on standard benchmarks show that our method outperforms previous state-of-the-art methods in both the frame-level and the pixel-level evaluation.
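
The test-time use of the discriminator can be sketched in PyTorch: D is trained (elsewhere) only on normal data, so a low D score flags a frame as abnormal. The toy network and the threshold below are illustrative assumptions.

```python
# Minimal sketch of scoring abnormality with a GAN discriminator.
import torch
import torch.nn as nn

D = nn.Sequential(                    # toy patch discriminator
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 4, stride=2, padding=1),
)

@torch.no_grad()
def abnormality_score(frame):
    # Average the discriminator's patch logits; lower logits = less "normal".
    return -D(frame).mean().item()

frame = torch.randn(1, 3, 64, 64)     # stand-in video frame (or motion map)
threshold = 0.5                       # would be tuned on a validation set
is_abnormal = abnormality_score(frame) > threshold
print(is_abnormal)
```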

2019 Conference paper

Unsupervised Domain Adaptation Using Feature-Whitening and Consensus Loss

Authors: Roy, Subhankar; Siarohin, Aliaksandr; Sangineto, Enver; Bulo, Samuel Rota; Sebe, Nicu; Ricci, Elisa

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

A classifier trained on a dataset seldom works on other datasets obtained under different conditions due to domain shift. This problem is commonly addressed by domain adaptation methods. In this work we introduce a novel deep learning framework which unifies different paradigms in unsupervised domain adaptation. Specifically, we propose domain alignment layers which implement feature whitening for the purpose of matching source and target feature distributions. Additionally, we leverage the unlabeled target data by proposing the Min-Entropy Consensus loss, which regularizes training while avoiding the adoption of many user-defined hyper-parameters. We report results on publicly available datasets, considering both digit classification and object recognition tasks. We show that, in most of our experiments, our approach improves upon previous methods, setting a new state of the art.
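
Following the abstract's description, a Min-Entropy Consensus style loss can be sketched in PyTorch as below: the two logits are assumed to come from two perturbed views of the same unlabeled target sample, and the loss rewards the single class on which both views agree most. The exact formulation in the paper may differ.

```python
# Hedged sketch of a Min-Entropy Consensus style loss on unlabeled target data.
import torch
import torch.nn.functional as F

def min_entropy_consensus(logits_a, logits_b):
    # Log-probabilities of the two views, shape (B, num_classes).
    log_pa = F.log_softmax(logits_a, dim=1)
    log_pb = F.log_softmax(logits_b, dim=1)
    # For each sample, pick the class maximizing the summed log-probability
    # and minimize its negative; note there are no extra hyper-parameters.
    return -0.5 * (log_pa + log_pb).max(dim=1).values.mean()

loss = min_entropy_consensus(torch.randn(16, 10), torch.randn(16, 10))
print(loss.item())
```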

2019 Conference paper

Whitening and Coloring Batch Transform for GANs

Authors: Siarohin, A.; Sangineto, E.; Sebe, N.

Batch Normalization (BN) is a common technique used to speed up and stabilize training. On the other hand, the learnable parameters of BN are commonly used in conditional Generative Adversarial Networks (cGANs) to represent class-specific information via conditional Batch Normalization (cBN). In this paper we propose to generalize both BN and cBN using a Whitening and Coloring based batch normalization. We show that our conditional Coloring can represent categorical conditioning information which largely improves the qualitative results of cGANs. Moreover, we show that full-feature whitening is important in a general GAN scenario in which the training process is known to be highly unstable. We test our approach on different datasets using different GAN networks and training protocols, showing a consistent improvement across all the tested frameworks. Our conditional CIFAR-10 results surpass all previous works on this dataset.
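
A minimal PyTorch sketch of a whitening-and-coloring transform is shown below: batch features are fully whitened via ZCA and then re-projected by a learnable coloring layer (a per-class coloring would play the role of cBN). This is a simplified illustration, not the paper's implementation; the epsilon value and the absence of running statistics are arbitrary simplifications.

```python
# Hedged sketch of a whitening-and-coloring batch transform on 1-D features.
import torch
import torch.nn as nn

class WhitenColor1d(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.color = nn.Linear(dim, dim)     # learnable coloring (weight + bias)

    def forward(self, x):                    # x: (B, dim), ideally B > dim
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / (x.shape[0] - 1) + self.eps * torch.eye(x.shape[1])
        evals, evecs = torch.linalg.eigh(cov)
        # ZCA whitening: cov^(-1/2) = U diag(1/sqrt(lambda)) U^T
        w = evecs @ torch.diag(evals.clamp_min(self.eps).rsqrt()) @ evecs.t()
        return self.color(x @ w)             # whitened, then colored

layer = WhitenColor1d(8)
y = layer(torch.randn(64, 8))
print(y.shape)                               # torch.Size([64, 8])
```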

2019 Conference paper
