Publications by Rita Cucchiara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

What was Monet seeing while painting? Translating artworks to photo-realistic images

Authors: Tomei, Matteo; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

State-of-the-art Computer Vision techniques exploit the availability of large-scale datasets, most of which consist of images captured from the world as it is. This creates an incompatibility between such methods and digital data from the artistic domain, on which current techniques under-perform. A possible solution is to reduce the domain shift at the pixel level, translating artistic images into realistic copies. In this paper, we present a model capable of translating paintings to photo-realistic images, trained without paired examples. The idea is to enforce a patch-level similarity between real and generated images, aiming to reproduce photo-realistic details from a memory bank of real images. This is then adopted in the context of an unpaired image-to-image translation framework, mapping each image from one distribution to a new one belonging to the other distribution. Qualitative and quantitative results are presented on Monet, Cezanne, and Van Gogh painting translation tasks, showing that our approach increases the realism of generated images with respect to CycleGAN.
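
The patch-level similarity at the core of this approach can be pictured as a nearest-neighbour lookup into a memory bank of real patches. Below is a minimal NumPy sketch of that retrieval step, assuming patches have been flattened into feature vectors; the function and variable names are hypothetical and not taken from the paper's code:

```python
import numpy as np

def nearest_real_patches(gen_patches, memory_bank):
    """For each generated patch (one flattened vector per row), return the
    index of the most similar real patch in the memory bank under cosine
    similarity."""
    g = gen_patches / np.linalg.norm(gen_patches, axis=1, keepdims=True)
    m = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    return np.argmax(g @ m.T, axis=1)
```

A training objective could then pull each generated patch toward its retrieved real neighbour, encouraging photo-realistic detail at the patch scale.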

2019 Conference proceedings paper

Aligning Text and Document Illustrations: towards Visually Explainable Digital Humanities

Authors: Baraldi, Lorenzo; Cornia, Marcella; Grana, Costantino; Cucchiara, Rita

While several approaches to bringing vision and language together are emerging, none of them has yet addressed the digital humanities domain, which is nevertheless a rich source of visual and textual data. To foster research in this direction, we investigate the learning of visual-semantic embeddings for historical document illustrations, devising both supervised and semi-supervised approaches. We exploit the joint visual-semantic embeddings to automatically align illustrations and textual elements, thus providing an automatic annotation of the visual content of a manuscript. Experiments are performed on the Borso d'Este Holy Bible, one of the most sophisticated illuminated manuscripts of the Renaissance, which we manually annotate by aligning every illustration with textual commentaries written by experts. Experimental results quantify the domain shift between ordinary visual-semantic datasets and the proposed one, validate the proposed strategies, and suggest future work along the same line.
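
Visual-semantic embeddings of this kind are commonly trained with a hinge-based contrastive objective: matched illustration/text pairs should score higher than mismatched pairs by some margin. A minimal NumPy sketch of such a loss follows; it is a generic formulation under assumed conventions, not the paper's actual training code:

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, margin=0.2):
    """Hinge-based contrastive loss over all in-batch negatives: matched
    (i, i) pairs must beat every mismatched pair by at least `margin`,
    using cosine similarity on L2-normalised rows."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    scores = img @ txt.T              # pairwise similarities
    pos = np.diag(scores)             # matched-pair similarities
    cost_txt = np.maximum(0.0, margin + scores - pos[:, None])
    cost_img = np.maximum(0.0, margin + scores - pos[None, :])
    np.fill_diagonal(cost_txt, 0.0)   # positives incur no cost
    np.fill_diagonal(cost_img, 0.0)
    return cost_txt.sum() + cost_img.sum()
```

Perfectly aligned embeddings drive the loss to zero, while any mismatched pair closer than the margin contributes a penalty.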

2018 Conference proceedings paper

Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era

Authors: Cornia, Marcella; Abati, Davide; Baraldi, Lorenzo; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita

Published in: INTELLIGENZA ARTIFICIALE

Estimating the focus of attention of a person looking at an image or a video is a crucial step that can enhance many vision-based inference mechanisms: image segmentation and annotation, video captioning, and autonomous driving are some examples. The early stages of attentive behavior are typically bottom-up; reproducing the same mechanism means finding the saliency embodied in the images, i.e. which parts of an image pop out of a visual scene. This process has been studied for decades, both in neuroscience and in terms of computational models reproducing the human cortical process. In the last few years, early models have been replaced by deep learning architectures, which outperform earlier approaches on public datasets. In this paper, we discuss the effectiveness of convolutional neural network (CNN) models in saliency prediction. We present a set of deep learning architectures we developed, which combine bottom-up cues and higher-level semantics, and extract spatio-temporal features by means of 3D convolutions to model task-driven attentive behaviors. We show how these deep networks closely recall the early saliency models, improved with the semantics learned from human ground truth. Finally, we present a use case in which saliency prediction is used to improve the automatic description of images.
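
One way such models combine bottom-up cues with higher-level knowledge is by weighting the predicted map with a learned prior; a common, simple instance is a Gaussian centre prior. The sketch below illustrates that idea only, with hypothetical parameter choices, and is not any specific architecture from the paper:

```python
import numpy as np

def add_center_prior(saliency, sigma_frac=0.25):
    """Modulate a predicted saliency map with a Gaussian centre prior
    (humans tend to fixate near the image centre), then renormalise
    so the maximum is 1."""
    H, W = saliency.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2, (W - 1) / 2
    prior = np.exp(-(((ys - cy) / (sigma_frac * H)) ** 2
                     + ((xs - cx) / (sigma_frac * W)) ** 2) / 2)
    out = saliency * prior
    return out / out.max()
```

On a uniform input, the output peaks at the image centre, which is exactly the bias the prior encodes.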

2018 Journal article

Automatic Image Cropping and Selection using Saliency: an Application to Historical Manuscripts

Authors: Cornia, Marcella; Pini, Stefano; Baraldi, Lorenzo; Cucchiara, Rita

Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE

Automatic image cropping techniques are particularly important for improving the visual quality of cropped images and can be applied to a wide range of applications such as photo editing, image compression, and thumbnail selection. In this paper, we propose a saliency-based image cropping method that produces meaningful cropped images by relying only on the corresponding saliency maps. Experiments on standard image cropping datasets demonstrate the benefit of the proposed solution with respect to other cropping methods. Moreover, we present an image selection method that can be effectively applied to automatically select the most representative pages of historical manuscripts, thus improving the navigation of historical digital libraries.
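
A saliency-driven crop can be obtained by searching for the fixed-size window that captures the most saliency mass, which an integral image makes cheap to evaluate. The following is a minimal sketch of that idea under assumed conventions (it is not the paper's algorithm, whose exact criterion may differ):

```python
import numpy as np

def best_crop(saliency, crop_h, crop_w):
    """Return the (row, col) of the top-left corner of the crop_h x crop_w
    window that contains the most saliency, using a summed-area table so
    each window costs O(1) to score."""
    H, W = saliency.shape
    integral = np.zeros((H + 1, W + 1))
    integral[1:, 1:] = saliency.cumsum(0).cumsum(1)
    best, best_yx = -1.0, (0, 0)
    for y in range(H - crop_h + 1):
        for x in range(W - crop_w + 1):
            s = (integral[y + crop_h, x + crop_w] - integral[y, x + crop_w]
                 - integral[y + crop_h, x] + integral[y, x])
            if s > best:
                best, best_yx = s, (y, x)
    return best_yx
```

For page selection, the same score summed over a whole page could serve as a simple proxy for how visually rich the page is.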

2018 Conference proceedings paper

“Objective” intergroup non-verbal behavior: a replication of the study by Dovidio, Kawakami and Gaertner (2002)

Authors: Di Bernardo, Gian Antonio; Vezzali, Loris; Giovannini, Dino; Palazzi, Andrea; Calderara, Simone; Bicocchi, Nicola; Zambonelli, Franco; Cucchiara, Rita; Cadamuro, Alessia; Cocco, Veronica Margherita

There is a long research tradition analyzing non-verbal behavior, including in intergroup relations. Typically, these studies rely on ratings by external coders, which are, however, subjective and open to bias. We conducted a study modeled on the well-known study by Dovidio, Kawakami and Gaertner (2002), with some modifications, considering the relationship between White and Black people. White participants, after completing measures of explicit and implicit prejudice, met (in counterbalanced order) a White and a Black confederate. With each of them, they talked for three minutes about a neutral topic and about a topic salient to the group distinction (in counterbalanced order). These interactions were recorded with a Kinect camera, which can capture the three-dimensional component of movement. The results revealed several points of interest. First, objective indices were derived from an analysis of the literature, some of which cannot be detected by external coders, such as interpersonal distance and the volume of space between people. The results highlighted several relevant aspects: (1) implicit attitude is associated with various indices of non-verbal behavior, which mediate the confederates' evaluations of the participants; (2) interactions should be considered dynamically, taking into account that they unfold over time; (3) what may matter is global non-verbal behavior, rather than a few specific indices pre-determined by the experimenters.

2018 Abstract in conference proceedings

Deep Head Pose Estimation from Depth Data for In-car Automotive Applications

Authors: Venturelli, Marco; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Published in: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE

Recently, deep learning approaches have achieved promising results in various fields of computer vision. In this paper, we tackle the problem of head pose estimation through a Convolutional Neural Network (CNN). Differently from other proposals in the literature, the described system is able to work directly on raw depth data alone. Moreover, head pose estimation is solved as a regression problem and does not rely on visual facial features such as facial landmarks. We tested our system on a well-known public dataset, Biwi Kinect Head Pose, showing that our approach achieves state-of-the-art results and is able to meet real-time performance requirements.
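
Framing head pose as regression means the network outputs continuous (yaw, pitch, roll) angles, and evaluation on Biwi-style benchmarks typically reports the per-axis mean absolute error in degrees. A minimal sketch of that metric (generic convention, not the paper's exact evaluation script):

```python
import numpy as np

def mean_angular_error(pred, gt):
    """Per-axis mean absolute error in degrees for (yaw, pitch, roll)
    regression; `pred` and `gt` are (N, 3) arrays of angle triplets."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return np.mean(np.abs(pred - gt), axis=0)
```

The regression network itself would simply minimise an L2 (or L1) loss on these same three outputs.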

2018 Conference proceedings paper

Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face

Authors: Fabbri, Matteo; Borghi, Guido; Lanzi, Fabio; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita

Can faces acquired by low-cost depth sensors be used to see some characteristic details of the faces? Typically, the answer is no. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper we propose a new Deterministic Conditional GAN, trained on annotated RGB-D face datasets, effective for face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features of unknown individual faces, it is capable of reconstructing plausible faces whose appearance is accurate enough to be used in many pattern recognition tasks. In fact, we test the network's capability to hallucinate with some perceptual probes, such as face aspect classification and landmark detection. Depth faces can thus be used in place of the corresponding RGB images, which are often unavailable due to darkness or difficult lighting conditions. Experimental results are very promising and far better than previously proposed approaches: this domain translation can constitute a new way to exploit depth data in future applications.
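
Conditional GANs for paired translation of this kind are usually trained with a combined objective: an adversarial term that makes outputs look realistic plus a reconstruction term that ties each output to its paired target. The sketch below shows a pix2pix-style generator loss as one plausible formulation; the weighting and names are assumptions, not the paper's actual objective:

```python
import numpy as np

def generator_loss(d_fake, fake_rgb, real_rgb, lam=100.0):
    """Combined generator objective: a non-saturating adversarial term
    (fool the discriminator, whose outputs d_fake are probabilities)
    plus an L1 reconstruction term against the paired RGB target."""
    adv = -np.mean(np.log(d_fake + 1e-8))
    l1 = np.mean(np.abs(fake_rgb - real_rgb))
    return adv + lam * l1
```

With a fooled discriminator and a perfect reconstruction the loss vanishes; any pixel-wise deviation from the paired RGB image is penalised heavily through the L1 weight.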

2018 Conference proceedings paper

Fully Convolutional Network for Head Detection with Depth Images

Authors: Ballotta, Diego; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Head detection and localization are among the most investigated and demanding tasks of the Computer Vision community. They are also a key element for many disciplines, such as Human-Computer Interaction, Human Behavior Understanding, Face Analysis, and Video Surveillance. In recent decades, many efforts have been devoted to developing accurate and reliable head or face detectors on standard RGB images, but only a few solutions concern other types of images, such as depth maps. In this paper, we propose a novel method for head detection on depth images, based on a deep learning approach. In particular, the presented system overcomes the classic sliding-window approach, often the main computational bottleneck of many object detectors, through a Fully Convolutional Network. Two public datasets, namely Pandora and Watch-n-Patch, are exploited to train and test the proposed network. Experimental results confirm the effectiveness of the method, which exceeds all state-of-the-art works based on depth images and runs in real time.
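
The speed advantage of a fully convolutional detector comes from producing a whole confidence map in one forward pass instead of scoring thousands of sliding windows; localisation then reduces to reading off the map's peak. A minimal sketch of that decoding step, with an assumed network stride (names hypothetical):

```python
import numpy as np

def heatmap_to_center(heatmap, stride=8):
    """Map the peak of a low-resolution confidence map (one network
    forward pass, no sliding window) back to input-image coordinates,
    returning the centre of the winning cell."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return (y * stride + stride // 2, x * stride + stride // 2)
```

In a full detector the peak value would also be thresholded to decide whether a head is present at all.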

2018 Conference proceedings paper

Guest editorial: Special section on “multimedia understanding via multimodal analytics”

Authors: Yan, Y.; Nie, L.; Cucchiara, R.

Published in: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS

2018 Journal article

Hands on the wheel: a Dataset for Driver Hand Detection and Tracking

Authors: Borghi, Guido; Frigieri, Elia; Vezzani, Roberto; Cucchiara, Rita

The ability to detect, localize, and track the hands is crucial in many applications requiring an understanding of a person's behavior, attitude, and interactions. This is particularly true in the automotive context, in which hand analysis makes it possible to predict preparatory movements for maneuvers or to investigate the driver's attention level. Moreover, thanks to the recent diffusion of cameras inside new car cockpits, it is feasible to use hand gestures to develop new Human-Car Interaction systems that are more user-friendly and safe. In this paper, we propose a new dataset, called Turms, consisting of infrared images of the driver's hands collected from the back of the steering wheel, an innovative point of view. The Leap Motion device was selected for the recordings, thanks to its stereo capabilities and wide view angle. In addition, we introduce a method to detect the presence and location of the driver's hands on the steering wheel during driving tasks.

2018 Conference proceedings paper

Total publications: 505