Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Deep Learning-Based Method for Vision-Guided Robotic Grasping of Unknown Objects

Authors: Bergamini, Luca; Sposato, Mario; Peruzzini, Margherita; Vezzani, Roberto; Pellicciari, Marcello

Published in: ADVANCES IN TRANSDISCIPLINARY ENGINEERING

Collaborative robots must operate safely and efficiently in ever-changing unstructured environments, grasping and manipulating many different objects. Artificial vision has proved to be the ideal sensing technology for collaborative robots and is widely used for identifying the objects to manipulate and for detecting their optimal grasps. One of the main drawbacks of state-of-the-art robotic vision systems is the long training needed to teach the identification and optimal grasps of each object, which strongly reduces the robot's productivity and overall operating flexibility. To overcome this limit, we propose an engineering method, based on deep learning techniques, for detecting robotic grasps of unknown objects in an unstructured environment, enabling collaborative robots to autonomously generate grasping strategies without training or programming. A novel loss function for training the grasp prediction network has been developed and proved to work well even with low-resolution 2-D images, allowing the use of a single, smaller, low-cost camera that can be better integrated into robotic end-effectors. Despite the reduced information (resolution and depth), an accuracy of 75% has been achieved on the Cornell dataset, and we show that our implementation of the loss function does not suffer from the common problems reported in the literature. The system has been implemented using the ROS framework and tested on a Baxter collaborative robot.
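The 75% accuracy quoted for the Cornell dataset is conventionally measured with the grasp-rectangle metric: a predicted grasp counts as correct when its orientation is within 30 degrees of a ground-truth grasp and the rectangle overlap (IoU) exceeds 0.25. A minimal sketch of that check follows; the rotated-rectangle IoU is crudely approximated here with axis-aligned boxes, and the function names are illustrative, not from the paper:

```python
def axis_aligned_iou(a, b):
    # a, b = (x, y, w, h) centre-based boxes; a crude stand-in
    # for the rotated-rectangle IoU used in the real metric.
    ax0, ay0 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax1, ay1 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx1, by1 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def grasp_correct(pred, truth, angle_tol=30.0, iou_thresh=0.25):
    # pred / truth = (x, y, theta_deg, w, h) grasp rectangles.
    diff = abs(pred[2] - truth[2]) % 180.0
    diff = min(diff, 180.0 - diff)          # angles wrap every 180 deg
    iou = axis_aligned_iou((pred[0], pred[1], pred[3], pred[4]),
                           (truth[0], truth[1], truth[3], truth[4]))
    return diff < angle_tol and iou > iou_thresh

# A slightly shifted, slightly rotated prediction still counts.
ok = grasp_correct((50, 50, 10, 40, 20), (52, 49, 0, 40, 20))
```

In practice a prediction is accepted if it passes this test against any of the several ground-truth rectangles annotated for the object.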

2018 Relazione in Atti di Convegno

DEEP METRIC AND HASH-CODE LEARNING FOR CONTENT-BASED RETRIEVAL OF REMOTE SENSING IMAGES

Authors: Roy, S; Sangineto, E; Demir, B; Sebe, N

The growing volume of Remote Sensing (RS) image archives demands feature learning techniques and hashing functions which can: (1) accurately represent the semantics in the RS images; and (2) achieve quasi real-time performance during retrieval. This paper aims to address both challenges at the same time, by learning a semantic-based metric space for content-based RS image retrieval while simultaneously producing binary hash codes for an efficient archive search. This double goal is achieved by training a deep network using a combination of different loss functions which, on the one hand, aim at clustering semantically similar samples (i.e., images), and, on the other hand, encourage the network to produce final activation values (i.e., descriptors) that can be easily binarized. Moreover, since annotated RS training images are too few to train a deep network from scratch, we propose to split the image representation problem into two phases. In the first, we use a general-purpose, pre-trained network to produce an intermediate representation; in the second, we train our hashing network using a relatively small set of training images. Experiments on two aerial benchmark archives show that the proposed method outperforms previous state-of-the-art hashing approaches by up to 5.4% using the same number of hash bits per image.
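The two ingredients the abstract pairs — activations pushed toward values that binarize cleanly, and fast Hamming-distance ranking over the resulting codes — can be illustrated with a toy sketch (the penalty form and function names are illustrative, not the paper's exact losses):

```python
import numpy as np

def binarization_penalty(activations):
    # Encourage final activations to sit near +/-1 so that taking
    # sign(activation) as the hash bit loses little information.
    return float(np.mean((np.abs(activations) - 1.0) ** 2))

def hamming_retrieval(query_code, archive_codes):
    # Quasi real-time search: rank archive items by the Hamming
    # distance between binary hash codes (count of differing bits).
    dists = np.sum(query_code != archive_codes, axis=1)
    return np.argsort(dists)

# Toy example: 4-bit codes for a query and three archive images.
query = np.array([1, 0, 1, 1])
archive = np.array([[1, 0, 1, 0],
                    [1, 0, 1, 1],
                    [0, 1, 0, 0]])
ranking = hamming_retrieval(query, archive)  # best match first
```

During training, a term like `binarization_penalty` would be added to the metric-learning losses; at retrieval time only the cheap bit-count comparison remains.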

2018 Relazione in Atti di Convegno

Deformable GANs for Pose-Based Human Image Generation

Authors: Siarohin, Aliaksandr; Sangineto, Enver; Lathuiliere, Stephane; Sebe, Nicu

In this paper we address the problem of generating person images conditioned on a given pose. Specifically, given an image of a person and a target pose, we synthesize a new image of that person in the novel pose. In order to deal with pixel-to-pixel misalignments caused by the pose differences, we introduce deformable skip connections in the generator of our Generative Adversarial Network. Moreover, a nearest-neighbour loss is proposed instead of the common L1 and L2 losses in order to match the details of the generated image with those of the target image. We test our approach using photos of persons in different poses and compare our method with previous work in this area, showing state-of-the-art results on two benchmarks. Our method can be applied to the wider field of deformable object generation, provided that the pose of the articulated object can be extracted using a keypoint detector.
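Why a nearest-neighbour loss tolerates small misalignments better than plain L1 can be seen in a simplified single-channel sketch (the paper applies the idea to convolutional feature patches; this pixel-level version is only illustrative):

```python
import numpy as np

def nearest_neighbour_loss(generated, target, radius=1):
    """For every pixel of `generated`, take the smallest absolute
    difference to any target pixel inside a (2r+1)x(2r+1) window,
    then average. Small spatial shifts that would heavily penalise
    a plain L1 loss cost nothing here."""
    h, w = generated.shape
    padded = np.pad(target, radius, mode='edge')
    best = np.full((h, w), np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            best = np.minimum(best, np.abs(generated - shifted))
    return float(best.mean())

# A one-pixel horizontal shift: plain L1 is large, NN loss is zero.
target = np.zeros((4, 4)); target[:, 1] = 1.0
moved = np.zeros((4, 4)); moved[:, 2] = 1.0
l1 = float(np.abs(moved - target).mean())
nn = nearest_neighbour_loss(moved, target, radius=1)
```

The same principle is what lets the generator focus on reproducing details rather than being punished for the residual misalignment the deformable skip connections cannot fully remove.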

2018 Relazione in Atti di Convegno

Dimensionality reduction strategies for CNN-based classification of histopathological images

Authors: Cascianelli, Silvia; Bello-Cerezo, Raquel; Bianconi, Francesco; Fravolini, Mario L; Belal, Mehdi; Palumbo, Barbara; Kather, Jakob N

2018 Relazione in Atti di Convegno

Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face

Authors: Fabbri, Matteo; Borghi, Guido; Lanzi, Fabio; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita

Can faces acquired by low-cost depth sensors be used to see characteristic details of the faces? Typically, the answer is no. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper we propose a new Deterministic Conditional GAN, trained on annotated RGB-D face datasets, effective for face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features of unknown individual faces, it is capable of reconstructing plausible faces whose appearance is accurate enough to be used in many pattern recognition tasks. In fact, we test the network's capability to hallucinate with some perceptual probes, such as face aspect classification and landmark detection. Depth faces can thus be used in place of the corresponding RGB images, which are often unavailable because of darkness or difficult luminance conditions. Experimental results are very promising and far better than previously proposed approaches: this domain translation can constitute a new way to exploit depth data in future applications.

2018 Relazione in Atti di Convegno

Full-GRU Natural Language Video Description for Service Robotics Applications

Authors: Cascianelli, Silvia; Costante, Gabriele; Ciarfuglia, Thomas Alessandro; Valigi, Paolo; Fravolini, Mario Luca

Published in: IEEE ROBOTICS AND AUTOMATION LETTERS

2018 Articolo su rivista

Fully Convolutional Network for Head Detection with Depth Images

Authors: Ballotta, Diego; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Head detection and localization are among the most investigated and demanding tasks of the Computer Vision community. They are also a key element for many disciplines, such as Human-Computer Interaction, Human Behavior Understanding, Face Analysis, and Video Surveillance. In recent decades, many efforts have been devoted to developing accurate and reliable head or face detectors on standard RGB images, but only a few solutions concern other types of images, such as depth maps. In this paper, we propose a novel method for head detection on depth images, based on a deep learning approach. In particular, the presented system overcomes the classic sliding-window approach, which is often the main computational bottleneck of many object detectors, through a Fully Convolutional Network. Two public datasets, namely Pandora and Watch-n-Patch, are exploited to train and test the proposed network. Experimental results confirm the effectiveness of the method, which is able to exceed all the state-of-the-art works based on depth images and to run in real time.
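The efficiency argument — one convolutional pass over the whole depth map replaces evaluating a classifier at every sliding-window position — can be sketched with a single hand-written filter standing in for the learned network (filter, sizes, and the toy depth map are all hypothetical):

```python
import numpy as np

def detection_heatmap(depth_map, weights):
    # Each output cell is the filter response at that position:
    # the same map a single convolutional forward pass produces,
    # without re-running a classifier per sliding window.
    kh, kw = weights.shape
    h = depth_map.shape[0] - kh + 1
    w = depth_map.shape[1] - kw + 1
    heat = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            heat[y, x] = np.sum(depth_map[y:y + kh, x:x + kw] * weights)
    return heat

# Toy depth map with a bright 2x2 blob standing in for a head.
depth = np.zeros((6, 6)); depth[2:4, 3:5] = 1.0
filt = np.ones((2, 2))            # hypothetical learned filter
heat = detection_heatmap(depth, filt)
y, x = np.unravel_index(np.argmax(heat), heat.shape)  # peak = detection
```

A real FCN stacks many learned layers before the final response map, but the shift from per-window classification to one dense map is exactly this.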

2018 Relazione in Atti di Convegno

geneEX: a novel tool to assess differential expression from gene and exon sequencing data

Authors: Scicolone, Orazio Maria; Paciello, Giulia; Ficarra, Elisa

2018 Relazione in Atti di Convegno

Guest editorial: Special section on “multimedia understanding via multimodal analytics”

Authors: Yan, Y.; Nie, L.; Cucchiara, R.

Published in: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS

2018 Articolo su rivista

Hand-designed local image descriptors vs. off-the-shelf CNN-based features for texture classification: an experimental comparison

Authors: Bello-Cerezo, Raquel; Bianconi, Francesco; Cascianelli, Silvia; Fravolini, Mario Luca; Di Maria, Francesco; Smeraldi, Fabrizio

2018 Relazione in Atti di Convegno

Page 50 of 106 • Total publications: 1056