Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Mining textural knowledge in biological images: applications, methods and trends

Authors: Di Cataldo, Santa; Ficarra, Elisa

Published in: COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL

Texture analysis is a major task in many areas of computer vision and pattern recognition, including biological imaging. Indeed, visual textures can be exploited to distinguish specific tissues or cells in a biological sample, to highlight chemical reactions between molecules, as well as to detect subcellular patterns that can be evidence of certain pathologies. This makes automated texture analysis fundamental in many applications of biomedicine, such as the accurate detection and grading of multiple types of cancer, the differential diagnosis of autoimmune diseases, or the study of physiological processes. Due to their specific characteristics and challenges, the design of texture analysis systems for biological images has attracted ever-growing attention in the last few years. In this paper, we perform a critical review of this important topic. First, we provide a general definition of texture analysis and discuss its role in the context of bioimaging, with examples of applications from the recent literature. Then, we review the main approaches to automated texture analysis, with special attention to the methods of feature extraction and encoding that can be successfully applied to microscopy images of cells or tissues. Our aim is to provide an overview of the state of the art, as well as a glimpse into the latest and future trends of research in this area.
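To make the kind of descriptors the survey covers concrete, the snippet below is a minimal sketch (not taken from the paper) of two classical texture features, GLCM statistics and Local Binary Pattern histograms, computed with scikit-image; the input file name and all parameter values are illustrative assumptions.

```python
# Minimal sketch: classical texture descriptors (GLCM statistics and LBP
# histograms) of the kind surveyed in the paper, computed with scikit-image.
# The file name and parameter values are illustrative assumptions.
import numpy as np
from skimage import io, color, img_as_ubyte
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

image = img_as_ubyte(color.rgb2gray(io.imread("tissue_patch.png")))  # hypothetical input

# Gray-Level Co-occurrence Matrix: second-order statistics over pixel pairs.
glcm = graycomatrix(image, distances=[1, 2], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
glcm_features = [graycoprops(glcm, prop).mean()
                 for prop in ("contrast", "homogeneity", "energy", "correlation")]

# Local Binary Patterns: encode each pixel's neighbourhood, then histogram.
lbp = local_binary_pattern(image, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=np.arange(11), density=True)

feature_vector = np.concatenate([glcm_features, lbp_hist])
```

A vector like this would then feed any standard classifier to distinguish tissues or subcellular patterns, which is the typical encoding-plus-classification scheme the review discusses.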

2017 Journal article

Modeling Multimodal Cues in a Deep Learning-based Framework for Emotion Recognition in the Wild

Authors: Pini, Stefano; Ben Ahmed, Olfa; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita; Huet, Benoit

In this paper, we propose a multimodal deep learning architecture for emotion recognition in video, developed for our participation in the audio-video based sub-challenge of the Emotion Recognition in the Wild 2017 challenge. Our model combines cues from multiple video modalities, including static facial features, motion patterns related to the evolution of the human expression over time, and audio information. Specifically, it is composed of three sub-networks trained separately: the first and second ones extract static visual features and dynamic patterns through 2D and 3D Convolutional Neural Networks (CNN), while the third one consists of a pretrained audio network which is used to extract useful deep acoustic signals from video. In the audio branch, we also apply Long Short Term Memory (LSTM) networks in order to capture the temporal evolution of the audio features. To identify and exploit possible relationships among different modalities, we propose a fusion network that merges cues from the different modalities into one representation. The proposed architecture outperforms the challenge baselines (38.81% and 40.47%): we achieve accuracies of 50.39% and 49.92% on the validation and testing data, respectively.
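As a rough illustration of the fusion idea described above, the following PyTorch sketch merges three per-modality feature vectors with a small fully connected network; the branch dimensions, layer sizes and class count are assumptions and do not reproduce the authors' exact configuration.

```python
# Hedged PyTorch sketch of the late-fusion idea: three separately trained
# branches (2D CNN, 3D CNN, audio LSTM) produce feature vectors that a small
# fusion network merges into emotion scores. All sizes are assumptions.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, static_dim=512, motion_dim=512, audio_dim=256, num_emotions=7):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(static_dim + motion_dim + audio_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_emotions),
        )

    def forward(self, static_feat, motion_feat, audio_feat):
        # Concatenate the per-modality representations and classify.
        joint = torch.cat([static_feat, motion_feat, audio_feat], dim=1)
        return self.fusion(joint)

# Usage with dummy features standing in for the three pretrained branches.
model = FusionNet()
logits = model(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 7])
```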

2017 Conference paper

NeuralStory: an Interactive Multimedia System for Video Indexing and Re-use

Authors: Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita

In recent years video has been swamping the Internet: websites, social networks, and business multimedia systems are adopting video as the most important form of communication and information. Videos are normally accessed as a whole and are not indexed by their visual content. Thus, they are often uploaded as short, manually cut clips with user-provided annotations, keywords and tags for retrieval. In this paper, we propose a prototype multimedia system which addresses these two limitations: it overcomes the need for human intervention in preparing the video, thanks to fully deep learning-based solutions, and decomposes the storytelling structure of the video into coherent parts. These parts can be shots, key-frames, scenes and semantically related stories, and are exploited to provide an automatic annotation of the visual content, so that parts of the video can be easily retrieved. This also allows a principled re-use of the video itself: users of the platform can produce new storytelling by means of multi-modal presentations, add text and other media, and propose a different visual organization of the content. We present the overall solution, along with experiments on the re-use capability of our platform in edutainment, conducted through an extensive user evaluation with students from primary schools.
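For readers unfamiliar with shot decomposition, the sketch below shows a classical colour-histogram shot-boundary detector as a stand-in for the first step of such a pipeline; the paper itself relies on deep learning-based solutions, and the threshold and file name here are assumptions.

```python
# Illustrative sketch only: a classical colour-histogram shot-boundary
# detector, shown to make the idea of decomposing a video into shots
# concrete. The threshold and file name below are assumptions.
import cv2

def detect_shot_boundaries(video_path, threshold=0.4):
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, frame_idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Low correlation between consecutive histograms suggests a cut.
            similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < 1.0 - threshold:
                boundaries.append(frame_idx)
        prev_hist, frame_idx = hist, frame_idx + 1
    cap.release()
    return boundaries

print(detect_shot_boundaries("lecture.mp4"))  # hypothetical video file
```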

2017 Conference paper

Personalized Egocentric Video Summarization of Cultural Tour on User Preferences Input

Authors: Varini, P.; Serra, G.; Cucchiara, R.

Published in: IEEE TRANSACTIONS ON MULTIMEDIA

In this paper, we propose a new method for customized summarization of egocentric videos according to specific user preferences, so that different users can extract different summaries from the same stream. Our approach, tailored to a cultural heritage scenario, relies on creating a short synopsis of the original video focused on key shots, in which concepts relevant to the user's preferences can be visually detected and the chronological flow of the original video is preserved. Moreover, we release a new dataset, composed of egocentric streams taken in uncontrolled scenarios, capturing tourists' cultural visits in six art cities, with geolocalization information. Our experimental results show that the proposed approach is able to leverage the user's preferences, with an emphasis on the storyline's chronological flow and on visual smoothness.
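A toy sketch of preference-driven key-shot selection in the spirit of the abstract is shown below: shots are scored by the overlap between their detected concepts and the user's preferences, and the best ones are kept in chronological order. The concept labels and the scoring rule are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of preference-driven key-shot selection: score each shot by
# how well its detected concepts match the user's preferences, keep the top
# shots, and restore chronological order. All labels are assumptions.
def summarize(shots, user_preferences, num_keyshots=3):
    """shots: list of (start_time, set_of_detected_concepts)."""
    scored = [
        (start, concepts, len(concepts & user_preferences))
        for start, concepts in shots
    ]
    top = sorted(scored, key=lambda s: s[2], reverse=True)[:num_keyshots]
    return sorted(top, key=lambda s: s[0])  # chronological order of the summary

shots = [
    (0.0, {"street", "crowd"}),
    (12.5, {"cathedral", "facade"}),
    (40.2, {"painting", "museum"}),
    (75.0, {"restaurant"}),
]
print(summarize(shots, user_preferences={"cathedral", "painting", "museum"}))
```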

2017 Journal article

Pixel classification methods to detect skin lesions on dermoscopic medical images

Authors: Balducci, Fabrizio; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

In recent years the interest of the biomedical and computer vision communities in the acquisition and analysis of epidermal images has increased, because melanoma is one of the deadliest forms of skin cancer and its early identification could save lives while reducing unnecessary medical treatments. User-friendly automatic tools can be very useful for physicians and dermatologists; in fact, high-resolution images and their annotated data, combined with analysis pipelines and machine learning techniques, form the basis for developing intelligent and proactive diagnostic systems. In this work we present two skin lesion detection pipelines for dermoscopic medical images, exploiting standard techniques combined with workarounds that improve results; moreover, to highlight their performance we consider a set of metrics combined with pixel labeling and classification. A preliminary but functional evaluation has been conducted on a subset of hard-to-treat images, in order to check which of the proposed detection pipelines reaches the best results.
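The following sketch illustrates generic pixel-level classification of a dermoscopic image with scikit-learn, together with two of the per-pixel metrics commonly used for this task; it is a simplified stand-in under assumed file names, not the specific pipelines proposed in the paper.

```python
# Hedged sketch of pixel-level lesion classification: each pixel becomes a
# feature vector (here just its RGB values), a standard classifier predicts
# lesion vs. skin, and per-pixel metrics are computed against a ground-truth
# mask. Generic stand-in, not the paper's pipelines; file names are assumed.
import numpy as np
from skimage import io
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, jaccard_score

image = io.imread("dermoscopy.png")[:, :, :3]                    # hypothetical image
mask = io.imread("dermoscopy_mask.png", as_gray=True) > 0        # hypothetical ground truth

X = image.reshape(-1, 3).astype(np.float32)   # one RGB feature vector per pixel
y = mask.reshape(-1).astype(np.uint8)         # 1 = lesion, 0 = healthy skin

# Train on half of the pixels, evaluate on the other half (illustrative split).
half = X.shape[0] // 2
clf = RandomForestClassifier(n_estimators=50, n_jobs=-1).fit(X[:half], y[:half])
pred = clf.predict(X[half:])

print("F1 :", f1_score(y[half:], pred))
print("IoU:", jaccard_score(y[half:], pred))
```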

2017 Conference paper

POSEidon: Face-from-Depth for Driver Pose Estimation

Authors: Borghi, Guido; Venturelli, Marco; Vezzani, Roberto; Cucchiara, Rita

Published in: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION

Fast and accurate upper-body and head pose estimation is a key task for automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regression neural network, called POSEidon, which is composed of three independent convolutional nets followed by a fusion layer, specially conceived for understanding the pose from depth. In addition, to recover the intrinsic value of face appearance for understanding head position and orientation, we propose a new Face-from-Depth approach for learning face images from depth. Results in face reconstruction are qualitatively impressive. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Results show that our method outperforms all recent state-of-the-art works, running in real time at more than 30 frames per second.
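As a structural illustration only, the PyTorch sketch below wires three independent convolutional branches into a fusion layer that regresses three head-pose angles, mirroring the high-level layout described in the abstract; all branch definitions and sizes are assumptions rather than the authors' configuration.

```python
# Hedged PyTorch sketch of the overall structure described in the abstract:
# three independent convolutional branches whose outputs are merged by a
# fusion layer regressing head pose angles (yaw, pitch, roll).
# Branch definitions and sizes are assumptions, not the authors' setup.
import torch
import torch.nn as nn

def conv_branch(in_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, 64)
    )

class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.depth_branch = conv_branch(1)   # raw depth crop
        self.motion_branch = conv_branch(1)  # e.g. temporal derivative of depth
        self.face_branch = conv_branch(1)    # reconstructed face image
        self.fusion = nn.Sequential(nn.Linear(3 * 64, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, depth, motion, face):
        feats = torch.cat([self.depth_branch(depth),
                           self.motion_branch(motion),
                           self.face_branch(face)], dim=1)
        return self.fusion(feats)  # predicted (yaw, pitch, roll)

model = PoseRegressor()
angles = model(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))
print(angles.shape)  # torch.Size([2, 3])
```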

2017 Conference paper

Preface

Authors: Grana, C.; Baraldi, L.

Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE

2017 Conference paper

Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks

Authors: Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON MULTIMEDIA

In this paper, we propose a novel scene detection algorithm which employs semantic, visual, textual and audio cues. We also show how the hierarchical decomposition of the storytelling video structure can improve retrieval results presentation with semantically and aesthetically effective thumbnails. Our method is built upon two advancements of the state of the art: 1) semantic feature extraction which builds video-specific concept detectors; 2) multimodal feature embedding learning, which maps the feature vector of a shot to a space in which the Euclidean distance has task-specific semantic properties. The proposed method is able to decompose the video into annotated temporal segments which allow for query-specific thumbnail extraction. Extensive experiments are performed on different datasets to demonstrate the effectiveness of our algorithm. An in-depth discussion on how to deal with the subjectivity of the task is conducted and a strategy to overcome the problem is suggested.
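A minimal sketch of the metric-embedding idea mentioned above follows: shot feature vectors are projected into a space where Euclidean distance should reflect scene membership, trained here with a standard triplet loss. The dimensions and the choice of loss are assumptions, not necessarily the paper's formulation.

```python
# Hedged sketch of metric-embedding learning: shot features are mapped into a
# space where Euclidean distance reflects whether two shots belong to the same
# scene, trained with a triplet loss. Dimensions and loss are assumptions.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 128))
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(embed.parameters(), lr=1e-4)

# Dummy batch: anchor and positive come from the same scene, negative from another.
anchor, positive, negative = (torch.randn(16, 2048) for _ in range(3))
loss = criterion(embed(anchor), embed(positive), embed(negative))
loss.backward()
optimizer.step()
```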

2017 Journal article

Right putamen and age are the most discriminant features to diagnose Parkinson's disease by using 123I-FP-CIT brain SPET data by using an artificial neural network classifier, a classification tree (ClT)

Authors: Cascianelli, S; Tranfaglia, C; Fravolini, Ml; Bianconi, F; Minestrini, M; Nuvoli, S; Tambasco, N; Dottorini, Me; Palumbo, B

Published in: HELLENIC JOURNAL OF NUCLEAR MEDICINE

2017 Journal abstract

Robust visual semi-semantic loop closure detection by a covisibility graph and CNN features

Authors: Cascianelli, Silvia; Costante, Gabriele; Bellocchio, Enrico; Valigi, Paolo; Fravolini, Mario L; Ciarfuglia, Thomas A

Published in: ROBOTICS AND AUTONOMOUS SYSTEMS

2017 Journal article
