Publications by Rita Cucchiara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Active filter: Author: Rita Cucchiara

Bridging the experiential gap in cultural visits with computer vision

Authors: Cucchiara, R.; Del Bimbo, A.

This paper discusses the role of computer vision in bridging the experiential gap between the cultural and emotional experience of visitors to museums and cultural heritage sites. We do not argue against using multiple sensors to provide a more complete cultural experience, but we claim a primary role for computer vision in this task. Although many research challenges are still far from being solved effectively, especially in detection, re-identification, tracking and recognition, we believe that the technology can already be deployed in real contexts and support concrete applications with interesting results that will open the door to valuable future applications.

2016 Conference proceedings paper

Context Change Detection for an Ultra-Low Power Low-Resolution Ego-Vision Imager

Authors: Paci, Francesco; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita; Benini, Luca

Published in: LECTURE NOTES IN COMPUTER SCIENCE

With the increasing popularity of wearable cameras, such as GoPro or Narrative Clip, research on continuous activity monitoring from egocentric cameras has received a lot of attention. Research in hardware and software is devoted to finding new efficient, stable and long-running solutions; however, current devices are too power-hungry for truly always-on operation and are aggressively duty-cycled to achieve acceptable lifetimes. In this paper we present a wearable system for context change detection based on an egocentric camera with ultra-low power consumption that can collect data 24/7. Although the resolution of the captured images is low, experimental results in real scenarios demonstrate how our approach, based on Siamese Neural Networks, can achieve visual context awareness. In particular, we compare our solution with hand-crafted features and with a state-of-the-art technique, and propose a novel and challenging dataset composed of roughly 30,000 low-resolution images.

2016 Conference proceedings paper
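For the context change detection paper above, here is a minimal sketch of how a Siamese network's output could be used. It assumes each low-resolution frame has already been mapped to an embedding vector by the shared Siamese branch (not shown), compares consecutive embeddings, and flags a change when their Euclidean distance exceeds a threshold. The contrastive loss is one standard form used to train Siamese networks; the function names, `margin` and `threshold` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d: float, same_context: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one pair of embeddings at distance d.

    Same-context pairs are pulled together (loss d^2); different-context
    pairs are pushed apart until they are at least `margin` away.
    """
    if same_context:
        return d ** 2
    return max(0.0, margin - d) ** 2


def detect_context_changes(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Indices i where frame i's embedding is far from frame i-1's (hypothetical rule)."""
    return [i for i in range(1, len(embeddings))
            if euclidean(embeddings[i - 1], embeddings[i]) > threshold]


# Toy 2-D embeddings: one context for frames 0-2, another for frames 3-4.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0], [2.1, 2.0]]
print(detect_context_changes(frames, threshold=1.0))  # [3]
```

The threshold trades missed context changes against false alarms and would have to be tuned on labelled data.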

Exploring Architectural Details Through a Wearable Egocentric Vision Device

Authors: Alletto, Stefano; Abati, Davide; Serra, Giuseppe; Cucchiara, Rita

Published in: SENSORS

Augmented user experiences in the cultural heritage domain are in increasing demand among the new digital-native tourists of the 21st century. In this paper, we propose a novel solution that aims at assisting visitors during an outdoor tour of a cultural site using the unique first-person perspective of wearable cameras. In particular, the approach exploits computer vision techniques to retrieve architectural details, proposing a robust descriptor based on the covariance of local features. Using a lightweight wearable board, the solution can localize the user with respect to the 3D point cloud of the historical landmark and provide information about the details the user is currently looking at. Experimental results validate the method in terms of both accuracy and computational effort. Furthermore, a user evaluation based on real-world experiments shows that the proposal is deemed effective in enriching a cultural experience.

2016 Journal article
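For the architectural-details paper above, the core of a covariance descriptor is the sample covariance matrix of a set of local feature vectors. The sketch below computes it in plain Python. Which local features the paper stacks, and how it compares two covariance matrices, are not stated in the abstract, so the (x, y, intensity) features used here are purely illustrative.

```python
from typing import List, Sequence


def covariance_descriptor(features: List[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (n - 1 denominator) of n d-dimensional feature vectors."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        centered = [f[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    return [[c / (n - 1) for c in row] for row in cov]


# Hypothetical local features: (x, y, intensity) sampled at four keypoints.
feats = [[0.0, 0.0, 1.0], [1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [1.0, 1.0, 4.0]]
C = covariance_descriptor(feats)
```

The result is a d×d matrix whatever the number of pooled features, so the descriptor's size stays fixed.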

Fast gesture recognition with Multiple Stream Discrete HMMs on 3D Skeletons

Authors: Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

HMMs are widely used in action and gesture recognition due to their implementation simplicity, low computational requirements, scalability and high parallelism. They perform well even with a limited training set. All these characteristics are hard to find together in other, even more accurate, methods. In this paper, we propose a novel double-stage classification approach, based on Multiple Stream Discrete Hidden Markov Models (MSD-HMM) and 3D skeleton joint data, that achieves high performance while maintaining all the advantages listed above. The approach can both quickly classify pre-segmented gestures (offline classification) and perform temporal segmentation on streams of gestures (online classification) faster than real time. We test our system on three public datasets, MSRAction3D, UTKinect-Action and MSRDailyAction, and on a new dataset, the Kinteract Dataset, explicitly created for Human-Computer Interaction (HCI). We obtain state-of-the-art performance on all of them.

2016 Conference proceedings paper
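For the gesture recognition paper above, here is a simplified sketch of a Multiple Stream Discrete HMM. Each state emits one discrete symbol per stream, and streams are assumed conditionally independent given the state, so per-stream emission probabilities multiply. A sequence is scored with the forward algorithm, and classification picks the model with the highest likelihood. The paper's double-stage scheme, any stream weighting, and log-space scaling (needed for long sequences to avoid underflow) are not reproduced; the class and function names are hypothetical.

```python
from typing import Dict, List, Sequence


class MSDHMM:
    """Discrete HMM whose states emit one symbol per stream (hypothetical class).

    Streams are treated as conditionally independent given the state, so the
    probability of an observation tuple is the product of per-stream terms.
    """

    def __init__(self, start: List[float], trans: List[List[float]],
                 emis: List[List[List[float]]]):
        self.start = start  # start[i]: P(state i at t = 0)
        self.trans = trans  # trans[i][j]: P(next state j | current state i)
        self.emis = emis    # emis[s][i][k]: P(symbol k on stream s | state i)

    def _emission(self, state: int, obs: Sequence[int]) -> float:
        p = 1.0
        for s, symbol in enumerate(obs):
            p *= self.emis[s][state][symbol]
        return p

    def likelihood(self, seq: List[Sequence[int]]) -> float:
        """Forward algorithm: P(seq | model). No scaling, so keep sequences short."""
        n = len(self.start)
        alpha = [self.start[i] * self._emission(i, seq[0]) for i in range(n)]
        for obs in seq[1:]:
            alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n)) * self._emission(j, obs)
                     for j in range(n)]
        return sum(alpha)


def classify(models: Dict[str, MSDHMM], seq: List[Sequence[int]]) -> str:
    """Return the name of the gesture model with the highest likelihood."""
    return max(models, key=lambda name: models[name].likelihood(seq))


# Toy one-state models over two binary streams (e.g. quantized joint features).
models = {
    "wave": MSDHMM([1.0], [[1.0]], [[[0.9, 0.1]], [[0.8, 0.2]]]),
    "push": MSDHMM([1.0], [[1.0]], [[[0.1, 0.9]], [[0.2, 0.8]]]),
}
print(classify(models, [(0, 0), (0, 0)]))  # wave
```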

Historical Document Digitization through Layout Analysis and Deep Content Classification

Authors: Corbelli, Andrea; Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita

Document layout segmentation and recognition is an important task in the creation of digitized document collections, especially when dealing with historical documents. This paper presents a hybrid approach to layout segmentation as well as a strategy to classify document regions, which is applied to the digitization of a historical encyclopedia. Our layout analysis method merges a classic top-down approach and a bottom-up classification process based on local geometrical features, while regions are classified by means of features extracted from a Convolutional Neural Network and combined in a Random Forest classifier. Experiments are conducted on the first volume of the "Enciclopedia Treccani", a large dataset containing 999 manually annotated pages from the historical Italian encyclopedia.

2016 Conference proceedings paper
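For the historical document paper above, the abstract does not name its top-down method. A classic example of a top-down layout technique is the recursive XY-cut, which splits a page at whitespace gaps in its projection profiles. The sketch below shows a single horizontal split of that kind on a binary image, as an illustration only; `split_rows` and `min_gap` are hypothetical names.

```python
from typing import List, Optional, Tuple


def split_rows(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a binary image (1 = ink) into horizontal bands at blank-row gaps.

    Returns (first_row, end_row_exclusive) for each band. Runs of fewer than
    `min_gap` blank rows are kept inside the surrounding band.
    """
    profile = [sum(row) for row in image]  # horizontal projection profile
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first row of the band being built, if any
    blank_run = 0                # blank rows seen since the last ink row
    for r, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = r
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                bands.append((start, r - blank_run + 1))
                start, blank_run = None, 0
    if start is not None:
        bands.append((start, len(profile) - blank_run))
    return bands


# Toy 7-row page: ink rows 0-1, a one-row gap, ink row 3, a two-row gap, ink row 6.
page = [[1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 0], [1, 0]]
print(split_rows(page, min_gap=2))  # [(0, 4), (6, 7)]
```

A full XY-cut would apply the same idea recursively, alternating horizontal and vertical projections inside each band.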

Layout analysis and content enrichment of digitized books

Authors: Grana, Costantino; Serra, Giuseppe; Manfredi, Marco; Coppi, Dalia; Cucchiara, Rita

Published in: MULTIMEDIA TOOLS AND APPLICATIONS

In this paper we describe a system for automatically analyzing old documents and creating hyperlinks between different epochs, thus opening ancient documents to young people and making them available on the web with old and current content. We propose a supervised learning approach to segment text and illustrations in digitized old documents, using a texture feature based on local correlation that detects the repeating patterns of text regions and differentiates them from pictorial elements. Moreover, we present a solution to help the user find contemporary content connected to what is automatically extracted from the ancient documents.

2016 Journal article
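For the layout analysis paper above, text regions produce a repeating pattern (lines of text separated by regular gaps), which a correlation-based feature can pick up. As a hedged illustration of that idea, not the paper's actual texture feature, the sketch below computes the normalized autocorrelation of a 1-D profile (for instance, hypothetically, the row-wise ink count of a region) and returns the lag with the strongest self-similarity.

```python
import math
from typing import Sequence


def autocorrelation(signal: Sequence[float], lag: int) -> float:
    """Pearson correlation between the signal and a copy shifted by `lag`."""
    if lag < 0 or lag >= len(signal):
        raise ValueError("lag must be in [0, len(signal))")
    a = signal[:len(signal) - lag]
    b = signal[lag:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a flat segment has no measurable correlation
    return cov / (sa * sb)


def dominant_period(signal: Sequence[float], max_lag: int) -> int:
    """Lag in 1..max_lag at which the signal is most similar to itself."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(signal, k))


# Toy row profile of a text block: 2 ink rows, 2 blank rows, repeated.
profile = [1, 1, 0, 0] * 6
print(dominant_period(profile, max_lag=6))  # 4
```

A sharp, regular peak suggests text lines; pictorial regions would be expected to give weaker, less regular peaks, though a real classifier would learn this boundary from labelled data.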

Multi-Level Net: a Visual Saliency Prediction Model

Authors: Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

State-of-the-art approaches for saliency prediction are based on Fully Convolutional Networks, in which saliency maps are built using the last layer. In contrast, we present a novel model that predicts saliency maps by exploiting a non-linear combination of features coming from different layers of the network. We also present a new loss function to deal with the imbalance issue in saliency masks. Extensive results on three public datasets demonstrate the robustness of our solution. Our model outperforms the state of the art on SALICON, the largest unconstrained dataset available, and obtains competitive results on the MIT300 and CAT2000 benchmarks.

2016 Conference proceedings paper
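For the saliency paper above, one way to counter the imbalance between the few salient pixels and the many background pixels is to weight each pixel's squared error by its ground-truth saliency. The sketch below uses a weight of 1/(α − y) with α = 1.1 and normalizes the prediction by its maximum. Treat the functional form and the value of α as assumptions rather than a reproduction of the paper's loss, which the abstract does not specify.

```python
from typing import Sequence


def imbalance_weighted_mse(pred: Sequence[float], target: Sequence[float],
                           alpha: float = 1.1) -> float:
    """Squared error with each pixel weighted by 1 / (alpha - target).

    Targets are assumed to lie in [0, 1]: salient pixels (target near 1) get
    weights near 1 / (alpha - 1), background pixels weights near 1 / alpha.
    Predictions are divided by their maximum so only their shape is compared.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 so the weights stay finite")
    peak = max(pred)
    if peak <= 0:
        raise ValueError("prediction must contain a positive value")
    total = 0.0
    for p, y in zip(pred, target):
        total += ((p / peak) - y) ** 2 / (alpha - y)
    return total / len(pred)


# The same 0.5 miss costs more on a salient pixel than on a background pixel.
print(imbalance_weighted_mse([1.0, 0.5], [1.0, 1.0]) >
      imbalance_weighted_mse([1.0, 0.5], [1.0, 0.0]))  # True
```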

Optimizing image registration for interactive applications

Authors: Gasparini, Riccardo; Alletto, Stefano; Serra, Giuseppe; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

With the spread of wearable and mobile devices, the demand for interactive augmented reality applications is constantly growing. Among the different possibilities, we focus on the cultural heritage domain, where a key step in developing applications for augmented cultural experiences is to obtain a precise localization of the user, i.e. the 6-degree-of-freedom pose of the camera acquiring the images used by the application. Current state-of-the-art methods perform this task by extracting local descriptors from a query and exhaustively matching them to a sparse 3D model of the environment. While this procedure obtains good localization performance, the vast search space involved in the retrieval of 2D-3D correspondences often makes it infeasible in real-time, interactive environments. In this paper we therefore propose to perform descriptor quantization to reduce the search space, and to employ multiple KD-Trees combined with principal component analysis (PCA) dimensionality reduction to enable an efficient search. We experimentally show that our solution can halve the computational requirements of the correspondence search with respect to the state of the art while maintaining similar accuracy levels.

2016 Conference proceedings paper
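For the image registration paper above, the sketch below shows exact nearest-neighbour search with a single KD-tree, the data structure the abstract names for finding 2D-3D correspondences. Descriptor quantization, PCA dimensionality reduction and the use of multiple (often randomized, approximate) trees are omitted, and the 2-D points stand in for real descriptors; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    point: Sequence[float]
    index: int                # position of the point in the original list
    axis: int                 # coordinate used to split at this node
    left: Optional["Node"]
    right: Optional["Node"]


def build(points: List[Sequence[float]], indices: List[int], depth: int = 0) -> Optional[Node]:
    """Recursively build a KD-tree, splitting at the median along a cycling axis."""
    if not indices:
        return None
    axis = depth % len(points[0])
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return Node(points[ordered[mid]], ordered[mid], axis,
                build(points, ordered[:mid], depth + 1),
                build(points, ordered[mid + 1:], depth + 1))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Sequence[float],
            best: Optional[Tuple[float, int]] = None) -> Optional[Tuple[float, int]]:
    """Exact nearest-neighbour search; returns (squared distance, index)."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.index)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best


# Toy 2-D "descriptors"; real descriptors would first be reduced with PCA.
descriptors = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(descriptors, list(range(len(descriptors))))
print(nearest(tree, (9, 2)))  # (2, 4): squared distance 2 to descriptors[4]
```

The pruning test is what saves work over exhaustive matching: whole subtrees are skipped when the splitting plane is already farther away than the current best match.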

Performance measures and a data set for multi-target, multi-camera tracking

Authors: Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C.

Published in: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE

To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully annotated and calibrated data set to date, with more than 2 million frames of 1080p, 60 fps video taken by 8 cameras observing more than 2,700 identities over 85 min; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our data set poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art.

2016 Conference proceedings paper
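For the multi-camera tracking paper above, the proposed measures are identification precision (IDP) and identification recall (IDR), summarized by their harmonic mean IDF1. They are computed after a one-to-one matching between true and computed identities that maximizes the number of correctly identified detections (IDTP). The sketch below is a simplified version: trajectories are dictionaries from frame to 2-D position, a detection counts as matched when it lies within a hypothetical distance `threshold`, and the matching is brute-forced, which only scales to a handful of identities.

```python
import math
from itertools import permutations
from typing import Dict, Tuple

Trajectory = Dict[int, Tuple[float, float]]  # frame -> (x, y) position


def overlap(truth: Trajectory, pred: Trajectory, threshold: float) -> int:
    """Frames where both trajectories exist and lie within `threshold` of each other."""
    return sum(1 for f, p in truth.items()
               if f in pred and math.dist(p, pred[f]) <= threshold)


def best_idtp(truths: Dict[str, Trajectory], preds: Dict[str, Trajectory],
              threshold: float) -> int:
    """Maximum total overlap over one-to-one truth/prediction identity matches.

    Brute force over permutations; a real evaluation would use a bipartite
    (e.g. Hungarian) matching instead.
    """
    t_ids, p_ids = list(truths), list(preds)
    ov = {(t, p): overlap(truths[t], preds[p], threshold) for t in t_ids for p in p_ids}
    best = 0
    if len(t_ids) <= len(p_ids):
        for chosen in permutations(p_ids, len(t_ids)):
            best = max(best, sum(ov[(t, p)] for t, p in zip(t_ids, chosen)))
    else:
        for chosen in permutations(t_ids, len(p_ids)):
            best = max(best, sum(ov[(t, p)] for t, p in zip(chosen, p_ids)))
    return best


def id_measures(truths: Dict[str, Trajectory], preds: Dict[str, Trajectory],
                threshold: float = 1.0) -> Dict[str, float]:
    """IDP = IDTP/(IDTP+IDFP), IDR = IDTP/(IDTP+IDFN), IDF1 = 2 IDTP/(2 IDTP+IDFP+IDFN)."""
    idtp = best_idtp(truths, preds, threshold)
    n_true = sum(len(t) for t in truths.values())  # IDTP + IDFN
    n_pred = sum(len(p) for p in preds.values())   # IDTP + IDFP
    return {
        "IDP": idtp / n_pred if n_pred else 0.0,
        "IDR": idtp / n_true if n_true else 0.0,
        "IDF1": 2 * idtp / (n_true + n_pred) if n_true + n_pred else 0.0,
    }


# Two targets; the tracker swaps their identities halfway through (toy data).
truths = {"A": {f: (0.0, 0.0) for f in range(4)},
          "B": {f: (10.0, 0.0) for f in range(4)}}
swapped = {"1": {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (10.0, 0.0), 3: (10.0, 0.0)},
           "2": {0: (10.0, 0.0), 1: (10.0, 0.0), 2: (0.0, 0.0), 3: (0.0, 0.0)}}
print(id_measures(truths, swapped))  # {'IDP': 0.5, 'IDR': 0.5, 'IDF1': 0.5}
```

Every detection in the toy example is perfectly localized, yet the single identity switch halves all three scores, which reflects the abstract's emphasis on correct identification.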

Quick, accurate, smart: 3D computer vision technology helps assessing confined animals' behaviour

Authors: Barnard, Shanis; Calderara, Simone; Pistocchi, Simone; Cucchiara, Rita; Podaliri Vulpiani, Michele; Messori, Stefano; Ferri, Nicola

Published in: PLOS ONE

Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation and companionship. These environments and lifestyles may not offer these animals the best quality of life. Behaviour is a direct reflection of how the animal is coping with its environment. Behavioural indicators are thus among the preferred parameters to assess welfare. However, behavioural recording (usually from video) can be very time-consuming, and the accuracy and reliability of the output rely on the experience and background of the observers. The rapid advance of video technology and computer image processing provides the basis for promising solutions. In this pilot study, we present a new prototype software able to automatically infer the behaviour of dogs housed in kennels from 3D visual data and through structured machine learning frameworks. Depth information acquired through 3D features, body part detection and training are the key elements that allow the machine to recognise postures, trajectories inside the kennel and patterns of movement that can later be labelled at convenience. The main innovation of the software is its ability to automatically cluster frequently observed temporal patterns of movement without any pre-set ethogram. Conversely, when common patterns are defined through training, a deviation from normal behaviour in time or between individuals could be assessed. The software's accuracy in correctly detecting the dogs' behaviour was checked through a validation process. An automatic behaviour recognition system, independent of human subjectivity, could add scientific knowledge on animals' quality of life in confinement as well as save time and resources. This 3D framework was designed to be invariant to the dog's shape and size and could be extended to farm, laboratory and zoo quadrupeds in artificial housing. The computer vision technique applied in this software is innovative in non-human animal behaviour science. Further improvements and validation are needed, and future applications and limitations are discussed.

2016 Journal article
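For the dog behaviour paper above, the abstract does not state which clustering method the software uses. As a generic illustration of grouping movement patterns without a pre-set ethogram, the sketch below runs k-means with deterministic farthest-point initialization on hypothetical per-window features (mean speed, fraction of the window spent lying down). Both the features and the choice of k-means are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: List[Sequence[float]], k: int,
           iters: int = 50) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means; returns (cluster centers, label of each point)."""
    # Farthest-point initialization: start from the first point, then repeatedly
    # add the point farthest from its nearest existing center.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers, labels


# Hypothetical windows: (mean speed, fraction of time lying down).
windows = [[0.1, 0.9], [0.2, 0.8], [0.1, 0.85],   # resting-like
           [1.5, 0.05], [1.6, 0.0], [1.4, 0.1]]   # pacing-like
centers, labels = kmeans(windows, k=2)
print(labels)
```

Once clusters exist, the distance of a new window to its nearest center is one simple way to quantify how far it deviates from the usual patterns.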

Total publications: 505