Publications by Simone Calderara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Tip: type @ to pick an author and # to pick a keyword.

Active filters (Clear): Author: Simone Calderara

Using Kinect camera for investigating intergroup non-verbal human interactions

Authors: Vezzali, Loris; Di Bernardo, Gian Antonio; Cadamuro, Alessia; Cocco, Veronica Margherita; Crapolicchio, Eleonora; Bicocchi, Nicola; Calderara, Simone; Giovannini, Dino; Zambonelli, Franco; Cucchiara, Rita

A long tradition in social psychology focused on nonverbal behaviour displayed during dyadic interactions generally relying on evaluations from external … (Read full abstract)

A long tradition in social psychology focused on nonverbal behaviour displayed during dyadic interactions generally relying on evaluations from external coders. However, in addition to the fact that external coders may be biased, they may not capture certain type of behavioural indices. We designed three studies examining explicit and implicit prejudice as predictors of nonberval behaviour as reflected in objective indices provided by Kinect cameras. In the first study, we considered White-Black relations from the perspective of 36 White participants. Results revealed that implicit prejudice was associated with a reduction in interpersonal distance and in the volume of space between Whites and Blacks (vs. Whites and Whites), which in turn were associated with evaluations by collaborators taking part in the interaction. In the second study, 37 non-HIV participants interacted with HIV individuals. We found that implicit prejudice was associated with reduced volume of space between interactants over time (a process of bias overcorrection) only when they tried hard to control their behaviour (as captured by a stroop test). In the third study 35 non-disabled children interacted with disabled children. Results revealed that implicit prejudice was associated with reduced interpersonal distance over time.

2018 Abstract in Atti di Convegno

A new era in the study of intergroup nonverbal behaviour: Studying intergroup dyadic interactions “online”

Authors: Di Bernardo, Gian Antonio; Vezzali, Loris; Palazzi, Andrea; Calderara, Simone; Bicocchi, Nicola; Zambonelli, Franco; Cucchiara, Rita; Cadamuro, Alessia

We examined predictors and consequences of intergroup nonverbal behaviour by relying on new technologies and new objective indices. In three … (Read full abstract)

We examined predictors and consequences of intergroup nonverbal behaviour by relying on new technologies and new objective indices. In three studies, both in the laboratory and in the field with children, behaviour was a function of implicit prejudice.

2017 Abstract in Atti di Convegno

Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era

Authors: Cornia, Marcella; Abati, Davide; Baraldi, Lorenzo; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Estimating the focus of attention of a person looking at an image or a video is a crucial step which … (Read full abstract)

Estimating the focus of attention of a person looking at an image or a video is a crucial step which can enhance many vision-based inference mechanisms: image segmentation and annotation, video captioning, autonomous driving are some examples. The early stages of the attentive behavior are typically bottom-up; reproducing the same mechanism means to find the saliency embodied in the images, i.e. which parts of an image pop out of a visual scene. This process has been studied for decades in neuroscience and in terms of computational models for reproducing the human cortical process. In the last few years, early models have been replaced by deep learning architectures, that outperform any early approach compared against public datasets. In this paper, we propose a discussion on why convolutional neural networks (CNNs) are so accurate in saliency prediction. We present our DL architectures which combine both bottom-up cues and higher-level semantics, and incorporate the concept of time in the attentional process through LSTM recurrent architectures. Eventually, we present a video-specific architecture based on the C3D network, which can extracts spatio-temporal features by means of 3D convolutions to model task-driven attentive behaviors. The merit of this work is to show how these deep networks are not mere brute-force methods tuned on massive amount of data, but represent well-defined architectures which recall very closely the early saliency models, although improved with the semantics learned by human ground-thuth.

2017 Relazione in Atti di Convegno

From Groups to Leaders and Back. Exploring Mutual Predictability Between Social Groups and Their Leaders

Authors: Solera, Francesco; Calderara, Simone; Cucchiara, Rita

Recently, social theories and empirical observations identified small groups and leaders as the basic elements which shape a crowd. This … (Read full abstract)

Recently, social theories and empirical observations identified small groups and leaders as the basic elements which shape a crowd. This leads to an intermediate level of abstraction that is placed between the crowd as a flow of people, and the crowd as a collection of individuals. Consequently, automatic analysis of crowds in computer vision is also experiencing a shift in focus from individuals to groups and from small groups to their leaders. In this chapter, we present state-of-the-art solutions to the groups and leaders detection problem, which are able to account for physical factors as well as for sociological evidence observed over short time windows. The presented algorithms are framed as structured learning problems over the set of individual trajectories. However, the way trajectories are exploited to predict the structure of the crowd is not fixed but rather learned from recorded and annotated data, enabling the method to adapt these concepts to different scenarios, densities, cultures, and other unobservable complexities. Additionally, we investigate the relation between leaders and their groups and propose the first attempt to exploit leadership as prior knowledge for group detection.

2017 Capitolo/Saggio

Generative Adversarial Models for People Attribute Recognition in Surveillance

Authors: Fabbri, Matteo; Calderara, Simone; Cucchiara, Rita

In this paper we propose a deep architecture for detecting people attributes (e.g. gender, race, clothing ...) in surveillance contexts. … (Read full abstract)

In this paper we propose a deep architecture for detecting people attributes (e.g. gender, race, clothing ...) in surveillance contexts. Our proposal explicitly deal with poor resolution and occlusion issues that often occur in surveillance footages by enhancing the images by means of Deep Convolutional Generative Adversarial Networks (DCGAN). Experiments show that by combining both our Generative Reconstruction and Deep Attribute Classification Network we can effectively extract attributes even when resolution is poor and in presence of strong occlusions up to 80% of the whole person figure.

2017 Relazione in Atti di Convegno

Learning to Map Vehicles into Bird's Eye View

Authors: Palazzi, Andrea; Borghi, Guido; Abati, Davide; Calderara, Simone; Cucchiara, Rita

Awareness of the road scene is an essential component for both autonomous vehicles and Advances Driver Assistance Systems and is … (Read full abstract)

Awareness of the road scene is an essential component for both autonomous vehicles and Advances Driver Assistance Systems and is gaining importance both for the academia and car companies. This paper presents a way to learn a semantic-aware transformation which maps detections from a dashboard camera view onto a broader bird's eye occupancy map of the scene. To this end, a huge synthetic dataset featuring 1M couples of frames, taken from both car dashboard and bird's eye view, has been collected and automatically annotated. A deep-network is then trained to warp detections from the first to the second view. We demonstrate the effectiveness of our model against several baselines and observe that is able to generalize on real-world data despite having been trained solely on synthetic ones.

2017 Relazione in Atti di Convegno

Learning Where to Attend Like a Human Driver

Authors: Palazzi, Andrea; Solera, Francesco; Calderara, Simone; Alletto, Stefano; Cucchiara, Rita

Despite the advent of autonomous cars, it's likely - at least in the near future - that human attention will … (Read full abstract)

Despite the advent of autonomous cars, it's likely - at least in the near future - that human attention will still maintain a central role as a guarantee in terms of legal responsibility during the driving task. In this paper we study the dynamics of the driver's gaze and use it as a proxy to understand related attentional mechanisms. First, we build our analysis upon two questions: where and what the driver is looking at? Second, we model the driver's gaze by training a coarse-to-fine convolutional network on short sequences extracted from the DR(eye)VE dataset. Experimental comparison against different baselines reveal that the driver's gaze can indeed be learnt to some extent, despite i) being highly subjective and ii) having only one driver's gaze available for each sequence due to the irreproducibility of the scene. Eventually, we advocate for a new assisted driving paradigm which suggests to the driver, with no intervention, where she should focus her attention.

2017 Relazione in Atti di Convegno

Signal Processing and Machine Learning for Diplegia Classification

Authors: Bergamini, Luca; Calderara, Simone; Bicocchi, Nicola; Ferrari, Alberto; Vitetta, Giorgio

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Diplegia is one of the most common forms of a broad family of motion disorders named cerebral palsy (CP) affecting … (Read full abstract)

Diplegia is one of the most common forms of a broad family of motion disorders named cerebral palsy (CP) affecting the voluntary muscular system. In recent years, various classification criteria have been proposed for CP, to assist in diagnosis, clinical decision-making and communication. In this manuscript, we divide the spastic forms of CP into 4 other categories according to a previous classification criterion and propose a machine learning approach for automatically classifying patients. Training and validation of our approach are based on data about 200 patients acquired using 19 markers and high frequency VICON cameras in an Italian hospital. Our approach makes use of the latest deep learning techniques. More specifically, it involves a multi-layer perceptron network (MLP), combined with Fourier analysis. An encouraging classification performance is obtained for two of the four classes.

2017 Relazione in Atti di Convegno

Tracking social groups within and across cameras

Authors: Solera, Francesco; Calderara, Simone; Ristani, Ergys; Tomasi, Carlo; Cucchiara, Rita

Published in: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

We propose a method for tracking groups from single and multiple cameras with disjoint fields of view. Our formulation follows … (Read full abstract)

We propose a method for tracking groups from single and multiple cameras with disjoint fields of view. Our formulation follows the tracking-by-detection paradigm where groups are the atomic entities and are linked over time to form long and consistent trajectories. To this end, we formulate the problem as a supervised clustering problem where a Structural SVM classifier learns a similarity measure appropriate for group entities. Multi-camera group tracking is handled inside the framework by adopting an orthogonal feature encoding that allows the classifier to learn inter- and intra-camera feature weights differently. Experiments were carried out on a novel annotated group tracking data set, the DukeMTMC-Groups data set. Since this is the first data set on the problem it comes with the proposal of a suitable evaluation measure. Results of adopting learning for the task are encouraging, scoring a +15% improvement in F1 measure over a non-learning based clustering baseline. To our knowledge this is the first proposal of this kind dealing with multi-camera group tracking.

2017 Articolo su rivista

Quick, accurate, smart: 3D computer vision technology helps assessing confined animals' behaviour

Authors: Barnard, Shanis; Calderara, Simone; Pistocchi, Simone; Cucchiara, Rita; Podaliri Vulpiani, Michele; Messori, Stefano; Ferri, Nicola

Published in: PLOS ONE

Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation … (Read full abstract)

Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation and companionship. These environments and lifestyles may not offer these animals the best quality of life. Behaviour is a direct reflection of how the animal is coping with its environment. Behavioural indicators are thus among the preferred parameters to assess welfare. However, behavioural recording (usually from video) can be very time consuming and the accuracy and reliability of the output rely on the experience and background of the observers. The outburst of new video technology and computer image processing gives the basis for promising solutions. In this pilot study, we present a new prototype software able to automatically infer the behaviour of dogs housed in kennels from 3D visual data and through structured machine learning frameworks. Depth information acquired through 3D features, body part detection and training are the key elements that allow the machine to recognise postures, trajectories inside the kennel and patterns of movement that can be later labelled at convenience. The main innovation of the software is its ability to automatically cluster frequently observed temporal patterns of movement without any pre-set ethogram. Conversely, when common patterns are defined through training, a deviation from normal behaviour in time or between individuals could be assessed. The software accuracy in correctly detecting the dogs' behaviour was checked through a validation process. An automatic behaviour recognition system, independent from human subjectivity, could add scientific knowledge on animals' quality of life in confinement as well as saving time and resources. This 3D framework was designed to be invariant to the dog's shape and size and could be extended to farm, laboratory and zoo quadrupeds in artificial housing. The computer vision technique applied to this software is innovative in non-human animal behaviour science. Further improvements and validation are needed, and future applications and limitations are discussed.

2016 Articolo su rivista

Page 10 of 16 • Total publications: 155