Publications by Rita Cucchiara

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Tip: type @ to pick an author and # to pick a keyword.

Active filters (Clear): Author: Rita Cucchiara

Video action detection by learning graph-based spatio-temporal interactions

Authors: Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has … (Read full abstract)

Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the robustness of object and people detectors, a deeper focus has been added on relationship modelling. Following this line, we propose a graph-based framework to learn high-level interactions between people and objects, in both space and time. In our formulation, spatio-temporal relationships are learned through self-attention on a multi-layer graph structure which can connect entities from consecutive clips, thus considering long-range spatial and temporal dependencies. The proposed module is backbone independent by design and does not require end-to-end training. Extensive experiments are conducted on the AVA dataset, where our model demonstrates state-of-the-art results and consistent improvements over baselines built with different backbones. Code is publicly available at https://github.com/aimagelab/STAGE_action_detection.

2021 Articolo su rivista

VITON-GT: An Image-based Virtual Try-On Model with Geometric Transformations

Authors: Fincato, Matteo; Landi, Federico; Cornia, Marcella; Cesari, Fabio; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

The large spread of online shopping has led computer vision researchers to develop different solutions for the fashion domain to … (Read full abstract)

The large spread of online shopping has led computer vision researchers to develop different solutions for the fashion domain to potentially increase the online user experience and improve the efficiency of preparing fashion catalogs. Among them, image-based virtual try-on has recently attracted a lot of attention resulting in several architectures that can generate a new image of a person wearing an input try-on garment in a plausible and realistic way. In this paper, we present VITON-GT, a new model for virtual try-on that generates high-quality and photo-realistic images thanks to multiple geometric transformations. In particular, our model is composed of a two-stage geometric transformation module that performs two different projections on the input garment, and a transformation-guided try-on module that synthesizes the new image. We experimentally validate the proposed solution on the most common dataset for this task, containing mainly t-shirts, and we demonstrate its effectiveness compared to different baselines and previous methods. Additionally, we assess the generalization capabilities of our model on a new set of fashion items composed of upper-body clothes from different categories. To the best of our knowledge, we are the first to test virtual try-on architectures in this challenging experimental setting.

2021 Relazione in Atti di Convegno

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions

Authors: Cojocaru, Iulian; Cascianelli, Silvia; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Handwritten Text Recognition (HTR) in free-layout pages is a valuable yet challenging task which aims to automatically understand handwritten texts. … (Read full abstract)

Handwritten Text Recognition (HTR) in free-layout pages is a valuable yet challenging task which aims to automatically understand handwritten texts. State-of-the-art approaches in this field usually encode input images with Convolutional Neural Networks, whose kernels are typically defined on a fixed grid and focus on all input pixels independently. However, this is in contrast with the sparse nature of handwritten pages, in which only pixels representing the ink of the writing are useful for the recognition task. Furthermore, the standard convolution operator is not explicitly designed to take into account the great variability in shape, scale, and orientation of handwritten characters. To overcome these limitations, we investigate the use of deformable convolutions for handwriting recognition. This type of convolution deform the convolution kernel according to the content of the neighborhood, and can therefore be more adaptable to geometric variations and other deformations of the text. Experiments conducted on the IAM and RIMES datasets demonstrate that the use of deformable convolutions is a promising direction for the design of novel architectures for handwritten text recognition.

2021 Relazione in Atti di Convegno

Working Memory Connections for LSTM

Authors: Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita

Published in: NEURAL NETWORKS

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when … (Read full abstract)

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the standard de facto for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists in adding a learnable nonlinear projection of the cell content into the network gates. This modification can fit into the classical LSTM gates without any assumption on the underlying task, being particularly effective when dealing with longer sequences. Previous research effort in this direction, which goes back to the early 2000s, could not bring a consistent improvement over vanilla LSTM. As part of this paper, we identify a key issue tied to previous connections that heavily limits their effectiveness, hence preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections constantly improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.

2021 Articolo su rivista

25th international conference on pattern recognition

Authors: Cucchiara, R.; Bimbo, A. D.; Sclaroff, S.

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

2020 Curatela

A Transformer-Based Network for Dynamic Hand Gesture Recognition

Authors: D’Eusanio, Andrea; Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

Transformer-based neural networks represent a successful self-attention mechanism that achieves state-of-the-art results in language understanding and sequence modeling. However, their … (Read full abstract)

Transformer-based neural networks represent a successful self-attention mechanism that achieves state-of-the-art results in language understanding and sequence modeling. However, their application to visual data and, in particular, to the dynamic hand gesture recognition task has not yet been deeply investigated. In this paper, we propose a transformer-based architecture for the dynamic hand gesture recognition task. We show that the employment of a single active depth sensor, specifically the usage of depth maps and the surface normals estimated from them, achieves state-of-the-art results, overcoming all the methods available in the literature on two automotive datasets, namely NVidia Dynamic Hand Gesture and Briareo. Moreover, we test the method with other data types available with common RGB-D devices, such as infrared and color data. We also assess the performance in terms of inference time and number of parameters, showing that the proposed framework is suitable for an online in-car infotainment system.

2020 Relazione in Atti di Convegno

A Unified Cycle-Consistent Neural Model for Text and Image Retrieval

Authors: Cornia, Marcella; Baraldi, Lorenzo; Tavakoli, Hamed R.; Cucchiara, Rita

Published in: MULTIMEDIA TOOLS AND APPLICATIONS

Text-image retrieval has been recently becoming a hot-spot research field, thanks to the development of deeply-learnable architectures which can retrieve … (Read full abstract)

Text-image retrieval has been recently becoming a hot-spot research field, thanks to the development of deeply-learnable architectures which can retrieve visual items given textual queries and vice-versa. The key idea of many state-of-the-art approaches has been that of learning a joint multi-modal embedding space in which text and images could be projected and compared. Here we take a different approach and reformulate the problem of text-image retrieval as that of learning a translation between the textual and visual domain. Our proposal leverages an end-to-end trainable architecture that can translate text into image features and vice versa and regularizes this mapping with a cycle-consistency criterion. Experimental evaluations for text-to-image and image-to-text retrieval, conducted on small, medium and large-scale datasets show consistent improvements over the baselines, thus confirming the appropriateness of using a cycle-consistent constrain for the text-image matching task.

2020 Articolo su rivista

Anomaly Detection for Vision-based Railway Inspection

Authors: Gasparini, Riccardo; Pini, Stefano; Borghi, Guido; Scaglione, Giuseppe; Calderara, Simone; Fedeli, Eugenio; Cucchiara, Rita

Published in: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE

2020 Relazione in Atti di Convegno

Anomaly Detection, Localization and Classification for Railway Inspection

Authors: Gasparini, Riccardo; D'Eusanio, Andrea; Borghi, Guido; Pini, Stefano; Scaglione, Giuseppe; Calderara, Simone; Fedeli, Eugenio; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

2020 Relazione in Atti di Convegno

Baracca: a Multimodal Dataset for Anthropometric Measurements in Automotive

Authors: Pini, Stefano; D'Eusanio, Andrea; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita

2020 Relazione in Atti di Convegno

Page 17 of 51 • Total publications: 505