Whitening for Self-Supervised Representation Learning
Authors: Ermolov, A.; Siarohin, A.; Sangineto, E.; Sebe, N.
Published in: PROCEEDINGS OF MACHINE LEARNING RESEARCH
Authors: Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita
Published in: NEURAL NETWORKS
Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted and are the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content to the network gates. This modification fits into the classical LSTM gates without any assumption on the underlying task and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, which go back to the early 2000s, could not bring a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue tied to previous connections that heavily limits their effectiveness, hence preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
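The idea of feeding a learnable nonlinear projection of the cell state into the gates can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the choice of tanh as the nonlinearity, the use of the previous cell state for the input and forget gates and of the updated cell state for the output gate, and all names (`gate`, `lstm_step`, the `params` layout) are assumptions made here for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gate(Wx, Wh, Wc, b, x, h, c):
    # Classical gate pre-activation (Wx x + Wh h + b) plus a nonlinear
    # projection of the cell state, tanh(Wc c). The tanh is an assumption.
    px = matvec(Wx, x)
    ph = matvec(Wh, h)
    pc = [math.tanh(v) for v in matvec(Wc, c)]
    return [sigmoid(a + r + m + k) for a, r, m, k in zip(px, ph, pc, b)]

def lstm_step(params, x, h_prev, c_prev):
    # params["i"], params["f"], params["o"]: tuples (Wx, Wh, Wc, b)
    # params["g"]: tuple (Wx, Wh, b) for the candidate cell update
    i = gate(*params["i"], x, h_prev, c_prev)
    f = gate(*params["f"], x, h_prev, c_prev)
    Wx, Wh, b = params["g"]
    g = [math.tanh(a + r + k)
         for a, r, k in zip(matvec(Wx, x), matvec(Wh, h_prev), b)]
    # Standard cell update: forget part of the old state, add the new candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Assumed here: the output gate looks at the freshly updated cell state.
    o = gate(*params["o"], x, h_prev, c)
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

Setting every `Wc` to zero makes the added term vanish, so the step reduces to a classical LSTM step. That makes the sketch convenient for comparing the two variants on the same inputs.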
Authors: Nuvoli, Susanna; Spanu, Angela; Fravolini, Mario Luca; Bianconi, Francesco; Cascianelli, Silvia; Madeddu, Giuseppe; Palumbo, Barbara
Published in: MOLECULAR IMAGING AND BIOLOGY
Purpose: To provide reliable and reproducible heart/mediastinum (H/M) ratio cut-off values for parkinsonian disorders using two machine learning techniques, Support Vector Machines (SVM) and Random Forest (RF) classifier, applied to [123I]MIBG cardiac scintigraphy. Procedures: We studied 85 subjects: 50 with idiopathic Parkinson’s disease (PD), 26 with atypical Parkinsonian syndromes (P), and 9 with essential tremor (ET). All patients underwent planar early and delayed cardiac scintigraphy after [123I]MIBG (111 MBq) intravenous injection. Images were evaluated both qualitatively and quantitatively, the latter by the early and delayed H/M ratio obtained from regions of interest (ROIt1 and ROIt2) drawn on planar images. Finally, SVM and RF classifiers were used to obtain the correct cut-off value. Results: SVM and RF produced excellent classification performance: the SVM classifier achieved perfect classification and RF also attained very good accuracy. The best H/M cut-off value was 1.55, since it remained the same for both ROIt1 and ROIt2. This value allowed PD to be correctly distinguished from P and ET: patients with an H/M ratio less than 1.55 were classified as PD, while those with values higher than 1.55 were considered as affected by parkinsonism and/or ET. No difference was found when the early and late H/M ratios were considered separately, suggesting that a single early evaluation could be sufficient to obtain the final diagnosis. Conclusions: Our results show that the use of SVM and CT made it possible to define the best cut-off value for H/M ratios in both the early and the delayed phase, underlining the role of [123I]MIBG cardiac scintigraphy and the effectiveness of the H/M ratio in differentiating PD from other parkinsonism or ET. Moreover, early scans alone could be used for a reliable diagnosis, since no difference was found between early and late scans. A larger series of cases is nevertheless needed to confirm these data.
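The decision rule reported in the results can be written as a one-threshold classifier. The sketch below is purely illustrative: the function name and label strings are hypothetical, and the abstract does not say how a ratio of exactly 1.55 is handled, so assigning it to the non-PD group is an assumption.

```python
def classify_hm_ratio(hm_ratio, cutoff=1.55):
    # Ratios below the cut-off are labelled PD, as stated in the abstract.
    if hm_ratio < cutoff:
        return "PD"
    # Assumption: a ratio exactly at the cut-off falls in the non-PD group.
    return "P/ET"

# Example use on a few made-up ratios (not data from the study).
labels = [classify_hm_ratio(r) for r in (1.20, 1.55, 1.90)]
```

Because the same cut-off was found for the early and the delayed regions of interest, the same function would apply to either ratio.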
Authors: D’Eusanio, Andrea; Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita
Transformer-based neural networks represent a successful self-attention mechanism that achieves state-of-the-art results in language understanding and sequence modeling. However, their application to visual data and, in particular, to the dynamic hand gesture recognition task has not yet been deeply investigated. In this paper, we propose a transformer-based architecture for the dynamic hand gesture recognition task. We show that using a single active depth sensor, specifically depth maps and the surface normals estimated from them, achieves state-of-the-art results, outperforming all the methods available in the literature on two automotive datasets, namely NVidia Dynamic Hand Gesture and Briareo. Moreover, we test the method with other data types available with common RGB-D devices, such as infrared and color data. We also assess the performance in terms of inference time and number of parameters, showing that the proposed framework is suitable for an online in-car infotainment system.
Authors: Cornia, Marcella; Baraldi, Lorenzo; Tavakoli, Hamed R.; Cucchiara, Rita
Published in: MULTIMEDIA TOOLS AND APPLICATIONS
Text-image retrieval has recently become a hot research topic, thanks to the development of deep learning architectures that can retrieve visual items given textual queries and vice versa. The key idea of many state-of-the-art approaches has been to learn a joint multi-modal embedding space into which text and images can be projected and compared. Here we take a different approach and reformulate the problem of text-image retrieval as that of learning a translation between the textual and visual domains. Our proposal leverages an end-to-end trainable architecture that can translate text into image features and vice versa, and regularizes this mapping with a cycle-consistency criterion. Experimental evaluations for text-to-image and image-to-text retrieval, conducted on small, medium and large-scale datasets, show consistent improvements over the baselines, thus confirming the appropriateness of using a cycle-consistency constraint for the text-image matching task.
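A cycle-consistency criterion of this kind can be illustrated with two toy translators. The sketch assumes that both translators are plain linear maps and that the penalty is a squared Euclidean distance. Neither choice comes from the abstract, and every name below is hypothetical.

```python
def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((p - q) ** 2 for p, q in zip(a, b))

def cycle_consistency_loss(T2I, I2T, text_feat, image_feat):
    # text -> image -> text should land back on the original text features.
    text_cycle = matvec(I2T, matvec(T2I, text_feat))
    # image -> text -> image should land back on the original image features.
    image_cycle = matvec(T2I, matvec(I2T, image_feat))
    return sq_dist(text_cycle, text_feat) + sq_dist(image_cycle, image_feat)
```

The loss is zero exactly when each translator undoes the other on the given features. In a trainable model, a term of this shape would be added to the retrieval objective as a regularizer.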
Authors: Allegretti, Stefano; Bolelli, Federico; Grana, Costantino
Contour extraction, also known as chain-code extraction, is one of the most common algorithms in binary image processing. Although a raster scan is the most cache-friendly and, consequently, the fastest way to scan an image, the most commonly used chain-code algorithms perform contour tracing and therefore tend to be fairly inefficient. In this paper, we take a rarely used algorithm that extracts contours in raster-scan order and optimize its execution time through template functions, look-up tables and decision trees, in order to reduce code branches and the average number of load/store operations required. The result is a very fast solution that outperforms the state-of-the-art contour extraction algorithm implemented in OpenCV on a collection of real-case datasets. Contribution: This paper significantly improves the performance of existing chain-code algorithms by introducing decision trees to reduce code branches and the average number of load/store operations required.
Authors: Pierdicca, R.; Paolanti, M.; Frontoni, E.; Baraldi, L.
Published in: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE
Augmented reality (AR) is the process of using technology to superimpose images, text or sounds on top of what a person can already see. Art galleries and museums have started to develop AR applications to increase engagement and provide an entirely new kind of exploration experience. However, creating content is a very time-consuming process that requires ad hoc development for each painting to be augmented. In fact, to create an AR experience for any painting, it is necessary to choose the points of interest, create the digital content and then develop the application. While this is affordable for the great masterpieces of an art gallery, it would be impracticable for an entire collection. In this context, the idea of this paper is to develop AR applications based on Artificial Intelligence. In particular, automatic captioning techniques are the core of the implementation of AR applications that improve the user experience in front of a painting or an artwork in general. The study demonstrates feasibility through a proof-of-concept application implemented for handheld devices, and adds to the body of knowledge on mobile AR applications, as this approach has not been applied in this field before.
Authors: Boccignone, Giuseppe; Conte, Donatello; Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella
Published in: IEEE ACCESS
This paper presents a comprehensive framework for studying methods of pulse rate estimation relying on remote photoplethysmography (rPPG). There has been a remarkable development of rPPG techniques in recent years, along with the publication of several surveys, yet a sound assessment of their performance has been overlooked at best, if not left undeveloped. The methodological rationale behind the framework we propose is that in order to study, develop and compare new rPPG methods in a principled and reproducible way, the following conditions should be met: 1) a structured pipeline to monitor rPPG algorithms' input, output, and main control parameters; 2) the availability and the use of multiple datasets; and 3) a sound statistical assessment of methods' performance. The proposed framework is instantiated in the form of a Python package named pyVHR (short for Python tool for Virtual Heart Rate), which is made freely available on GitHub (github.com/phuselab/pyVHR). Here, to substantiate our approach, we evaluate eight well-known rPPG methods through extensive experiments across five public video datasets and subsequent nonparametric statistical analysis. Surprisingly, the performances achieved by the four best methods, namely POS, CHROM, PCA and SSR, are not significantly different from a statistical standpoint, highlighting the importance of evaluating the different approaches with a statistical assessment.