Publications

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

The DeepHealth Toolkit: A Key European Free and Open-Source Software for Deep Learning and Computer Vision Ready to Exploit Heterogeneous HPC and Cloud Architectures

Authors: Aldinucci, Marco; Atienza, David; Bolelli, Federico; Caballero, Mónica; Colonnelli, Iacopo; Flich, José; Gómez, Jon A.; González, David; Grana, Costantino; Grangetto, Marco; Leo, Simone; López, Pedro; Oniga, Dana; Paredes, Roberto; Pireddu, Luca; Quiñones, Eduardo; Silva, Tatiana; Tartaglione, Enzo; Zapater, Marina

At the present time, we are immersed in the convergence between Big Data, High-Performance Computing and Artificial Intelligence. Technological progress in these three areas has accelerated in recent years, forcing different players, such as software companies and stakeholders, to move quickly. The European Union is dedicating substantial resources to maintaining its relevant position in this scenario, funding projects to implement large-scale pilot testbeds that combine the latest advances in Artificial Intelligence, High-Performance Computing, Cloud and Big Data technologies. The DeepHealth project is an example focused on the health sector whose main outcome is the DeepHealth toolkit, a European unified framework that offers deep learning and computer vision capabilities, completely adapted to exploit underlying heterogeneous High-Performance Computing, Big Data and cloud architectures, and ready to be integrated into any software platform to facilitate the development and deployment of new applications for specific problems in any sector. This toolkit is intended to be one of the European contributions to the field of AI. This chapter introduces the toolkit with its main components and complementary tools, providing a clear view that facilitates and encourages its adoption and wide use by the European community of developers of AI-based solutions and data scientists working in the healthcare sector and beyond.

2021 Book Chapter

The DeepHealth Toolkit: A Unified Framework to Boost Biomedical Applications

Authors: Cancilla, Michele; Canalini, Laura; Bolelli, Federico; Allegretti, Stefano; Carrión, Salvador; Paredes, Roberto; Ander Gómez, Jon; Leo, Simone; Enrico Piras, Marco; Pireddu, Luca; Badouh, Asaf; Marco-Sola, Santiago; Alvarez, Lluc; Moreto, Miquel; Grana, Costantino

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Given the overwhelming impact of machine learning over the last decade, several libraries and frameworks have been developed in recent years to simplify the design and training of neural networks, providing array-based programming, automatic differentiation and user-friendly access to hardware accelerators. None of those tools, however, was designed with native and transparent support for Cloud Computing or heterogeneous High-Performance Computing (HPC). The DeepHealth Toolkit is an open-source Deep Learning toolkit aimed at boosting the productivity of data scientists operating in the medical field by providing a unified framework for the distributed training of neural networks, one able to leverage hybrid HPC and cloud environments in a way that is transparent to the user. The toolkit is composed of a Computer Vision library, a Deep Learning library, and a front-end for non-expert users; all of the components are focused on the medical domain, but they are general purpose and can be applied to any other field. In this paper, the principles driving the design of the DeepHealth libraries are described, along with details about the implementation and the interaction between the different elements composing the toolkit. Finally, experiments on common benchmarks prove the efficiency of each separate component and of the DeepHealth Toolkit overall.

2021 Conference Paper

Training convolutional neural networks to score pneumonia in slaughtered pigs

Authors: Bonicelli, L.; Trachtman, A. R.; Rosamilia, A.; Liuzzo, G.; Hattab, J.; Alcaraz, E. M.; Del Negro, E.; Vincenzi, S.; Dondona, A. C.; Calderara, S.; Marruchella, G.

Published in: ANIMALS

The slaughterhouse can act as a valid checkpoint to estimate the prevalence and the economic impact of diseases in farm animals. At present, scoring lesions is a challenging and time-consuming activity, which is carried out by veterinarians serving the slaughter chain. Over recent years, artificial intelligence (AI) has gained traction in many fields of research, including livestock production. In particular, AI-based methods appear able to solve highly repetitive tasks and to consistently analyze large amounts of data, such as those collected by veterinarians during postmortem inspection in high-throughput slaughterhouses. The present study aims to develop an AI-based method capable of recognizing and quantifying enzootic pneumonia-like lesions on digital images captured from slaughtered pigs under routine abattoir conditions. Overall, the data indicate that the AI-based method proposed herein can properly identify and score enzootic pneumonia-like lesions without interfering with the slaughter chain routine. In accordance with European legislation, the application of such a method avoids the handling of carcasses and organs, decreasing the risk of microbial contamination, and could provide further alternatives in the field of food hygiene.

2021 Journal Article

TriGAN: image-to-image translation for multi-source domain adaptation

Authors: Roy, S.; Siarohin, A.; Sangineto, E.; Sebe, N.; Ricci, E.

Published in: MACHINE VISION AND APPLICATIONS

Most domain adaptation methods consider the problem of transferring knowledge to the target domain from a single-source dataset. However, in practical applications, we typically have access to multiple sources. In this paper, we propose the first approach for multi-source domain adaptation (MSDA) based on generative adversarial networks. Our method is inspired by the observation that the appearance of a given image depends on three factors: the domain, the style (characterized in terms of low-level feature variations) and the content. For this reason, we propose to project the source image features onto a space where only the dependence on the content is kept, and then re-project this invariant representation onto the pixel space using the target domain and style. In this way, new labeled images can be generated which are used to train a final target classifier. We test our approach using common MSDA benchmarks, showing that it outperforms state-of-the-art methods.

2021 Journal Article

Unifying tensor factorization and tensor nuclear norm approaches for low-rank tensor completion

Authors: Du, S.; Xiao, Q.; Shi, Y.; Cucchiara, R.; Ma, Y.

Published in: NEUROCOMPUTING

Low-rank tensor completion (LRTC) has gained significant attention due to its powerful capability of recovering missing entries. However, it has to repeatedly calculate the time-consuming singular value decomposition (SVD). To address this drawback, we propose, based on the tensor-tensor product (t-product), a new LRTC method, the unified tensor factorization (UTF), for 3-way tensor completion. We first integrate the tensor factorization (TF) and the tensor nuclear norm (TNN) regularization into a framework that inherits the benefits of both TF and TNN: fast calculation and convex optimization. The conditions under which TF and TNN are equivalent are analyzed. Then, UTF for tensor completion is presented, and an efficient iterative update algorithm based on the alternating direction method of multipliers (ADMM) is used for the UTF optimization; the solution of the proposed alternating minimization algorithm is also proven to converge to a Karush–Kuhn–Tucker (KKT) point. Finally, numerical experiments on synthetic data completion and image/video inpainting tasks demonstrate the effectiveness of our method over other state-of-the-art tensor completion methods.
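Under the t-product, the tensor nuclear norm that UTF combines with factorization can be evaluated slice-wise in the Fourier domain, without forming an SVD of any matrix unfolding. A minimal NumPy sketch of that computation (the function name and the 1/n3 normalization are illustrative conventions, not taken from the paper):

```python
import numpy as np

def tensor_nuclear_norm(X):
    """Tensor nuclear norm (TNN) of a 3-way array under the t-product:
    the average of the nuclear norms of the frontal slices in the
    Fourier domain (one common normalization convention)."""
    n3 = X.shape[2]
    Xf = np.fft.fft(X, axis=2)  # DFT along the third mode
    tnn = 0.0
    for k in range(n3):
        s = np.linalg.svd(Xf[:, :, k], compute_uv=False)
        tnn += s.sum()          # nuclear norm of one frontal slice
    return tnn / n3

# Usage: TNN of a random 8x8x4 tensor
rng = np.random.default_rng(0)
T = rng.standard_normal((8, 8, 4))
value = tensor_nuclear_norm(T)
```

With n3 = 1 the definition reduces to the ordinary matrix nuclear norm, which makes the convention easy to sanity-check.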

2021 Journal Article

Vehicle and method for inspecting a railway line

Authors: Avizzano, Carlo Alberto; Borghi, Guido; Calderara, Simone; Cucchiara, Rita; Fedeli, Eugenio; Ermini, Mirko; Gonnelli, Mirco; Labanca, Giacomo; Frisoli, Antonio; Gasparini, Riccardo; Solazzi, Massimiliano; Tiseni, Luca; Leonardis, Daniele; Satler, Massimo

2021 Patent

Video action detection by learning graph-based spatio-temporal interactions

Authors: Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita

Published in: COMPUTER VISION AND IMAGE UNDERSTANDING

Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the robustness of object and people detectors, a deeper focus has been placed on relationship modelling. Following this line, we propose a graph-based framework to learn high-level interactions between people and objects, in both space and time. In our formulation, spatio-temporal relationships are learned through self-attention on a multi-layer graph structure which can connect entities from consecutive clips, thus considering long-range spatial and temporal dependencies. The proposed module is backbone-independent by design and does not require end-to-end training. Extensive experiments are conducted on the AVA dataset, where our model demonstrates state-of-the-art results and consistent improvements over baselines built with different backbones. Code is publicly available at https://github.com/aimagelab/STAGE_action_detection.
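The core operation described above, relating the graph's actor and object nodes through self-attention, amounts to scaled dot-product attention over a set of node features. A minimal single-head sketch (matrix names and the single-head setup are illustrative assumptions, not the paper's exact module):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over node features X
    (one node per row). Each node attends to all others, so the
    output row mixes information from the whole graph."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)
    # Row-wise softmax turns scores into attention weights
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ V

# Usage: 4 node-feature vectors of dimension 3
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
out = self_attention(X, rng.standard_normal((3, 3)),
                     rng.standard_normal((3, 3)),
                     rng.standard_normal((3, 3)))
```

In the paper's setting the rows would be detected-entity features from consecutive clips, which is what lets attention capture long-range spatio-temporal dependencies.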

2021 Journal Article

Video Frame Synthesis combining Conventional and Event Cameras

Authors: Pini, Stefano; Borghi, Guido; Vezzani, Roberto

Published in: INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE

Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to conventional cameras, their use is limited due to the scarce compatibility of asynchronous event streams with traditional data processing and vision algorithms. In this regard, we present a framework that synthesizes RGB frames from the output stream of an event camera and an initial or a periodic set of color key-frames. The deep learning-based frame synthesis framework consists of an adversarial image-to-image architecture and a recurrent module. Two public event-based datasets, DDD17 and MVSEC, are used to obtain qualitative and quantitative per-pixel and perceptual results. In addition, we converted two additional well-known datasets, namely KITTI and Cityscapes, into event frames in order to present semantic results, in terms of object detection and semantic segmentation accuracy. Extensive experimental evaluation confirms the quality of the proposed approach and its capability to synthesize frame sequences from color key-frames and sequences of intermediate events.
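Turning an asynchronous event stream into a frame-like tensor, as done above to derive event frames from conventional datasets, is often sketched by accumulating signed polarities per pixel. A simple hypothetical encoding (the paper's exact representation may differ):

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate asynchronous events (x, y, polarity) into a 2-D
    event frame by summing signed polarities per pixel: +1 for a
    brightness increase, -1 for a decrease."""
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y, p in events:
        frame[y, x] += 1.0 if p > 0 else -1.0
    return frame

# Usage: two positive events at (0, 0), one negative at (2, 1)
frame = events_to_frame([(0, 0, 1), (0, 0, 1), (2, 1, -1)], 4, 4)
```

A real pipeline would also window events by timestamp so each frame covers a fixed time slice; that bookkeeping is omitted here for brevity.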

2021 Journal Article

VITON-GT: An Image-based Virtual Try-On Model with Geometric Transformations

Authors: Fincato, Matteo; Landi, Federico; Cornia, Marcella; Cesari, Fabio; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

The rapid growth of online shopping has led computer vision researchers to develop different solutions for the fashion domain to potentially increase the online user experience and improve the efficiency of preparing fashion catalogs. Among them, image-based virtual try-on has recently attracted a lot of attention, resulting in several architectures that can generate a new image of a person wearing an input try-on garment in a plausible and realistic way. In this paper, we present VITON-GT, a new model for virtual try-on that generates high-quality and photo-realistic images thanks to multiple geometric transformations. In particular, our model is composed of a two-stage geometric transformation module that performs two different projections on the input garment, and a transformation-guided try-on module that synthesizes the new image. We experimentally validate the proposed solution on the most common dataset for this task, containing mainly t-shirts, and we demonstrate its effectiveness compared to different baselines and previous methods. Additionally, we assess the generalization capabilities of our model on a new set of fashion items composed of upper-body clothes from different categories. To the best of our knowledge, we are the first to test virtual try-on architectures in this challenging experimental setting.

2021 Conference Paper

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions

Authors: Cojocaru, Iulian; Cascianelli, Silvia; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

Published in: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Handwritten Text Recognition (HTR) in free-layout pages is a valuable yet challenging task which aims to automatically understand handwritten texts. State-of-the-art approaches in this field usually encode input images with Convolutional Neural Networks, whose kernels are typically defined on a fixed grid and focus on all input pixels independently. However, this is in contrast with the sparse nature of handwritten pages, in which only the pixels representing the ink of the writing are useful for the recognition task. Furthermore, the standard convolution operator is not explicitly designed to take into account the great variability in shape, scale, and orientation of handwritten characters. To overcome these limitations, we investigate the use of deformable convolutions for handwriting recognition. This type of convolution deforms the sampling grid of the kernel according to the content of the neighborhood, and can therefore adapt better to geometric variations and other deformations of the text. Experiments conducted on the IAM and RIMES datasets demonstrate that the use of deformable convolutions is a promising direction for the design of novel architectures for handwritten text recognition.
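The key idea, sampling the input at learned fractional offsets rather than on a fixed grid, can be illustrated with a naive single-channel NumPy sketch (function names and the per-location offset layout are illustrative; production code would use an optimized operator such as torchvision's DeformConv2d):

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2-D array at fractional (y, x);
    locations outside the image contribute zero."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return val

def deformable_conv2d(img, weight, offsets):
    """3x3 deformable convolution on a single-channel image.
    offsets has shape (H, W, 9, 2): a learned (dy, dx) shift for each
    of the 9 kernel taps at every output location."""
    H, W = img.shape
    out = np.zeros((H, W))
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for k, (dy, dx) in enumerate(taps):
                oy, ox = offsets[i, j, k]
                acc += weight[dy + 1, dx + 1] * \
                       bilinear_sample(img, i + dy + oy, j + dx + ox)
            out[i, j] = acc
    return out
```

With all offsets at zero this reduces to an ordinary zero-padded 3x3 convolution, which is a convenient correctness check; the learned offsets are what let the kernel follow ink strokes instead of the fixed grid.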

2021 Conference Paper

Page 37 of 106 • Total publications: 1054