Publications by Elisa Ficarra

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers

Authors: Pipoli, Vittorio; Cappelli, Mattia; Palladini, Alessandro; Peluso, Carlo; Lovino, Marta; Ficarra, Elisa

Published in: COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE

Background and objectives: In recent years, the prediction of gene expression levels has been crucial due to its potential clinical applications. In this context, Xpresso and other methods based on Convolutional Neural Networks and Transformers were first proposed for this aim. However, all these methods embed data with a standard one-hot encoding algorithm, resulting in highly sparse matrices. In addition, post-transcriptional regulation processes, which are of utmost importance in the gene expression process, are not considered in the model. Methods: This paper presents Transformer DeepLncLoc, a novel method to predict the abundance of mRNA (i.e., gene expression levels) by processing gene promoter sequences, treating the problem as a regression task. The model exploits a transformer-based architecture, introducing the DeepLncLoc method to perform the data embedding. Since DeepLncLoc is based on the word2vec algorithm, it avoids the sparse matrices problem. Results: Post-transcriptional information related to mRNA stability and transcription factors is included in the model, leading to significantly improved performance compared to state-of-the-art works. Transformer DeepLncLoc reached an R² of 0.76, compared to 0.74 for Xpresso. Conclusion: The Multi-Headed Attention mechanism that characterizes the transformer methodology is suitable for modeling the interactions between DNA locations, outperforming recurrent models. Finally, the integration of the transcription factor data in the pipeline leads to impressive gains in predictive power. (C) 2022 Elsevier B.V. All rights reserved.

2022 Journal article
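The sparsity that the abstract above attributes to one-hot encoding can be illustrated with a minimal sketch. This is not the authors' code: the function names `one_hot` and `zero_fraction`, the four-letter alphabet, and the example sequence are assumptions made for illustration only.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
ALPHABET = "ACGT"

def one_hot(sequence):
    """Return one 4-element row per nucleotide, with a single 1 per row.

    Symbols outside ALPHABET (e.g. N) produce an all-zero row.
    """
    return [[1 if base == symbol else 0 for symbol in ALPHABET]
            for base in sequence.upper()]

def zero_fraction(matrix):
    """Fraction of matrix entries that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total

encoded = one_hot("ACGTTGCA")
print(zero_fraction(encoded))  # 0.75: three of every four entries are zero
```

With a four-symbol alphabet, three quarters of the entries are zero regardless of sequence length; a dense learned embedding such as word2vec avoids this by construction.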

SARS-CoV-2 variants classification and characterization

Authors: Borgato, S.; Bottino, M.; Lovino, M.; Ficarra, E.

Published in: EPIC SERIES IN COMPUTING

Since late 2019, the SARS-CoV-2 virus has spread globally, giving rise to several variants over time. These variants, unfortunately, differ from the original sequence identified in Wuhan, thus risking compromising the efficacy of the vaccines developed. Some software has been released to recognize currently known and newly spread variants. However, some of these tools are not entirely automatic. Others, instead, do not return a detailed characterization of all the mutations in the samples. Such characterization can help biologists understand the variability between samples. This paper presents a Machine Learning (ML) approach to identifying existing and new variants completely automatically. In addition, a detailed table showing all the alterations and mutations found in the samples is provided as output to the user. SARS-CoV-2 sequences are obtained from the GISAID database, and a list of features is custom designed (e.g., number of mutations in each gene of the virus) to train the algorithm. The recognition of existing variants is performed through a Random Forest classifier, while the identification of newly spread variants is accomplished by the DBSCAN algorithm. Both Random Forest and DBSCAN techniques demonstrated high precision on a new variant that arose during the drafting of this paper (used only in the testing phase of the algorithm). Therefore, researchers will significantly benefit from the proposed algorithm and the detailed output with the main alterations of the samples. Data availability: the tool is freely available at https://github.com/sofiaborgato/-SARS-CoV-2-variants-classification-and-characterization.

2022 Conference proceedings paper

A Bayesian approach to Expert Gate Incremental Learning

Authors: Mieuli, V.; Ponzio, F.; Mascolini, A.; Macii, E.; Ficarra, E.; Di Cataldo, S.

Published in: PROCEEDINGS OF ... INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS

Incremental learning involves Machine Learning paradigms that dynamically adjust their previous knowledge whenever new training samples emerge. To address the problem of multi-task incremental learning without storing any samples of the previous tasks, the so-called Expert Gate paradigm was proposed, which consists of a Gate and a downstream network of task-specific CNNs, a.k.a. the Experts. The gate forwards the input to a certain expert, based on the decision made by a set of autoencoders. Unfortunately, as a CNN is intrinsically incapable of dealing with inputs of a class it was not specifically trained on, the activation of the wrong expert will invariably result in a classification error. To address this issue, we propose a probabilistic extension of the classic Expert Gate paradigm. Exploiting the prediction uncertainty estimations provided by Bayesian Convolutional Neural Networks (B-CNNs), the proposed paradigm is able to either reduce, or correct at a later stage, wrong decisions of the gate. The effectiveness of our approach is shown by experimental comparisons with state-of-the-art incremental learning methods.

2021 Conference proceedings paper

A Novel Proof-of-concept Framework for the Exploitation of ConvNets on Whole Slide Images

Authors: Mascolini, Alessio; Puzzo, S.; Incatasciato, G.; Ponzio, F.; Ficarra, E.; Di Cataldo, S.

Published in: SMART INNOVATION, SYSTEMS AND TECHNOLOGIES

Traditionally, the analysis of histological samples is visually performed by a pathologist, who inspects the tissue samples under the microscope, looking for malignancies and anomalies. This visual assessment is both time-consuming and highly unreliable due to the subjectivity of the evaluation. Hence, there are growing efforts towards the automation of such analysis, oriented to the development of computer-aided diagnostic tools, with an ever-growing role of techniques based on deep learning. In this work, we analyze some of the issues commonly associated with providing deep-learning-based techniques to medical professionals. We thus introduce a tool, aimed at both researchers and medical professionals, which simplifies and accelerates the training and exploitation of such models. The outcome of the tool is an attention map representing the cancer probability distribution on top of the Whole Slide Image, driving the pathologist through a faster and more accurate diagnostic procedure.

2021 Book chapter/Essay

Exploration of Convolutional Neural Network models for source code classification

Authors: Barchi, F.; Parisi, E.; Urgese, G.; Ficarra, E.; Acquaviva, A.

Published in: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

The application of Artificial Intelligence is becoming common in many engineering fields. Among them, one of the newest and most rapidly evolving is software generation, where AI can be used to automatically optimise the implementation of an algorithm for a given computing platform. In particular, Deep Learning technologies can be used to decide how to allocate pieces of code to hardware platforms with multiple cores and accelerators, which are common in high performance and edge computing applications. In this work, we explore the use of Convolutional Neural Networks (CNNs) to analyse the application source code and decide the best compute unit to minimise the execution time. We demonstrate that CNN models can be successfully applied to source code classification, providing higher accuracy with consistently reduced learning time with respect to state-of-the-art methods. Moreover, we show the robustness of the method with respect to source code pre-processing, compiler options and hyper-parameter selection.

2021 Journal article

FUNGI: FUsioN Gene Integration toolset

Authors: Cervera, Alejandra; Rausio, Heidi; Kähkönen, Tiia; Andersson, Noora; Partel, Gabriele; Rantanen, Ville; Paciello, Giulia; Ficarra, Elisa; Hynninen, Johanna; Hietanen, Sakari; Carpén, Olli; Lehtonen, Rainer; Hautaniemi, Sampsa; Huhtinen, Kaisa

Published in: BIOINFORMATICS

2021 Journal article

Optimizing Quality Inspection and Control in Powder Bed Metal Additive Manufacturing: Challenges and Research Directions

Authors: Di Cataldo, Santa; Vinco, Sara; Urgese, Gianvito; Calignano, Flaviana; Ficarra, Elisa; Macii, Alberto; Macii, Enrico

Published in: PROCEEDINGS OF THE IEEE

2021 Journal article

PhyliCS: a Python library to explore scCNA data and quantify spatial tumor heterogeneity

Authors: Montemurro, M.; Grassi, E.; Pizzino, C. G.; Bertotti, A.; Ficarra, E.; Urgese, G.

Published in: BMC BIOINFORMATICS

Background: Tumors are composed of a number of cancer cell subpopulations (subclones), characterized by a distinguishable set of mutations. This phenomenon, known as intra-tumor heterogeneity (ITH), may be studied using Copy Number Aberrations (CNAs). Nowadays ITH can be assessed at the highest possible resolution using single-cell DNA (scDNA) sequencing technology. Additionally, single-cell CNA (scCNA) profiles from multiple samples of the same tumor can in principle be exploited to study the spatial distribution of subclones within a tumor mass. However, since the technology required to generate large scDNA sequencing datasets is relatively recent, dedicated analytical approaches are still lacking. Results: We present PhyliCS, the first tool which exploits scCNA data from multiple samples from the same tumor to estimate whether the different clones of a tumor are well mixed or spatially separated. Starting from the CNA data produced with third-party instruments, it computes a score, the Spatial Heterogeneity score, aimed at distinguishing spatially intermixed cell populations from spatially segregated ones. Additionally, it provides functionalities to facilitate scDNA analysis, such as feature selection and dimensionality reduction methods, visualization tools and a flexible clustering module. Conclusions: PhyliCS represents a valuable instrument to explore the extent of spatial heterogeneity in multi-regional tumor sampling, exploiting the potential of scCNA data.

2021 Journal article

BioSeqZip: a collapser of NGS redundant reads for the optimisation of sequence analysis

Authors: Urgese, Gianvito; Parisi, Emanuele; Scicolone, Orazio; Di Cataldo, Santa; Ficarra, Elisa

Published in: BIOINFORMATICS

Motivation: High-Throughput Next-Generation-Sequencing can generate huge sequence files, whose analysis requires alignment algorithms that are typically very demanding in terms of memory and computational resources. This is a significant issue, especially for machines with limited hardware capabilities. As the redundancy of the sequences typically increases with coverage, collapsing such files into compact sets of non-redundant reads has the two-fold advantage of reducing file size and speeding up the alignment, since the same sequence is not mapped multiple times. Method: BioSeqZip generates compact and sorted lists of alignment-ready non-redundant sequences, keeping track of their occurrences in the raw files as well as of their quality score information. By exploiting a memory-constrained external sorting algorithm, it can be executed on either single or multi-sample data-sets even on computers with medium computational capabilities. On request, it can even re-expand the compacted files to their original state. Results: Our extensive experiments on RNA-seq data show that BioSeqZip considerably brings down the computational costs of a standard sequence analysis pipeline, with particular benefits for the alignment procedures that typically have the highest requirements in terms of memory and execution time. In our tests, BioSeqZip was able to compact 2.7 billion reads into 963 million unique tags, reducing the size of sequence files by up to 70% and speeding up the alignment by at least 50%. Availability: BioSeqZip is available at https://github.com/bioinformatics-polito/BioSeqZip. Supplementary information: Supplementary data are available at Bioinformatics online.

2020 Journal article
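The idea of collapsing redundant reads while keeping track of their occurrences, described in the abstract above, can be sketched in a few lines. This is an in-memory toy and not BioSeqZip itself: the tool is described as using a memory-constrained external sort and as retaining quality scores, both of which are omitted here. The names `collapse_reads` and `expand_reads` and the sample reads are hypothetical.

```python
from collections import Counter

def collapse_reads(reads):
    """Return sorted (sequence, count) pairs, one per distinct read."""
    counts = Counter(reads)
    return sorted(counts.items())

def expand_reads(collapsed):
    """Rebuild a read list from (sequence, count) pairs.

    Reads come back in sorted order, not in their original file order.
    """
    return [seq for seq, n in collapsed for _ in range(n)]

# Hypothetical input: six reads, three of them distinct.
reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
collapsed = collapse_reads(reads)
print(collapsed)  # [('ACGT', 3), ('GGCC', 1), ('TTGA', 2)]
```

An aligner then needs to map only the three distinct sequences, and the stored counts allow per-read statistics to be recovered afterwards.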

Cytoarchitectural analysis of the neuron-to-glia association in the dorsal root ganglia of normal and diabetic mice

Authors: Ciglieri, Elisa; Vacca, Maurizia; Ferrini, Francesco; Atteya, Mona A; Aimar, Patrizia; Ficarra, Elisa; Di Cataldo, Santa; Merighi, Adalberto; Salio, Chiara

Published in: JOURNAL OF ANATOMY

Dorsal root ganglia (DRGs) host the somata of sensory neurons which convey information from the periphery to the central nervous system. These neurons have heterogeneous size and neurochemistry, and those of small-to-medium size, which play an important role in nociception, form two distinct subpopulations based on the presence (peptidergic) or absence (non-peptidergic) of transmitter neuropeptides. Few investigations have so far addressed the spatial relationship between neurochemically different subpopulations of DRG neurons and glia. We used a whole-mount mouse lumbar DRG preparation, confocal microscopy and computer-aided 3D analysis to reveal that IB4+ non-peptidergic neurons form small clusters of 4.7 ± 0.26 cells, unlike CGRP+ peptidergic neurons, which are, for the most part, isolated (1.89 ± 0.11 cells). Both subpopulations of neurons are ensheathed by a thin layer of satellite glial cells (SGCs) that can be observed after immunolabeling with the specific marker glutamine synthetase (GS). Notably, at the ultrastructural level we observed that this glial layer was discontinuous, as there were patches of direct contact between the membranes of two adjacent IB4+ neurons. To test whether this cytoarchitectonic organization was modified in diabetic neuropathy, one of the most devastating sensory pathologies, mice were made diabetic by streptozotocin (STZ). In diabetic animals, cluster organization of the IB4+ non-peptidergic neurons was maintained, but the neuro-glial relationship was altered, as STZ treatment caused a statistically significant increase of GS staining around CGRP+ neurons but a reduction around IB4+ neurons. Ultrastructural analysis revealed that SGC coverage was increased at the interface between IB4+ cluster-forming neurons in diabetic mice, with a 50% reduction in the points of direct contact between cells. These observations demonstrate the existence of a structural plasticity of the DRG cytoarchitecture in response to STZ.

2020 Journal article