Publications by Elisa Ficarra

Explore our research publications: papers, articles, and conference proceedings from AImageLab.

Tip: type @ to pick an author and # to pick a keyword.

Active filters (Clear): Author: Elisa Ficarra

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation

Authors: Saporita, Alessia; Pipoli, Vittorio; Bolelli, Federico; Baraldi, Lorenzo; Acquaviva, Andrea; Ficarra, Elisa

Multimodal Large Language Models (MLLMs) have achieved impressive performance across a variety of vision–language tasks. However, their internal working mechanisms … (Read full abstract)

Multimodal Large Language Models (MLLMs) have achieved impressive performance across a variety of vision–language tasks. However, their internal working mechanisms remain largely underexplored. In his work, we introduce FG-TRACER, a framework designed to analyze the information flow between visual and textual modalities in MLLMs in free-form generation. Notably, our numerically stabilized computational method enables the first systematic analysis of multimodal information flow in underexplored domains such as image captioning and chain-of-thought (CoT) reasoning. We apply FG-TRACER to two state-of-the-art MLLMs—LLaMA 3.2-Vision and LLaVA 1.5—across three vision–language benchmarks—TextVQA, COCO 2014, and ChartQA—and we conduct a word-level analysis of multimodal integration. Our findings uncover distinct patterns of multimodal fusion across models and tasks, demonstrating that fusion dynamics are both model- and task-dependent. Overall, FG-TRACER offers a robust methodology for probing the internal mechanisms of MLLMs in free-form settings, providing new insights into their multimodal reasoning strategies. Our source code is publicly available at https://anonymous.4open.science/r/FG-TRACER-CB5A/.

2026

A Benchmark Study of Gene Fusion Prioritization Tools

Authors: Miccolis, F.; Lovino, M.; Ficarra, E.

Published in: LECTURE NOTES IN COMPUTER SCIENCE

A gene fusion is a chromosomal aberration from juxtaposing separate genes. Since some gene fusions are involved in tumorigenesis, proper … (Read full abstract)

A gene fusion is a chromosomal aberration from juxtaposing separate genes. Since some gene fusions are involved in tumorigenesis, proper gene fusion investigation and analysis are crucial in the literature. After DNA/RNA sample extraction, detecting gene fusions requires first gene fusion detection tools, which usually provide many false positives. Given the high experimental costs in wet lab validation of a single fusion, gene fusion prioritization tools were made available over the years to significantly narrow down candidate gene fusions for validation (e.g., Oncofuse, Pegasus, DEEPrior, ChimerDriver). Although a few reviews about gene fusion detection tools are available, a benchmark on prioritization tools is not available yet in the literature. The aim of this paper is twofold: 1. to provide a curated dataset for a fair gene fusion prioritization tool evaluation. 2. to develop a proper comparison based on time, resources, and tool confidence on selected gene fusions. Based on this benchmark, it can be stated that ChimerDriver is the most reliable tool for prioritizing oncogenic fusions.

2025 Relazione in Atti di Convegno

Accurate 3D Medical Image Segmentation with Mambas

Authors: Lumetti, Luca; Pipoli, Vittorio; Marchesini, Kevin; Ficarra, Elisa; Grana, Costantino; Bolelli, Federico

Published in: PROCEEDINGS INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING

CNNs and Transformer-based architectures are recently dominating the field of 3D medical segmentation. While CNNs face limitations in the local … (Read full abstract)

CNNs and Transformer-based architectures are recently dominating the field of 3D medical segmentation. While CNNs face limitations in the local receptive field, Transformers require significant memory and data, making them less suitable for analyzing large 3D medical volumes. Consequently, fully convolutional network models like U-Net are still leading the 3D segmentation scenario. Although efforts have been made to reduce the Transformers computational complexity, such optimized models still struggle with content-based reasoning. This paper examines Mamba, a Recurrent Neural Network (RNN) based on State Space Models (SSMs), which achieves linear complexity and has outperformed Transformers in long-sequence tasks. Specifically, we assess Mamba’s performance in 3D medical segmentation using three widely recognized and commonly employed datasets and propose architectural enhancements to improve its segmentation effectiveness by mitigating the primary shortcomings of existing Mamba-based solutions.

2025 Relazione in Atti di Convegno

Context-guided Prompt Learning for Continual WSI Classification

Authors: Corso, Giulia; Miccolis, Francesca; Porrello, Angelo; Bolelli, Federico; Calderara, Simone; Ficarra, Elisa

Whole Slide Images (WSIs) are crucial in histological diagnostics, providing high-resolution insights into cellular structures. In addition to challenges like … (Read full abstract)

Whole Slide Images (WSIs) are crucial in histological diagnostics, providing high-resolution insights into cellular structures. In addition to challenges like the gigapixel scale of WSIs and the lack of pixel-level annotations, privacy restrictions further complicate their analysis. For instance, in a hospital network, different facilities need to collaborate on WSI analysis without the possibility of sharing sensitive patient data. A more practical and secure approach involves sharing models capable of continual adaptation to new data. However, without proper measures, catastrophic forgetting can occur. Traditional continual learning techniques rely on storing previous data, which violates privacy restrictions. To address this issue, this paper introduces Context Optimization Multiple Instance Learning (CooMIL), a rehearsal-free continual learning framework explicitly designed for WSI analysis. It employs a WSI-specific prompt learning procedure to adapt classification models across tasks, efficiently preventing catastrophic forgetting. Evaluated on four public WSI datasets from TCGA projects, our model significantly outperforms state-of-the-art methods within the WSI-based continual learning framework. The source code is available at https://github.com/FrancescaMiccolis/CooMIL.

2025 Relazione in Atti di Convegno

Deep Learning for Classifying Anti-Shigella Opsono- Phagocytosis-Promoting Monoclonal Antibodies

Authors: Pianfetti, Elena; Cardamone, Dario; Roscioli, Emanuele; Ciano, Giorgio; Maccari, Giuseppe; Sala, Claudia; Micoli, Francesca; Rappuoli, Rino; Medini, Duccio; Ficarra, Elisa

Published in: LECTURE NOTES IN COMPUTER SCIENCE

Shigellosis is an acute small intestine infection caused by different species of Shigella. Worldwide, the emergence of antibiotic-resistant strains aggravates … (Read full abstract)

Shigellosis is an acute small intestine infection caused by different species of Shigella. Worldwide, the emergence of antibiotic-resistant strains aggravates the impact of Shigella infections. In this context, human monoclonal antibodies (mAbs) offer an alternative to traditional antimicrobials. However, identifying a potent candidate mAb requires intense and meticulous efforts. Here, we show the potential of Deep Learning to screen mAbs rapidly. We measured the phagocytosis-promoting activity of mAbs by analyzing images collected with a high-throughput and high-content confocal fluorescence microscope. We acquired images of S. sonnei and S. flexneri infecting THP-1-derived macrophages and evaluated the effect of different mAbs and of a wide selection of Deep Learning tools. We found that our model can generalize on strains and mAbs not encountered in training. Importantly, our approach enables the screening and characterization of multiple anti-Shigella mAbs at the same time, facilitating the identification of potent antibacterial candidates. Our code is available on the GitHub repository vOPA_Shigella.

2025 Relazione in Atti di Convegno

Foundation Models for Hepatocellular Carcinoma: Challenges in Generalization under Data Scarcity

Authors: Corso, Giulia; Lovino, Marta; Akpinar, Reha; Di Tommaso, Luca; Ficarra, Elisa; Ranzini, Marta

2025 Relazione in Atti di Convegno

IM-Fuse: A Mamba-based Fusion Block for Brain Tumor Segmentation with Incomplete Modalities

Authors: Pipoli, Vittorio; Saporita, Alessia; Marchesini, Kevin; Grana, Costantino; Ficarra, Elisa; Bolelli, Federico

Brain tumor segmentation is a crucial task in medical imaging that involves the integrated modeling of four distinct imaging modalities … (Read full abstract)

Brain tumor segmentation is a crucial task in medical imaging that involves the integrated modeling of four distinct imaging modalities to identify tumor regions accurately. Unfortunately, in real-life scenarios, the full availability of such four modalities is often violated due to scanning cost, time, and patient condition. Consequently, several deep learning models have been developed to address the challenge of brain tumor segmentation under conditions of missing imaging modalities. However, the majority of these models have been evaluated using the 2018 version of the BraTS dataset, which comprises only $285$ volumes. In this study, we reproduce and extensively analyze the most relevant models using BraTS2023, which includes 1,250 volumes, thereby providing a more comprehensive and reliable comparison of their performance. Furthermore, we propose and evaluate the adoption of Mamba as an alternative fusion mechanism for brain tumor segmentation in the presence of missing modalities. Experimental results demonstrate that transformer-based architectures achieve leading performance on BraTS2023, outperforming purely convolutional models that were instead superior in BraTS2018. Meanwhile, the proposed Mamba-based architecture exhibits promising performance in comparison to state-of-the-art models, competing and even outperforming transformers. The source code of the proposed approach is publicly released alongside the benchmark developed for the evaluation: https://github.com/AImageLab-zip/IM-Fuse.

2025 Relazione in Atti di Convegno

Impact of Embedding Methods on Weakly Supervised Lymph Node Classification with MIL on the Camelyon16 Dataset

Authors: Miccolis, Francesca; Riccomi, Olivia; Lovino, Marta; Ficarra, Elisa

2025 Relazione in Atti di Convegno

Location Matters: Harnessing Spatial Information to Enhance the Segmentation of the Inferior Alveolar Canal in CBCTs

Authors: Lumetti, Luca; Pipoli, Vittorio; Bolelli, Federico; Ficarra, Elisa; Grana, Costantino

Published in: LECTURE NOTES IN COMPUTER SCIENCE

The segmentation of the Inferior Alveolar Canal (IAC) plays a central role in maxillofacial surgery, drawing significant attention in the … (Read full abstract)

The segmentation of the Inferior Alveolar Canal (IAC) plays a central role in maxillofacial surgery, drawing significant attention in the current research. Because of their outstanding results, deep learning methods are widely adopted in the segmentation of 3D medical volumes, including the IAC in Cone Beam Computed Tomography (CBCT) data. One of the main challenges when segmenting large volumes, including those obtained through CBCT scans, arises from the use of patch-based techniques, mandatory to fit memory constraints. Such training approaches compromise neural network performance due to a reduction in the global contextual information. Performance degradation is prominently evident when the target objects are small with respect to the background, as it happens with the inferior alveolar nerve that develops across the mandible, but involves only a few voxels of the entire scan. In order to target this issue and push state-of-the-art performance in the segmentation of the IAC, we propose an innovative approach that exploits spatial information of extracted patches and integrates it into a Transformer architecture. By incorporating prior knowledge about patch location, our model improves state of the art by ~2 points on the Dice score when integrated with the standard U-Net architecture. The source code of our proposal is publicly released.

2025 Relazione in Atti di Convegno

MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models

Authors: Pipoli, Vittorio; Saporita, Alessia; Bolelli, Federico; Cornia, Marcella; Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita; Ficarra, Elisa

Recently, Multimodal Large Language Models (MLLMs) have emerged as a leading framework for enhancing the ability of Large Language Models … (Read full abstract)

Recently, Multimodal Large Language Models (MLLMs) have emerged as a leading framework for enhancing the ability of Large Language Models (LLMs) to interpret non-linguistic modalities. Despite their impressive capabilities, the robustness of MLLMs under conditions where one or more modalities are missing remains largely unexplored. In this paper, we investigate the extent to which MLLMs can maintain performance when faced with missing modality inputs. Moreover, we propose a novel framework to mitigate the aforementioned issue called Retrieval-Augmented Generation for missing modalities (MissRAG). It consists of a novel multimodal RAG technique alongside a tailored prompt engineering strategy designed to enhance model robustness by mitigating the impact of absent modalities while preventing the burden of additional instruction tuning. To demonstrate the effectiveness of our techniques, we conducted comprehensive evaluations across five diverse datasets, covering tasks such as audio-visual question answering, audio-visual captioning, and multimodal sentiment analysis.

2025 Relazione in Atti di Convegno
2 3 »

Page 1 of 16 • Total publications: 156