Andrea Santilli

PhD Student in Computer Science

GLADIA, Sapienza University of Rome

Biography

I’m currently a Research Scientist at Nous Research. I hold a PhD in Computer Science from GLADIA at Sapienza University of Rome — a top-ranked AI university in Europe. My doctoral work focused on building effective, efficient, and reliable Large Language Models. Previously, I was a Research Scientist Intern at Apple, working with the MLR team (see Experiences for the full list).

My current interests are improving language models’ robustness and reliability through uncertainty estimation and mechanistic interpretability (see pub1 and pub2). In the past, I’ve worked on diverse topics like syntax in transformers (see our publication KERMIT), efficient decoding techniques (we introduced Parallel Jacobi Decoding to double decoding speed - now adopted by lmsys), instruction-tuning in LLMs (we introduced instruction tuning, now adopted in every LLM training pipeline), instruction-tuning for the Italian language (see Camoscio). I’ve also worked on other tangential topics like preserving privacy in LLMs, audio LLMs, and multimodal neural databases (check out my publications).

If you’d like to connect or have an exciting project, feel free to reach out on X, LinkedIn, or through the contact form below!

Interests

Large Language Models
Natural Language Processing
Representation Learning

Education

PhD in Computer Science, 2024
Sapienza University of Rome
MSc in Computer Science, 2020
University of Roma Tor Vergata
BSc in Computer Science, 2018
University of Roma Tor Vergata

Experience

Research Scientist

Nous Research

Mar 2025 – Present Remote

Conducting post-training research on LLMs with a focus on enhancing their robustness, reliability, and alignment.

MLR Research Scientist

Apple

Apr 2024 – Oct 2024 Barcelona

Researcher in the Apple MLR team, worked on robustness and reliability of foundation models via uncertainty estimation. Two upcoming publications.

Open Science Researcher

Hugging Face - BigScience

Jun 2021 – Jun 2022 Remote

Researcher at Hugginface’s workshop on large language models. Worked in the prompt-engineering working group, introducing the now popular instruction-tuning training paradigm. Three publications: T0, BLOOM, PromptSource.

Research Engineer

Pi School, School of Artificial Intelligence

Oct 2019 – Dec 2019 Rome

Worked on a European Commission project to promote entrepreneurship and tech transfer in the R&D area (“Started Project”) via NLP-based tools.

Publications

Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces

CVPR 2025

Recent works have shown that, when trained at scale, uni-modal 2D vision and text encoders converge to learned features that share …

Souhail Hadgi, Luca Moschella, Andrea Santilli, Diego Gomez, Qixing Huang, Emanuele Rodolà, Simone Melzi, Maks Ovsjanikov

Accelerating Transformer Inference for Translation via Parallel Decoding

ACL 2023

Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community proposed specific network …

Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, Emanuele Rodolà

Multimodal Neural Databases

SIGIR 2023

The rise in loosely-structured data available through text, images, and other modalities has called for new ways of querying them. …

Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, Alon Halevy, Fabrizio Silvestri

Latent Autoregressive Source Separation

AAAI 2023

Autoregressive models have achieved impressive results over a wide range of domains in terms of generation quality and downstream task …

Emilian Polostache, Giorgio Mariani, Michele Mancusi, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

Bloom: A 176b-parameter open-access multilingual language model

arXiv

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language …

BIG-Science contributors including, Andrea Santilli

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their …

BIG-bench contributors including, Andrea Santilli, Antonio Norelli, Emanuele Rodolà, Giambattista Parascandolo, Giorgio Mariani, Luca Moschella, Simone Melzi

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

ACL 2022

PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a …

Stephen Bach, Victor Sanh, Zheng Xin Yong, Albert Webson, Colin Raffel, BIG-Science contributors including, Andrea Santilli

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

Multitask Prompted Training Enables Zero-Shot Task Generalization

ICLR 2022

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., …

Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, BIG-Science contributors including, Andrea Santilli

Multitask Prompted Training Enables Zero-Shot Task Generalization

KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations

EMNLP 2020

Syntactic parsers have dominated natural language understanding for decades. Yet, their syntactic interpretations are losing centrality …

Fabio Massimo Zanzotto, Andrea Santilli, Leonardo Ranaldi, Dario Onorati, Pierfrancesco Tommasino, Francesca Fallucchi

KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations

Unsupervised Source Separation via Bayesian Inference in the Latent Domain

arXiv preprint

State of the art audio source separation models rely on supervised data-driven approaches, which can be expensive in terms of labeling …

Michele Mancusi, Emilian Postolache, Giorgio Mariani, Marco Fumero, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

Unsupervised Source Separation via Bayesian Inference in the Latent Domain

Explanatory Learning: Beyond Empiricism in Neural Networks

arXiv preprint

We introduce Explanatory Learning (EL), a framework to let machines use existing knowledge buried in symbolic sequences – e.g. …

Antonio Norelli, Giorgio Mariani, Luca Moschella, Andrea Santilli, Giambattista Parascandolo, Simone Melzi, Emanuele Rodolà

Explanatory Learning: Beyond Empiricism in Neural Networks

A Kernel-based Approach for Irony and Sarcasm Detection in Italian

EVALITA 2018

This paper describes the UNITOR system that participated to the Irony Detection in Italian Tweets task (IronITA) within the context of …

Andrea Santilli, Danilo Croce, Roberto Basili

A Kernel-based Approach for Irony and Sarcasm Detection in Italian

SyntNN at SemEval-2018 Task 2: is Syntax Useful for Emoji Prediction? Embedding Syntactic Trees in Multi Layer Perceptrons

SemEval2018

In this paper, we present SyntNN as a way to include traditional syntactic models in multilayer neural networks used in the task of …

Andrea Santilli, Fabio Massimo Zanzotto

SyntNN at SemEval-2018 Task 2: is Syntax Useful for Emoji Prediction? Embedding Syntactic Trees in Multi Layer Perceptrons

Browse all

Grants Awarded

Incremental Parallel Inference for Machine Translation

translated Feb 2022

Our project on efficient Machine Translation (MT) was selected as the winner of the category ‘Machine Learning Algorithms For Translation’ among different proposals submitted by world experts and professors (7% acceptance rate). We develop a novel decoding algorithm to speedup autoregressive transformers up to 2x and published the results at ACL 2023. PI: Andrea Santilli. Budget: 20.000€

Multimodal Artificial Intelligence for 3D shape analysis, modeling and applications

ufi Dec 2021

Joint project on multimodal 3D and NLP applications between our research group GLADIA at Sapienza and Maks Ovsjanikov’s group at Ecole Polytechnique. PI: Simone Melzi, Maks Ovsjanikov. Budget: 10.000€