Andrea Santilli

PhD Student in Computer Science

GLADIA, Sapienza University of Rome

Biography

I’m on the job market: please reach out if you have an interesting position or project.

I’m a soon-to-graduate Ph.D. student in Computer Science at GLADIA, Sapienza University of Rome (a top-ranking university for AI in Europe). I’ve just submitted my thesis, “Effective, Efficient and Reliable Large Language Models”, which will be available after my defense. Previously, I was a research scientist intern in the MLR team at Apple (see the Experience section for a full list).

My current interests are improving the robustness and reliability of language models through uncertainty estimation and mechanistic interpretability (see pub1 and pub2). In the past, I’ve worked on diverse topics: syntax in transformers (see our publication KERMIT), efficient decoding techniques (we introduced Parallel Jacobi Decoding, which speeds up decoding by up to 2x and has since been adopted by lmsys), instruction tuning in LLMs (we introduced instruction tuning, now a standard step in LLM training pipelines), and instruction tuning for Italian (see Camoscio). I’ve also worked on tangential topics such as preserving privacy in LLMs, audio LLMs, and multimodal neural databases (check out my publications).

If you’d like to connect or have an exciting project, feel free to reach out on X, LinkedIn, or through the contact form below!

Interests
  • Natural Language Processing
  • Representation Learning
  • Machine Intelligence
Education
  • PhD in Computer Science, 2024

    Sapienza University of Rome

  • MSc in Computer Science, 2020

    University of Roma Tor Vergata

  • BSc in Computer Science, 2018

    University of Roma Tor Vergata

Experience

Apple
MLR Research Scientist
Apr 2024 – Oct 2024 Barcelona
Researcher in the Apple MLR team, working on the robustness and reliability of foundation models via uncertainty estimation. Two upcoming publications.
 
Hugging Face - BigScience
Open Science Researcher
Jun 2021 – Jun 2022 Remote
Researcher at Hugging Face’s BigScience workshop on large language models. Worked in the prompt-engineering working group, introducing the now-popular instruction-tuning training paradigm. Three publications: T0, BLOOM, PromptSource.
 
Pi School, School of Artificial Intelligence
Research Engineer
Oct 2019 – Dec 2019 Rome
Worked on “Started Project”, a European Commission project that uses NLP-based tools to promote entrepreneurship and tech transfer in R&D.

Publications

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their …

Grants Awarded

Our project on efficient Machine Translation (MT) was selected as the winner of the category ‘Machine Learning Algorithms For Translation’ among proposals submitted by experts and professors worldwide (7% acceptance rate). We developed a novel decoding algorithm that speeds up autoregressive transformers by up to 2x and published the results at ACL 2023. PI: Andrea Santilli. Budget: 20.000€
ufi
Multimodal Artificial Intelligence for 3D shape analysis, modeling and applications
Joint project on multimodal 3D and NLP applications between our research group GLADIA at Sapienza and Maks Ovsjanikov’s group at École Polytechnique. PIs: Simone Melzi, Maks Ovsjanikov. Budget: 10.000€

Contact