Theofanis Karaletsos

AI Researcher & Executive; Co-founder, Achira.ai

My work focuses on epistemologically grounded AI systems that can reason scientifically to understand and control the physical world. To that end, I conduct research, define research programs, build organizations, and collaborate.

Career. I am a co-founder of Achira.ai, at the interface of statistical physics, AI, and biomolecular simulation for drug discovery. Most recently, I served as Head of AI for Science at the Chan Zuckerberg Initiative, where I built and led the AI for Science organization around virtual cell modeling. Previously: VP AI at Insitro, Staff Research Scientist at Meta, founding member of Uber AI Labs, and researcher at Geometric Intelligence.

Research. My research is driven by a central question: what does it mean for an AI system to have a model of the world that is expressive, calibrated, and useful for decision-making? I work across the full AI stack, from the mathematical foundations of probabilistic inference and deep generative models to scalable algorithms and programming abstractions (I co-created Pyro), to their deployment as world models for scientific discovery in biology and physics. I am drawn to problems where methodology and application genuinely constrain each other. The deeper aim is AI capable of scientific reasoning, in service of human discovery.

Research Interests

Probabilistic Reasoning & Generative AI

The architecture of knowledge: what it means for AI to know something. Probabilistic reasoning is the formal language for representing knowledge and uncertainty in AI, with approximate Bayesian inference providing a powerful framework for learning and decision-making under uncertainty. This domain spans inference, uncertainty quantification, causal modeling, and the computational methods that make principled reasoning practical at scale. A core epistemological question is how a system represents what it knows and does not know, and how it reasons about interventions and counterfactuals. Compute and data efficiency are first-class properties of intelligent systems: how much a system needs to learn and reason is a direct measure of the quality of its design.

Deep Learning & Representations

Architectures that capture and organize knowledge about the world. What kinds of architectures produce representations that genuinely reflect the underlying structure of complex data? The empirical foundations span transformers and domain-specific models across language, vision, sequences, and scientific data, with inductive bias matched to the geometry and compositional structure of each domain. The central challenge is to learn representations whose latent structure is identifiable, whose factors are disentangled, and whose abstractions transfer across regimes, compose across scales, and support reasoning in new conditions. A frontier of modern AI is building models whose internal representations are rich, scalable, and capable of generalizing beyond the setting in which they were trained.

Scientific World Models

From molecules to cells: AI that understands life. These are computational world models built at two scales. At the molecular scale, they take the form of virtual chemistry: molecular world models grounded in statistical physics and biomolecular simulation, encoding molecular interactions into learnable representations. At the cellular scale, virtual cell models capture behavior across single-cell biology, perturbation response, gene expression, and organismal variation. Both are generative systems that simulate virtual experiments, reason about interventions, and produce testable hypotheses.

Scientific Reasoning & Adaptive Intelligence

Scientific reasoning is among the highest expressions of intelligence in AI. It requires more than a world model: a system must identify what it does not know, seek evidence, update its beliefs, and turn understanding into action. This demands adaptive intelligence that can both model and interrogate the world in a continuous loop, learning from each interaction with environments, simulations, and users. Realizing this requires agentic systems that can accumulate knowledge through interaction, revise their beliefs, and use world models to guide exploration and decision-making. The goal is AI that not only models the world, but improves itself through experience, becoming more capable over time.

Recent

May 2026

paper

Publication update: A Cross-Species Generative Cell Atlas Across 1.5 Billion Years of Evolution: The TranscriptFormer Single-cell Model published in Science

May 2026

paper

4 papers accepted at ICML 2026: Calibrated Test-Time Guidance for Bayesian Inference, Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models, Position: Agentic AI Systems Should Be Making Bayes-Consistent Decisions, and Toward Identifiable Sparse Autoencoders (preprint coming soon).

Mar 2026

preprint

New preprint: Species-Specific Small Models for Cell Type Classification Approach the Performance of Large Single Cell Foundation Models

Jan 2026

paper

3 papers accepted at ICLR 2026: Statistical and Structural Identifiability in Representation Learning, Parallel Token Prediction for Language Models, and Learning Explicit Single-Cell Dynamics Using ODE Representations.

Dec 2025

paper

Publication update: BayesRVAT Enhances Rare-Variant Association Testing through Bayesian Aggregation of Functional Annotations published in Genome Research

Dec 2025

paper

Publication update: rbio1: Training Scientific Reasoning LLMs with Biological World Models as Soft Verifiers accepted at NeurIPS 2025 Workshop AI4D3

Nov 2025

preprint

New preprint: A Roadmap for Predictive Human Immunology

Press

Science

Can AI capture the mind-boggling complexity of a human cell?

Oct 2025

TIME

Why AI Companies Are Racing to Build a Virtual Human Cell

Oct 2025

Nature

Can AI build a virtual cell? Scientists race to model life's smallest unit

Jul 2025

Endpoints

Achira raises $33M Nvidia-backed seed round, blending AI and physics in biotech

Feb 2025

Career

2024 – present

Co-founder

Achira.ai

Achira is building virtual chemistry: AI-based molecular world models grounded in statistical physics and biomolecular simulation for drug discovery.

2024 – 2026

Head of AI for Science

Chan Zuckerberg Initiative

Developed the scientific vision and roadmap for virtual cell modeling, built the AI for Science organization from inception, and drove the research program through to execution, producing foundational results across single-cell biology, genomics, and AI-native scientific reasoning.

Nov 2021 – Nov 2023

VP of AI

Insitro

Joined as VP to expand and lead the ML and data science organization at a company integrating large-scale biological experimentation with AI for drug discovery. Drove generative and causal ML approaches to understanding disease mechanisms in cells and human cohorts, advancing the company's data-driven drug discovery platform.

Aug 2020 – Nov 2021

Technical Lead & Staff Research Scientist

Meta / Facebook

Led the UncertainT team on probabilistic machine learning and uncertainty quantification for large-scale neural networks.

2016 – 2020

Co-founder & Research Scientist

Uber AI Labs

Co-founded the lab following Uber's acquisition of Geometric Intelligence. Led research in probabilistic ML and Bayesian deep learning, and co-created Pyro.

– 2016

Research Scientist

Geometric Intelligence

Early-stage AI research startup; acquired by Uber in 2016.

Angel Investments

Windscape.ai

AI for wind farm optimization and energy efficiency.

Studio Atelico

On-device AI engine for lifelike characters in video games.

Reasonable.io

AI training systems for superhuman coding using formal verification.

Stealth biotech

Undisclosed.

Selected Publications

AI for Science

Building the Virtual Cell with Artificial Intelligence

C. Bunne*, Y. Roohani*, Y. Rosen*, ..., T. Karaletsos✶, A. Regev✶, E. Lundberg✶, J. Leskovec✶, S.R. Quake✶

Cell, 2024 · * equal contribution · ✶ co-senior · paper · arXiv '24

A Cross-Species Generative Cell Atlas Across 1.5 Billion Years of Evolution: The TranscriptFormer Single-cell Model

J.D. Pearce, S.E. Simmonds, G. Mahmoudabadi, L. Krishnan, G. Palla, A. Istrate, A. Tarashansky, B. Nelson, O. Valenzuela, D. Li, S.R. Quake✶, T. Karaletsos✶

Science, 2026 · ✶ co-senior · Science '26 · bioRxiv '25

VariantFormer: A Hierarchical Transformer Integrating DNA Sequences with Genetic Variations and Regulatory Landscapes for Personalized Gene Expression Prediction

S. Ghosal, Y. Barhomi, T. Ganapathi, A. Krystosik, L. Krishnan, S. Guntury, D. Li, F.P. Casale, T. Karaletsos

bioRxiv, 2025 · bioRxiv '25

rbio1: Training Scientific Reasoning LLMs with Biological World Models as Soft Verifiers

A. Istrate, F. Milletari, F. Castrotorres, J.M. Tomczak, M. Torkar, D. Li, T. Karaletsos

NeurIPS 2025 Workshop AI4D3 · bioRxiv '25

Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder

M. Bereket, T. Karaletsos

NeurIPS 2023 · paper · arXiv '23

Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

G. Palla, S. Babu, P. Dibaeinia, J.D. Pearce, D. Li, A.A. Khan, T. Karaletsos✶, J.M. Tomczak✶

NeurIPS 2025 Workshop AI4D3 · ✶ co-senior · arXiv '25

Machine Learning

Pyro: Deep Universal Probabilistic Programming

E. Bingham, J.P. Chen, M. Jankowiak, N. Pradhan, T. Karaletsos, et al.

Journal of Machine Learning Research, 2019 · paper · website · arXiv '18

Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights

T. Karaletsos✶, T.D. Bui✶

NeurIPS 2020 · ✶ equal contribution · paper · arXiv '20

Bayesian Unsupervised Representation Learning with Oracle Constraints

T. Karaletsos, S. Belongie, G. Rätsch

ICLR 2016 · arXiv '15

Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials

C. Perez, F. Such, T. Karaletsos

AAAI 2020 (oral) · paper · arXiv '20

Black-Box Coreset Variational Inference

D. Manousakas✶, H. Ritter✶, T. Karaletsos ✶ equal contribution

NeurIPS 2022 · paper · arXiv '22

Variational Control for Guidance in Diffusion Models

K. Pandey, F. Marouf Sofian, F. Draxler, T. Karaletsos, S. Mandt

ICML 2025 · paper · arXiv '25

Blog: Theo-splaining

Essays on probabilistic reasoning, world models, and AI for science. Writing soon.