Publications
Full list; also on Google Scholar. ✶ co-senior author · * equal contribution
Domains
Topics
AI for Science
Virtual Cell & Cellular Biology
Building the Virtual Cell with Artificial Intelligence
A Cross-Species Generative Cell Atlas Across 1.5 Billion Years of Evolution: The TranscriptFormer Single-cell Model
bioRxiv, 2025 ✶ co-senior · bioRxiv '25
Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models
NeurIPS 2025 Workshop AI4D3 ✶ co-senior · arXiv '25
GREmLN: A Cellular Graph Structure Aware Transcriptomics Foundation Model
bioRxiv, 2025 · bioRxiv '25
SubCell: Proteome-Aware Vision Foundation Models for Microscopy Capture Single-Cell Biology
bioRxiv, 2024 · bioRxiv '24
Learning Explicit Single-Cell Dynamics Using ODE Representations
ICLR 2026 · arXiv '25
Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder
scGenePT: Is Language All You Need for Modeling Single-Cell Perturbations?
bioRxiv, 2024 · bioRxiv '24
VariantFormer: A Hierarchical Transformer Integrating DNA Sequences with Genetic Variations and Regulatory Landscapes for Personalized Gene Expression Prediction
bioRxiv, 2025 · bioRxiv '25
RiboDiff: Detecting Changes of mRNA Translation Efficiency from Ribosome Footprints
Bioinformatics, Vol. 33, No. 1, 2017 · paper · bioRxiv '15
A Roadmap for Predictive Human Immunology
arXiv, 2025 · arXiv '25
MorphGen: Controllable and Morphologically Plausible Generative Cell-Imaging
arXiv, 2025 · arXiv '25
Species-Specific Small Models for Cell Type Classification Approach the Performance of Large Single Cell Foundation Models
bioRxiv, 2026 · bioRxiv '26
A Path Towards AI-Scale, Interoperable Biological Data
arXiv, 2025 · arXiv '25
AI: A Transformative Opportunity in Cell Biology
Molecular Biology of the Cell, 2024 · paper
Deep Learning Analysis on Images of iPSC-derived Motor Neurons Carrying fALS-Genetics Reveals Disease-Relevant Phenotypes
bioRxiv, 2024 · bioRxiv '24
ShapePheno: Unsupervised Extraction of Shape Phenotypes from Biological Image Collections
Bioinformatics, Vol. 28, No. 7, 2012 · paper
JigPheno: Semantic Feature Extraction From Biological Images
NIPS Workshop on Machine Learning in Computational Biology, 2010 (oral) · talk
Genetics
BayesRVAT Enhances Rare-Variant Association Testing through Bayesian Aggregation of Functional Annotations
Genome Research, 2025 · paper
An Allelic-Series Rare-Variant Association Test for Candidate-Gene Discovery
The American Journal of Human Genetics, 2023 · paper · bioRxiv '22
EmbedGEM: A Framework to Evaluate the Utility of Embeddings for Genetic Discovery
Bioinformatics Advances, 2024 · paper
Pitfalls in Performing Genome-Wide Association Studies on Ratio Traits
Human Genetics and Genomics Advances, 2025 · paper
Clinical & Healthcare AI
Knowledge Transfer with Medical Language Embeddings
SDM Workshop on Data Mining for Medicine and Healthcare / NIPS ML in Healthcare, 2015 · arXiv '16
Probabilistic Disease Progression Models For Retrospective Analysis Of Cancer Health Records
NIPS Workshop on Machine Learning in Healthcare, 2015
Poisson Matrix Factorization For Joint Modeling Of Genetics and Medical Text
NIPS Workshop on Machine Learning in Healthcare, 2015
Towards an Integrated Dynamic Model of Temporal Structure of Clinical Textnotes and Interactions with Genetic Profiles
NIPS Workshop on Machine Learning for Clinical Data Analysis in Healthcare, 2013
An Empirical Analysis of Topic Modeling for Mining Cancer Clinical Notes
ICDM Workshops, 2013
Scientific Reasoning
rbio1: Training Scientific Reasoning LLMs with Biological World Models as Soft Verifiers
NeurIPS 2025 Workshop AI4D3 · bioRxiv '25
How Well Do LLMs Understand Drug Mechanisms? A Knowledge + Reasoning Evaluation Dataset
FLLM 2025 · arXiv '25
Drug Discovery & Molecular AI
Compositional Deep Probabilistic Models of DNA-Encoded Libraries
DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries
Bayesian Active Drug Discovery
Real-World ML Workshop, ICML 2020 · paper
Probabilistic & Bayesian ML / Generative AI
Generative Models & Diffusion
Calibrated Test-Time Guidance for Bayesian Inference
arXiv, 2026 · arXiv '26
Likelihood-Free Inference with Emulator Networks
Bayesian Unsupervised Representation Learning with Oracle Constraints
ICLR 2016 · arXiv '15
Bayesian Deep Learning & Uncertainty
Position: Agentic AI Systems Should Be Making Bayes-Consistent Decisions
SSRN preprint, 2026 · SSRN
Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights
Probabilistic Meta-Representations of Neural Networks
UAI Workshop on Uncertainty in Deep Learning, 2018 · arXiv '18
Deep Learning, Representation Learning & RL
Language Models & Sequence Modeling
Transformers for Mixed-type Event Sequences
NeurIPS 2025 (Spotlight) · paper
Vision & Representation Learning
Adjusting Pretrained Backbones for Performativity
arXiv, 2024 · arXiv '24
Contextual Vision Transformers for Robust Representation Learning
arXiv, 2023 · arXiv '23
Stochastic Aggregation in Graph Neural Networks
arXiv, 2021 · arXiv '21
Reinforcement Learning & Transfer
Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials
Efficient Transfer Learning and Online Adaptation with Latent Variable Models for Continuous Control
NeurIPS Workshop on Continual Learning, 2018 · arXiv '18
Patents
Synthon Embeddings for Modeling DNA-Encoded Libraries
US20250131979A1, Insitro, 2025 · patent
Predicting Cellular Responses to Perturbations
WO2024238984A3, Insitro, 2025 · patent
Molecular Docking-Enabled Modeling of DNA-Encoded Libraries
WO2024118605A1, Insitro, 2024 · patent
Intelligent Regularization of Neural Network Architectures
US20240013049A1, Uber Technologies, 2024 · patent
Model Based Reinforcement Learning Based on Generalized Hidden Parameter Markov Decision Processes
US20200372410A1, Uber Technologies, 2020 · patent
Representations of Units in Neural Networks
US20190286970A1, Uber Technologies, 2019 · patent
Event Detection Using Sensor Data
US20190205785A1, Uber Technologies, 2019 · patent