
I am a Research Scientist at Luma AI, where I work on unified understanding and generation models.
Previously, I was a Senior Applied Scientist at Amazon AGI, where I worked on the Amazon Nova family of generative models and Amazon Titan Image Generator.
I received my PhD in Computer Vision & Deep Learning from the University of Cambridge, where I focused on 3D reconstruction of human and animal categories.
Outside the lab, I am a pianist, singer, theatre-goer and somewhat reluctant runner.
Research Scientist, Luma AI
San Francisco, California, United States
My full publication list is available on Google Scholar.
A unified understanding and generation model that combines reasoning with visual imagination in a single autoregressive transformer. Enables structured reasoning during image synthesis, fine-grained visual understanding, and temporally consistent scene generation.
A unified multimodal model that processes text, image, video, and audio inputs and generates text and images. Supports up to 1M token contexts, enabling analysis of extensive codebases, long documents, and hours of video in a single prompt.
A family of foundation models spanning video generation (Nova Reel), image generation (Nova Canvas), and multimodal understanding (Nova Pro, Lite, Micro) across text, image, video, and document processing.
A method for merging text-to-image diffusion models trained on sharded data. Enables training-free continual learning and unlearning with no extra memory or inference cost, achieving up to 30% improvement over a paragon model.
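The core idea of merging sharded models can be sketched as weighted parameter averaging. This is a minimal illustration, not the paper's exact procedure: the function names are hypothetical, and plain NumPy dicts stand in for real diffusion-model checkpoints.

```python
import numpy as np

def merge_checkpoints(state_dicts, weights=None):
    """Merge models trained on disjoint data shards by weighted parameter averaging.

    state_dicts: list of {param_name: np.ndarray}, one model per shard.
    weights: optional per-shard weights (e.g. proportional to shard size);
             defaults to a uniform average.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
        for name in state_dicts[0]
    }

def unlearn_shard(merged, shard_sd, n_shards):
    """Training-free unlearning: subtract one shard's contribution
    from a uniform n-shard average, leaving the average of the rest."""
    return {
        name: (n_shards * merged[name] - shard_sd[name]) / (n_shards - 1)
        for name in merged
    }
```

Because merging and unlearning are closed-form operations on the weights, continual learning (add a shard, re-average) and unlearning (subtract a shard) need no retraining and no extra memory or inference cost.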
A generative model for creating and editing high-quality images from natural language prompts, with built-in safeguards and invisible watermarking.
Recovering sets of plausible 3D human reconstructions from single and partially occluded views, using a best-of-M loss with a normalizing flow-based quantization scheme.
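The best-of-M objective can be sketched as follows; this is an illustrative simplification using a plain squared error (the full method additionally quantizes hypotheses with a normalizing flow), and the function name is hypothetical.

```python
import numpy as np

def best_of_m_loss(hypotheses, target):
    """Best-of-M loss: penalize only the closest of M predicted poses.

    hypotheses: (M, D) array of M predicted pose parameter vectors.
    target: (D,) ground-truth pose parameters.

    Only the best hypothesis receives gradient pressure toward the target,
    so the remaining hypotheses are free to cover other plausible poses
    consistent with an ambiguous or occluded view.
    """
    errors = np.sum((hypotheses - target[None, :]) ** 2, axis=1)
    return float(np.min(errors))
```

This is what lets the model return a *set* of reconstructions: ambiguity in the input is absorbed by the spread of hypotheses rather than averaged into a single blurry estimate.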
A fully automatic system for 3D dog reconstruction from weak 2D supervision, using SMBLD -- a deformable template model with a shape prior refined via expectation maximization. Also introduces the StanfordExtra dataset.
Recovering 3D shape and motion of quadruped animals from monocular video. Trained on synthetic silhouettes to overcome limited animal motion capture data and generalize to real-world sequences.
Improving 3D body shape estimation for diverse body types via new loss functions and a test-time optimization routine for parametric human reconstruction pipelines.
Using deep implicit functions to reconstruct large-scale driving scenes, with LiDAR-approximated occupancy labels to avoid requiring watertight training meshes.
A virtual try-on method that generates a realistic digital model from an image and applies clothing using a layer mask. Built as part of Amazon Style, an ML-powered physical fashion store.
A deep learning architecture we refer to as RodentNet, producing behaviour and keypoint predictions at ~15 fps. Results shown on validation sequences from the SCORHE dataset.
A computer vision application for verifying regulatory gowning procedures, built in collaboration with GlaxoSmithKline. Won the departmental award for best third-year dissertation at the University of Warwick.