DropCode seeks a computational protein scientist to lead ML and protein modelling, owning the sequence–function stack from processing deep mutational scanning data to training and deploying generative models that steer the next experiments. This foundational role requires building the ML infrastructure from the ground up and delivering a closed-loop active learning approach that speeds the design–build–test–learn cycle, in close collaboration with biology and engineering. In your application, highlight breakthroughs in protein design or sequence–function modelling, work with protein language models, familiarity with deep mutational scanning and epistasis, and a track record of delivering active learning pipelines in early-stage R&D. PhD depth preferred; a passion for turning proprietary functional data into scalable, impactful models is essential.
About DropCode
DropCode is building the data engine for protein function. Starting with enzymes, we use our patented droplet microfluidics platform to capture exponentially more data on protein function than conventional methods, linking genotype to phenotype at per-droplet resolution, making every droplet a micro test tube. This data fuels machine learning models that learn in ever greater detail how sequence determines function. Our wedge is enzyme engineering for biocatalysis and industrial biotechnology, but our ambition is to make DropCode the definitive platform for protein function prediction.
We are Cambridge PhDs with deep expertise across microfluidics, biochemistry, machine learning, optics, and engineering. We believe the language of biology is machine learning, and that the fastest path to transformative models is not just better AI but better inputs.
The Role
We are looking for an exceptional computational scientist to lead our machine learning and protein modelling efforts. You will own the sequence–function modelling stack end to end: from processing large-scale functional datasets generated in our microfluidic runs, to training and deploying generative and predictive models that drive the next round of experiments. You will work in a tight loop with the biology and engineering teams, turning quantitative phenotypic data into closed-loop active learning systems that continuously improve our models.
This is a foundational role. You will be building the ML infrastructure from the ground up, and your architectural choices will shape DropCode for years.
What You'll Do
Design and train sequence–function models on deep mutational scanning datasets and high-throughput screening outputs from our microfluidics platform
Develop and iterate generative models (transformers, diffusion models, or equivalent) for enzyme sequence design and optimisation
Build closed-loop active learning pipelines that couple ML predictions with experimental design, shortening the design–build–test–learn cycle
Model protein fitness landscapes, including epistatic interactions, to navigate high-dimensional sequence space intelligently
Partner with the biology team to define the data collection strategy and ensure experimental outputs are ML-ready
Establish best practices for model evaluation, benchmarking, and uncertainty quantification in the context of functional prediction
Own and grow the computational stack as the team scales
What We're Looking For
Demonstrated contribution to a meaningful breakthrough in protein design or sequence–function modelling
Proven hands-on experience with protein language models or generative models applied to biological sequences
Deep familiarity with deep mutational scanning, large-scale functional datasets, or comparable high-throughput data modalities
Strong understanding of fitness landscape theory and epistasis in the context of sequence optimisation
Experience building active learning or Bayesian optimisation systems that integrate ML with experimental feedback
Excitement at the prospect of working with large volumes of proprietary, quantitative functional data unavailable anywhere else
Comfort operating in the ambiguity of early-stage R&D and motivation to take on the challenge of building foundational infrastructure
PhD in machine learning, computational biology, biophysics, or a related field (or equivalent depth of experience)
Who You Are
You are frustrated by the slow, artisanal nature of current biological engineering and believe the field needs a step-change in data scale and quality. You think quantitatively, treat every experiment as a data point for a model, and have strong opinions about what it takes to build the best protein design systems in the world. You thrive in collaborative, fast-moving environments where the pace is set by scientific urgency, not process.