Site Reliability Engineer

Adaptyv

1 day ago

Full-time

On-site

Lausanne, VD

JobsCloseBy Editorial Insights

Adaptyv is hiring a Site Reliability Engineer in Lausanne to own the health of the LabOS and customer platform that run a 24/7 automated biology lab. You’ll keep APIs, edge functions, databases, data pipelines, and hardware integrations reliable, with observability, alerting, and automation that prevent firefighting. They’re looking for strong production ownership, Python or TypeScript software skills, and real SRE experience with on-call, incident response, and blameless postmortems, plus fluency in metrics, logs, traces, Grafana, Prometheus, and Loki. Bonus for hardware, IoT, or lab familiarity; curiosity about biology is a plus. Rolling applications; tailor your resume to show end-to-end impact and collaboration with software and lab teams.

Adaptyv is building an automated lab that lets AI agents run biology experiments.

We're entering the era of agentic science where AI models can now design novel proteins, propose hypotheses, and iterate on experimental results. But they can't run the experiments themselves - that's still a manual, months-long process. We're building the infrastructure that gives AI agents access to the physical world.

We are one of the fastest-growing biotech companies, trusted by leading biopharmas, frontier AI labs, and the techbio companies pushing the field forward. This is a rare chance to help advance some of the most important work happening in biotech today.

Our automated lab is powered by a deep software + hardware stack: lab instruments worth millions of USD reverse-engineered into API-controllable hardware, dozens of devices orchestrated through complex workflows, full observability on everything that happens in the lab, processing pipelines for messy physical-world data, and AI systems that troubleshoot production results and accelerate assay development.

We’re growing rapidly and are hiring talented people to scale and support the massive demand for AI-driven wet lab experimentation.

Adaptyv runs a physical lab through software. When our systems go down, it isn't a page that fails to load; it's a liquid handler that stops mid-run, an instrument that loses its booking, or a customer's experiment that stalls with their protein already in a plate. Reliability here has physical-world consequences, and we need someone who owns it.

You'll be responsible for the health of the entire stack that keeps LabOS and our customer-facing platform running: the APIs, edge functions, databases, processing pipelines, job queues, and the integrations that connect our software to millions of dollars of lab hardware. You'll build the observability, alerting, and automation that let a small team run a 24/7 automated lab without living in firefighting mode — and when something does break, you're the person who makes sure it gets caught early, fixed fast, and never happens the same way twice.

In a given week, that might mean:

Building observability across the stack — metrics, logs, traces, and dashboards (Grafana) that make the state of the lab and the platform legible at a glance
Defining SLOs for the services that matter, instrumenting them, and setting up alerting that pages on real problems and stays quiet otherwise
Hardening our data and processing pipelines so messy physical-world data doesn't silently corrupt results or stall experiments
Owning incident response: triage, mitigation, and blameless postmortems that turn every outage into a permanent fix
Improving deploy safety and rollback across our services (Vercel, Supabase, Modal, edge functions) so shipping fast doesn't mean shipping fragile
Automating away toil — the manual recovery steps, the babysitting, the "just restart it" runbooks — so the lab runs itself as much as possible
Partnering with the software and lab-automation teams to make reliability a property of the system rather than an afterthought

What we're looking for

Strong systems and software engineering. You write production code (Python and/or TypeScript) and you're comfortable owning infrastructure, not just configuring it.
Real SRE / production ownership experience. You've run services that people depend on, carried a pager, and built the observability and automation that made on-call survivable.
Observability fluency. Metrics, logging, tracing, dashboards, alerting — you know how to make a complex distributed system legible, and you've used tools like Grafana / Prometheus / Loki (or equivalents) in anger.
Incident instinct. You stay calm when things break, find root cause fast, and you're allergic to the same incident happening twice.
Automation-first mindset. You'd rather spend a day automating a recurring 10-minute task than do it manually forever — and you build with coding agents like Claude Code as a default.
Pragmatic about reliability. You know the difference between what needs five nines and what doesn't, and you spend effort where it actually matters.
Bonus: where software meets the physical world. Hardware / lab / IoT, queues and pipelines, or cloud infra at scale — anything that has to keep running when there's something real on the other end.
Curious about biology. No background required, but you should find it genuinely interesting that the thing you're keeping alive is a lab where AI runs real experiments.

Application deadline

We are reviewing applicants on a rolling basis.
