JobsCloseBy Editorial Insights
Onit is seeking an MLOps Engineer in Auckland to design scalable MLOps infrastructure, data pipelines, and production-grade AI platforms that power reliable model training, evaluation, and deployment. You will partner with AI Engineering and Data Science to operationalise workflows and ensure reproducible experiments, deploying on AWS with S3, ECS/EKS, Lambda, SQS, and CloudWatch, including GPU clusters and cost optimisation. The role emphasises robust data foundations, governance, monitoring, and compliance, with knowledge-graph experience a preferred extra. Requirements: 3+ years in MLOps, strong AWS experience, Docker/Kubernetes, and a CS/Engineering degree; Terraform is a plus. Applications should demonstrate end-to-end ML lifecycle impact, measurable outcomes, links to dashboards or code, and evidence of cross-team collaboration.
We are seeking an MLOps Engineer to build and scale the infrastructure, pipelines, and operational foundations required for modern machine learning and large language model development. You will play a critical role in enabling reliable model training, evaluation, and deployment by establishing strong data and platform foundations.

This role sits at the intersection of data engineering, cloud infrastructure, and applied machine learning, ensuring that AI teams can move efficiently from experimentation to production-ready systems.

You will partner closely with AI Engineering and Data Science teams to operationalise model development workflows and maintain scalable, secure AI infrastructure.
Key Responsibilities
MLOps Infrastructure & Platform Enablement
- Design and implement scalable MLOps infrastructure to support model development, training, evaluation, and deployment.
- Build reusable automation frameworks for model lifecycle management, including CI/CT (continuous integration and continuous training) pipelines for large language models.
- Establish best practices for reproducible experimentation and production-grade AI system operations (see the tracking sketch after this list).
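To make the reproducible-experimentation point concrete, here is a minimal sketch of run tracking, assuming MLflow as the tracker; the posting names no specific tool, and the experiment name, parameters, and metric values are illustrative.

```python
# Minimal sketch: reproducible experiment tracking (assumes MLflow; no tool is named in the posting).
import mlflow

mlflow.set_experiment("llm-finetune-baseline")  # hypothetical experiment name

with mlflow.start_run():
    # Record everything needed to reproduce the run: config and data version alongside metrics.
    mlflow.log_param("base_model", "example-7b")     # hypothetical model id
    mlflow.log_param("dataset_version", "v2025.01")  # ties the run to a dataset snapshot
    mlflow.log_param("learning_rate", 2e-5)

    eval_loss = 0.42  # placeholder; in practice produced by the evaluation loop
    mlflow.log_metric("eval_loss", eval_loss)
```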
Data Foundation for Model Development
- Develop and maintain robust data pipelines and storage foundations required for machine learning and LLM workflows.
- Ensure high-quality, well-governed datasets are available for training, fine-tuning, and benchmarking.
- Partner with Data and AI teams to enable dataset versioning, lineage, and repeatable refresh processes (see the versioning sketch after this list).
- Implement controls for privacy, anonymisation, and compliance when handling enterprise or client-derived training data.
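As one illustration of dataset versioning and lineage, here is a minimal content-addressed sketch using boto3 and S3; the bucket name, key layout, and manifest fields are assumptions, not part of the posting.

```python
# Minimal sketch: content-addressed dataset versioning with an S3 manifest.
# Bucket name and key layout are hypothetical; boto3 credentials are assumed to be configured.
import hashlib
import json

import boto3

def publish_dataset_version(local_path: str, bucket: str = "example-ml-datasets") -> str:
    """Hash the dataset, upload it under its digest, and record a lineage manifest."""
    with open(local_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, f"data/{digest}")  # immutable, content-addressed copy
    manifest = {"sha256": digest, "source": local_path}
    s3.put_object(
        Bucket=bucket,
        Key=f"manifests/{digest}.json",
        Body=json.dumps(manifest).encode(),
    )
    return digest  # callers pin training runs to this version id
```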
AWS-Based Model Operations
- Own the deployment and scaling of AI infrastructure on AWS, leveraging services such as S3, ECS/EKS, Lambda, SQS, and CloudWatch.
- Enable and manage GPU clusters and distributed inference environments on AWS.
- Optimise training and inference environments for performance, reliability, and cost efficiency.
- Implement monitoring, alerting, and operational workflows for model-serving systems, as sketched below.
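A minimal sketch of the monitoring-and-alerting responsibility, using boto3 against CloudWatch (which the posting does name); the namespace, metric name, and threshold values are illustrative assumptions.

```python
# Minimal sketch: emit a custom latency metric and alarm on it via CloudWatch.
# Namespace, metric name, and thresholds are illustrative assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

# Emit per-request inference latency from the serving layer.
cloudwatch.put_metric_data(
    Namespace="ExampleML/Inference",
    MetricData=[{"MetricName": "LatencyMs", "Value": 87.0, "Unit": "Milliseconds"}],
)

# Alarm when average latency stays high for three consecutive minutes.
cloudwatch.put_metric_alarm(
    AlarmName="inference-latency-high",
    Namespace="ExampleML/Inference",
    MetricName="LatencyMs",
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=500.0,
    ComparisonOperator="GreaterThanThreshold",
)
```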
Model Deployment & Production Readiness
- Support the deployment of machine learning models and LLMs into production environments using modern MLOps practices (see the serving sketch after this list).
- Collaborate with backend engineering teams to integrate AI services through APIs and enterprise workflows.
- Ensure model systems meet reliability, latency, and scalability requirements.
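To illustrate API-based model serving, here is a minimal sketch assuming FastAPI; the posting does not name a framework, and the endpoint, request schema, and placeholder scoring logic are hypothetical.

```python
# Minimal sketch: exposing a model behind an HTTP API (assumes FastAPI; framework not named in the posting).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    # Placeholder scoring logic; a real service would call the deployed model here.
    return {"label": "positive" if "good" in req.text.lower() else "negative"}
```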
Observability, Governance, and Compliance
- Establish monitoring and evaluation pipelines for model performance, drift detection, and operational health (see the PSI sketch after this list).
- Ensure infrastructure and workflows align with enterprise security requirements and responsible AI governance practices.
- Maintain auditability and documentation across datasets, pipelines, and model releases.
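Drift detection is often implemented with a population stability index (PSI); the sketch below is one common formulation, where the binning strategy and the ~0.2 alert threshold are conventions rather than requirements from the posting.

```python
# Minimal sketch: population stability index (PSI) for feature/score drift detection.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI = sum((a - e) * ln(a / e)) over shared histogram bins."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid division by zero / log(0) in sparse bins.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# A PSI above ~0.2 is commonly treated as significant drift worth alerting on.
```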
Knowledge Systems and Graph Integration (Preferred)
- Support AI architectures that incorporate knowledge graphs and graph databases for retrieval, reasoning, and enterprise context enrichment.
- Collaborate with engineering teams to operationalise graph-backed pipelines alongside modern ML systems.
- Contribute to scalable integration patterns between graph data layers and LLM-based applications, as sketched below.
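As a sketch of one graph-backed retrieval step, the example below queries a SPARQL endpoint (Stardog, listed later as a preferred tool, exposes one) and flattens the results into prompt context; the endpoint URL, prefix, and schema are hypothetical.

```python
# Minimal sketch: retrieving enterprise context from a SPARQL endpoint to enrich an LLM prompt.
# The endpoint URL, prefix, and schema (ex:Contract, ex:hasParty) are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://graph.example.com/sparql")  # hypothetical endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX ex: <https://graph.example.com/schema#>
    SELECT ?contract ?party WHERE {
        ?contract a ex:Contract ;
                  ex:hasParty ?party .
    } LIMIT 10
""")
results = sparql.query().convert()

# Flatten bindings into plain text that can be prepended to an LLM prompt.
context = "\n".join(
    f"{row['contract']['value']} involves {row['party']['value']}"
    for row in results["results"]["bindings"]
)
```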
Qualifications
Required
- Bachelor’s or Master’s degree in Computer Science, Engineering, Machine Learning, or a related field.
- 3+ years of experience in MLOps, ML infrastructure, or cloud-based data/AI platform engineering.
- Strong hands-on experience building and operating AI infrastructure on AWS.
- Experience developing data foundations and pipelines supporting model development workflows.
- Familiarity with containerisation and orchestration tools such as Docker and Kubernetes.
- Demonstrated ability to support ML systems moving from experimentation into production environments.
Preferred
- Experience in enterprise software, legal tech, or other regulated domains.
- Familiarity with graph databases (e.g., Stardog) and knowledge graph-based AI architectures.
- Exposure to LLM pipelines, retrieval-augmented generation (RAG), or agent-based AI workflows.
- Experience with Infrastructure-as-Code tools such as Terraform or CloudFormation (an illustrative sketch follows this list).
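For the Infrastructure-as-Code point, here is a minimal sketch using AWS CDK in Python, purely to keep all examples in one language; the posting names Terraform and CloudFormation (Terraform itself uses HCL), and the stack and bucket names here are hypothetical.

```python
# Minimal sketch: Infrastructure-as-Code for a versioned ML artifact bucket, using AWS CDK (Python).
# CDK stands in for Terraform/CloudFormation here only to keep examples in Python.
from aws_cdk import App, Stack, aws_s3 as s3
from constructs import Construct

class MlArtifactsStack(Stack):
    def __init__(self, scope: Construct, stack_id: str, **kwargs) -> None:
        super().__init__(scope, stack_id, **kwargs)
        # Versioned bucket so model artifacts and datasets keep an auditable history.
        s3.Bucket(self, "ModelArtifacts", versioned=True)

app = App()
MlArtifactsStack(app, "MlArtifactsStack")
app.synth()  # emits a CloudFormation template
```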
Success Metrics
- Reliable and scalable AWS-based infrastructure enabling efficient model development and deployment.
- Strong data foundations supporting compliant, repeatable training and evaluation workflows.
- Reduced friction in research-to-production transitions for AI engineering teams.
- High operational quality through monitoring, governance, and automation of ML systems.
- Successful enablement of advanced AI architectures, including graph-backed and retrieval-driven workflows.