Google Cloud is seeking a Senior Staff Software Engineer, Applied AI in Zürich to own the core engineering of metrics and evaluations for next-generation AI agents. Candidates should have 8+ years of software development experience, a Bachelor’s degree or equivalent, and hands-on experience with Large Language Models and building agents; Python proficiency and a solid grounding in core ML concepts, evaluation practices, and data analysis are highly valued. You will architect two benchmarking philosophies, frontier and competency, create high-quality evaluation datasets, define business-driven metrics, and prototype solutions in close partnership with customers, product management, and business development. This role offers the chance to influence model training and deployment decisions, collaborate with model owners and experts from DeepMind and Vertex, and deliver scalable evaluation tooling, all onsite in Zürich. To apply, highlight projects where you built evaluation datasets, defined metrics tied to business outcomes, and led cross-functional work with model owners and customers.
Google Cloud's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design, and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google Cloud's needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. You will anticipate our customers' needs and be empowered to act like an owner, take action, and innovate. We need our engineers to be versatile, display leadership qualities, and be enthusiastic about taking on new problems across the full stack as we continue to push technology forward.
In this role, you will focus on the core engineering of metrics and evaluations, ensuring the quality, reliability, and commercial viability of next-generation AI agents. You will architect and deliver two benchmarking philosophies that drive both internal innovation and external cloud customer success. In frontier benchmarking, you will partner directly with model owners to define datasets and metrics that measure the ceiling of emerging capabilities, and you will influence model training and fine-tuning to unlock new business opportunities for cloud customers. In competency benchmarking, you will define the floor of reliability required for real-world applications.
Applied AI builds conversational agents that are deployed at large scale and achieve meaningful results in the real world. Examples range from the customer agent built for large call center environments to fast food ordering handled by our Food AI agent. The team is transforming how enterprises connect with customers through the power of AI. We also offer unique experiences for team members: you get to work directly with the model builders (Google DeepMind / Vertex), learn from and work with brilliant AI leaders, and have access to Global 1000 customers via our existing Google Cloud relationships. The opportunity in this space is tremendous.