xAI is seeking a Software Engineer, Network C++ for Colossus to design and operate the datacenter network powering Grok and frontier AI models. Onsite in Seattle, you will own core networking software from design to deployment, handle routing and traffic engineering, and participate in architecture reviews with experiments at small and full-cluster scales. To stand out, show production C/C++ expertise, deep networking knowledge (UDP, TCP/IP, RDMA), distributed systems experience, and real-time or HPC experience, and demonstrate impact on training efficiency and model convergence. Highlight collaboration, concise communication, and a willingness to work extended hours in a fast-paced environment.
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
SOFTWARE ENGINEER, NETWORK C++ (COLOSSUS)
At xAI, we design, build, and operate Colossus from the ground up. This includes the massive GPU clusters, high-speed interconnect fabric, and the software that makes it all work at unprecedented scale. Colossus powers Grok and our frontier AI models with a custom, high-performance datacenter network that delivers ultra-low latency and massive bandwidth across hundreds of thousands of GPUs.
As a Software Engineer on the Colossus Networking team, you will develop the core networking software that maximizes the performance and reliability of our datacenter fabric. Your work will directly impact training efficiency, model convergence, and the speed at which we can push the frontier of AI.
Our engineers own the full lifecycle of their software — from design and implementation to deployment, monitoring, and iteration based on real-world performance at scale. You will solve hard problems in distributed systems, high-performance networking, and real-time control of one of the largest AI supercomputers on Earth.
RESPONSIBILITIES:
BASIC QUALIFICATIONS:
PREFERRED SKILLS AND EXPERIENCE:
ADDITIONAL REQUIREMENTS:
xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.