AI Intern – Vision-Language-Action (VLA)

RIVR
Internship
On-site
Zürich, ZH

JobsCloseBy Editorial Insights

RIVR's AI Intern (VLA) role in Zürich offers experience at the intersection of data engineering and model training for robotic systems. You will process and analyze multi-modal sensor data, build tools to visualize data and debug model performance, and work with senior engineers to implement data strategies that improve robustness. You will also help integrate software components to evaluate algorithms in simulation and on hardware. Requirements include at least a BSc in Computer Science, Robotics, Machine Learning, or a related field; proficiency in Python and a deep learning framework (preferably PyTorch); hands-on experience with deep learning for computer vision; and familiarity with NumPy, Pandas, and OpenCV. Bonus points: experience with large-scale image or video datasets, transformers or Vision-Language Models, 3D geometry, sensor fusion, or robotic systems. Note: applications are accepted only from citizens of Schengen Area countries, except for ETHZ and EPFL students completing compulsory internships. In-person presence at RIVR's offices is required.


RIVR is a Swiss robotics company pioneering Physical AI and robotic solutions to revolutionize last-mile delivery, giving 1 human the power of 1000. Through the combination of artificial neural networks and innovative robot designs with wheels and legs, RIVR aims to enhance efficiency, sustainability, and scalability in last-mile delivery. Founded as Swiss-Mile, the company rebranded to RIVR in 2025 to better reflect its mission of driving the future of intelligent robotics.
As an AI Intern - VLA & Data, you will assist the AI engineering team in developing data pipelines and training Vision-Language-Action (VLA) models for robotic systems. You will sit at the intersection of data engineering and model training, helping the team solve the "data bottleneck" in embodied AI. Your responsibilities will also include building tools to analyze datasets, visualize model predictions, and debug performance. You will work closely with senior engineers to understand, curate, and visualize the massive amounts of multi-modal data our fleet generates.
This role offers hands-on experience within a dynamic and collaborative environment. We are committed to finding and nurturing exceptional talent; our internships are a key pathway to recruiting outstanding graduates who can make a significant impact in our team.
Important Notice: For this position, we can unfortunately only accept applications from citizens of Schengen Area countries. This restriction does not apply to ETHZ and EPFL students who are required to complete compulsory internships as part of their studies.

What you’ll be doing

  • Support the team in processing, curating, and analyzing multi-modal sensor data for training VLA models.
  • Assist in developing software tools to visualize data and debug model performance.
  • Work closely with senior engineers to implement data strategies that improve the robustness of our robotic systems.
  • Help integrate software components to evaluate algorithms in simulation and on hardware.
  • Engage in continuous learning and gain exposure to recent developments in VLA, self-supervised learning, and generative AI.

What you must have

  • At least a BSc in Computer Science, Robotics, Machine Learning, or a related field.
  • Proficiency in Python and experience with deep learning frameworks (preferably PyTorch).
  • Hands-on experience (either through coursework, previous internships, or other projects) with deep learning for computer vision.
  • Familiarity with data manipulation libraries (e.g., NumPy, Pandas, OpenCV).
  • Strong problem-solving skills and an eagerness to work with complex, real-world data.
  • Eagerness to learn and contribute in a collaborative team environment.

Get some bonus points

  • Previous experience working with large-scale image or video datasets.
  • Knowledge of transformers or Vision-Language Models.
  • Experience with 3D geometry, camera projections, or sensor fusion.
  • Experience working with robotic systems.

RIVR is committed to building a diverse and inclusive team that values every perspective. If you’re passionate about driving innovation in robotics and creating meaningful impact, we encourage you to apply and bring your unique self to our team.
We believe the best work is done through collaboration and therefore require in-person presence at our offices.