
Senior Researcher – Vision-Language Models (VLM)

Huawei Technologies Canada Co., Ltd.
Full-time
On-site
Markham, ON

JobsCloseBy Editorial Insights

Huawei Canada is hiring a Senior Researcher for Vision-Language Models in Markham, a permanent, onsite, full-time role with the Human-Machine Interaction Lab. You will design, train, evaluate, and optimize on-device computer vision and multimodal models; prototype SOTA architectures for image understanding, visual search, detection, and segmentation; and implement algorithms from scratch or with PyTorch, TensorFlow, and other industry libraries. The role includes managing large multimodal datasets, building data pipelines, deploying models to production, and guiding retraining and versioning. Ideal candidates have a PhD or MSc in CS with 3+ years in vision-language or multimodal AI, strong CV/ML toolkit skills, and familiarity with transformers, diffusion models, and CLIP/ALIGN; experience with on-device deployment, open-source contributions, or commercial interactive VLMs is an asset. To apply, tailor your resume to highlight relevant projects, provide a portfolio or publications, and demonstrate measurable impact and collaboration across researchers, engineers, and designers, especially in mobile or embedded settings.


Huawei Canada has an immediate permanent opening for a Senior Researcher. 


About the team:
The Human-Machine Interaction Lab brings together global talent to redefine the relationship between humans and technology. Focused on innovation and user-centered design, the lab strives to advance human-computer interaction research. Our team of researchers, engineers, and designers collaborates across disciplines on novel interactive systems, sensing technologies, wearable and IoT systems, human factors, computer vision, and multimodal interfaces. Through high-impact products and cutting-edge research, we aim to enhance how users experience and interact with technology.


About the job:

  • Design, develop, train, evaluate, and optimize advanced Computer Vision, Machine Learning, and Vision-Language models (e.g., transformers, multimodal encoders, diffusion models), with an emphasis on on-device performance and efficiency

  • Prototype and optimize SOTA architectures for tasks such as image understanding, visual search, object detection, segmentation, and multimodal grounding

  • Implement Computer Vision and Machine Learning algorithms from scratch or leverage existing libraries and frameworks (e.g., TensorFlow, PyTorch, scikit-learn, Keras)

  • Explore and apply techniques such as quantization, pruning, distillation, and LoRA adapters to meet mobile/embedded constraints (a sketch of one such technique follows this list)

  • Choose appropriate algorithms and techniques based on problem requirements, data characteristics, and business needs

  • Manage and process large multimodal datasets (images, videos, text)

  • Build and maintain data pipelines for model training and inference

  • Deploy Machine Learning models to production environments and maintain model retraining and versioning strategies
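
As a rough illustration of the mobile/embedded optimization work named above, here is a minimal sketch of post-training dynamic quantization in PyTorch. The model below is a hypothetical stand-in (the layer sizes are assumptions, not details from this posting); torch.quantization.quantize_dynamic is the standard PyTorch entry point for this technique.

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for a small projection head; any module
    # containing nn.Linear layers is handled the same way.
    model = nn.Sequential(
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Linear(256, 128),
    )

    # Post-training dynamic quantization: weights of the listed layer
    # types are stored as int8 and dequantized on the fly at inference,
    # shrinking the model footprint for mobile/embedded deployment.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 512)
    print(quantized(x).shape)  # torch.Size([1, 128])

Pruning, distillation, and LoRA adapters follow the same spirit: trade a small amount of accuracy or extra training effort for a model that fits on-device compute and memory budgets.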

Requirements

About the ideal candidate:

  • Ph.D. or Master's degree in Computer Science or a related field with a focus on Computer Vision and Machine Learning

  • Minimum 3 years of research and development experience in Vision-Language or multimodal AI, with a strong portfolio of applied projects or publications

  • Proficiency in Computer Vision and Machine Learning frameworks (e.g., TensorFlow, PyTorch), and modern CV toolchains (OpenCV, MMDetection, Detectron2, etc.)

  • Familiarity with transformers, diffusion models, contrastive learning (e.g., CLIP, ALIGN), and prompt/adapter-based fine-tuning techniques is an asset (a sketch of a contrastive objective follows this list)

  • On-device model deployment experience is an asset

  • Experience contributing to relevant open-source projects is an asset

  • Experience building commercial agent, conversational AI, or interactive VLM systems is an asset
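
For context on the contrastive learning methods named above (CLIP, ALIGN), here is a minimal sketch of a CLIP-style symmetric contrastive loss in PyTorch. The batch size, embedding dimension, and random encoder outputs are illustrative assumptions, not code from this role.

    import torch
    import torch.nn.functional as F

    def clip_style_loss(image_emb, text_emb, temperature=0.07):
        # Normalize so dot products become cosine similarities.
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)

        # Similarity logits between every image and every text in the batch.
        logits = image_emb @ text_emb.t() / temperature

        # Matched (image, text) pairs sit on the diagonal; train both
        # retrieval directions and average, as in CLIP.
        targets = torch.arange(image_emb.size(0), device=image_emb.device)
        loss_i = F.cross_entropy(logits, targets)      # image -> text
        loss_t = F.cross_entropy(logits.t(), targets)  # text -> image
        return (loss_i + loss_t) / 2

    # Toy batch of 8 embedding pairs from hypothetical encoders.
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(clip_style_loss(img, txt))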