Lead Data Engineer - Azure Databricks/Kafka

Virtusa
1 day ago
Full-time
On-site
Dubai, 03

JobsCloseBy Editorial Insights

Virtusa is hiring a Lead Data Engineer for Azure Databricks/Kafka in Dubai. This is a full-time, on-site role. You will design streaming ingestion pipelines with Spark Structured Streaming and Databricks Auto Loader, ingesting data from cloud storage or from Kafka, RabbitMQ, or Confluent Cloud into Delta Lake with schema evolution and exactly-once semantics. Expect to implement CDC and deduplication with Debezium or native CDC tools, apply watermarking, and build a config-driven framework using Airflow or Delta Live Tables that scales to hundreds of tables. You will also monitor pipelines with Prometheus and Grafana, enforce RBAC and encryption, and take part in CI/CD workflows with Jenkins or GitHub Actions and Terraform. To apply, tailor your resume to highlight Spark/Databricks and Kafka experience, quantify your impact, provide code samples, and be ready to discuss architecture, security, and deployment approaches.


Lead Data Engineer - Azure Databricks/Kafka - (CREQ262573)

Description

Design and develop streaming ingestion pipelines using Apache Spark (Structured Streaming) and Databricks Auto Loader that consume files from cloud storage or messages from Kafka, RabbitMQ, or Confluent Cloud and ingest them into Delta Lake, ensuring schema evolution and exactly-once semantics.

Implement CDC and deduplication logic by capturing change events from source databases using Debezium, the built-in CDC features of SQL Server or Oracle, or other connectors, and by applying watermarking and drop-duplicates strategies based on primary keys and event timestamps.

Scale ingestion through configuration by building a config-driven framework (for example, with Airflow, DBX Jobs, or Delta Live Tables) that iterates over metadata tables to deploy and update ingestion pipelines for hundreds of tables and sources without duplicating code.

Implement monitoring, observability, and security: capture streaming query metrics and publish them to monitoring platforms such as Prometheus and Grafana, set up dashboards for lag, files processed, and processing duration, and enforce role-based access control, encryption, and data masking.

Participate in DevOps processes: use CI/CD pipelines such as Jenkins or GitHub Actions to automate job deployment, manage infrastructure with Terraform or similar tools, and follow best practices for version control and code reviews.

This role requires 5–8 years of experience designing and building data pipelines with Apache Spark, Databricks, or equivalent big data frameworks, along with hands-on expertise in streaming and messaging systems such as Apache Kafka, Confluent Cloud, RabbitMQ, or Azure Event Hubs, including creating producers, consumers, and topics and integrating them into downstream processing.
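For readers less familiar with the watermark-plus-drop-duplicates pattern named in the CDC responsibility above, here is a minimal, framework-free Python sketch of what it does conceptually. It is not Spark code and not Virtusa's implementation. In Structured Streaming the equivalent would typically be `withWatermark(...)` followed by `dropDuplicates(...)` on a streaming DataFrame. The `ChangeEvent` class, the `dedup_with_watermark` function, the `(pk, event_ts)` dedup key, and the five-minute delay are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC event: primary key, event timestamp, Debezium-style op code."""
    pk: int
    event_ts: datetime
    op: str = "u"


def dedup_with_watermark(
    micro_batches: Iterable[List[ChangeEvent]], delay: timedelta
) -> Iterator[ChangeEvent]:
    """Emit each (pk, event_ts) pair once and drop events older than the watermark."""
    max_event_ts: Optional[datetime] = None  # highest event time seen so far
    seen: Set[Tuple[int, datetime]] = set()  # dedup state keyed by (pk, event_ts)
    for batch in micro_batches:
        # The watermark stays fixed for the whole micro-batch.
        watermark = max_event_ts - delay if max_event_ts is not None else None
        for ev in batch:
            if watermark is not None and ev.event_ts < watermark:
                continue  # too late: its dedup state may already have been evicted
            key = (ev.pk, ev.event_ts)
            if key in seen:
                continue  # duplicate delivery, e.g. a producer or connector retry
            seen.add(key)
            yield ev
        # Advance the watermark from the event times observed in this batch.
        for ev in batch:
            if max_event_ts is None or ev.event_ts > max_event_ts:
                max_event_ts = ev.event_ts
        # Evict state that falls behind the new watermark so memory stays bounded.
        if max_event_ts is not None:
            cutoff = max_event_ts - delay
            seen = {k for k in seen if k[1] >= cutoff}


if __name__ == "__main__":
    t0 = datetime(2026, 1, 1)

    def at(minutes: int) -> datetime:
        return t0 + timedelta(minutes=minutes)

    batches = [
        [ChangeEvent(1, at(0)), ChangeEvent(1, at(0)), ChangeEvent(2, at(1))],
        [ChangeEvent(1, at(0)), ChangeEvent(3, at(10))],  # pk 1 redelivered
        [ChangeEvent(4, at(2)), ChangeEvent(5, at(9))],   # pk 4 arrives too late
    ]
    kept = dedup_with_watermark(batches, timedelta(minutes=5))
    print([ev.pk for ev in kept])  # [1, 2, 3, 5]
```

Including the event timestamp in the dedup key is what makes eviction possible: once the watermark passes a key's timestamp, no valid duplicate can still arrive, so that state can be discarded. Without a watermark, the set of seen keys would grow without bound.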
Candidates should have a deep understanding of relational databases and CDC, with proficiency in SQL Server, Oracle, or other RDBMSs and experience capturing change events using Debezium or native CDC tools; proficiency in programming languages such as Python, Scala, or Java; solid knowledge of SQL for data manipulation and transformation; cloud platform expertise, specifically with Azure or AWS services for data storage, compute, and orchestration; and knowledge of data lakehouse architectures, Delta Lake, partitioning strategies, and performance optimization. Familiarity with Git, CI/CD pipelines, and infrastructure-as-code is also essential.

Primary Location: AE-DU-Dubai
Schedule: Full Time
Employee Status: Individual Contributor
Job Type: Experienced
Travel: No
Job Posting: 02/07/2026, 7:49:57 AM