Lead Data Architect – Cloud Lakehouse (Azure | Databricks | Spark)

Job Description

411_3290861

Our client in Abu Dhabi, UAE, has an urgent requirement for a Lead Data Architect – Cloud Lakehouse (Azure | Databricks | Spark).

Must-Have Experience

  • Strong experience in end-to-end architecture of a production data platform (lakehouse / warehouse / analytics)
  • Strong experience in advanced PySpark optimization (joins, shuffles, skew handling, caching, Adaptive Query Execution (AQE))
  • Strong experience with Databricks on Azure
  • Strong experience implementing lineage, metadata management, and observability
  • Strong experience building CI/CD pipelines for data using Jenkins or GitLab CI/CD

Core Responsibilities

  • Own the end-to-end data architecture for cloud-native analytical platforms, from ingestion to consumption, with zero tolerance for brittle or over-engineered designs
  • Design and evolve enterprise-grade data lake and warehouse architectures on Azure that scale to billions of records and multiple consumption patterns (BI, ML, analytics)
  • Make hard-to-reverse architectural decisions on storage formats, partitioning strategies, schema evolution, and data modeling, and stand behind them
  • Define and enforce non-negotiable architectural standards for performance, cost efficiency, reliability, and security

Advanced Data Engineering Leadership

  • Architect and optimize high-throughput, low-latency data pipelines using Databricks, PySpark, and Azure-native services
  • Set the technical bar for ETL/ELT frameworks, orchestration, dependency management, and failure recovery patterns
  • Personally review and challenge pipeline designs, Spark jobs, and SQL logic; no rubber-stamp approvals
  • Lead the transition from ad-hoc pipelines to fully productionized, observable, and automated data workflows

Data Quality, Governance & Observability

  • Design and implement enterprise-grade data quality frameworks (validation, anomaly detection, reconciliation)
  • Establish data lineage, metadata management, and monitoring as first-class architectural components
  • Ensure datasets are audit-ready, reproducible, and trustworthy for executive, regulatory, and ML use cases

CI/CD & Engineering Excellence

  • Architect CI/CD pipelines for data using Git-based workflows and tools such as Jenkins or GitLab CI/CD
  • Enforce automated testing strategies for data (unit, integration, data quality checks)
  • Eliminate manual deployments and fragile handoffs across environments

Cross-Functional & Strategic Influence

  • Translate ambiguous business requirements into clear, scalable data architectures
  • Partner deeply with ML engineers, analysts, product leaders, and executives to design data assets that directly enable business outcomes
  • Act as the final technical authority in data architecture discussions, tradeoffs, and escalations

Team Enablement (Without Micromanagement)

  • Mentor senior data engineers and technical leads, pushing them toward architectural thinking and ownership
  • Set expectations for engineering rigor, documentation, and decision-making clarity
  • Raise the technical maturity of the organization, not just deliver projects

Hard Requirements (Non-Negotiable)

  • 7+ years in Data Engineering / Data Architecture, with proven ownership of large-scale production data platforms
  • 3+ years making architectural decisions, not just implementing someone’s design
  • Deep, hands-on expertise with Databricks + PySpark in real-world, high-volume environments
  • Strong command of Microsoft Azure data services and cloud-native architecture patterns
  • Expert-level Python and strong Spark optimization skills (partitioning, joins, caching, tuning)
  • Proven experience designing fault-tolerant, highly available, cost-efficient data systems
  • Strong Git-based development practices and experience enforcing engineering standards
  • Demonstrated success implementing CI/CD for data pipelines
  • Ability to explain complex architectural tradeoffs clearly to both engineers and senior stakeholders

Skills: architecture, cloud lakehouse, data, Spark

2026-02-27 08:36:29