Lead Data Architect – Cloud Lakehouse (Azure | Databricks | Spark)
Job Overview
Posted: March 8, 2026
Location: Abu Dhabi, UAE
Job reference: 411_3290861
Urgent requirement: our client in Abu Dhabi, UAE needs a Lead Data Architect – Cloud Lakehouse (Azure | Databricks | Spark).
Must-Have Experience
- Strong experience in the end-to-end architecture of production data platforms (lakehouse / warehouse / analytics)
- Strong experience in advanced PySpark optimization (joins, shuffles, skew handling, caching, AQE)
- Strong experience with Databricks on Azure
- Strong experience implementing lineage, metadata, and observability
- Strong experience building CI/CD pipelines for data using Jenkins or GitLab CI/CD
Core Responsibilities
- Own the end-to-end data architecture for cloud-native analytical platforms, from ingestion to consumption, with zero tolerance for brittle or over-engineered designs
- Design and evolve enterprise-grade data lake and warehouse architectures on Azure that scale to billions of records and multiple consumption patterns (BI, ML, analytics)
- Make irreversible architectural decisions around storage formats, partitioning strategies, schema evolution, and data modeling — and stand behind them
- Define and enforce non-negotiable architectural standards for performance, cost efficiency, reliability, and security
Advanced Data Engineering Leadership
- Architect and optimize high-throughput, low-latency data pipelines using Databricks, PySpark, and Azure-native services
- Set the technical bar for ETL/ELT frameworks, orchestration, dependency management, and failure recovery patterns
- Personally review and challenge pipeline designs, Spark jobs, and SQL logic — no rubber‑stamp approvals
- Lead the transition from ad-hoc pipelines to fully productionized, observable, and automated data workflows
Data Quality, Governance & Observability
- Design and implement enterprise-grade data quality frameworks (validation, anomaly detection, reconciliation)
- Establish data lineage, metadata management, and monitoring as first-class architectural components
- Ensure datasets are audit-ready, reproducible, and trustworthy for executive, regulatory, and ML use cases
CI/CD & Engineering Excellence
- Architect CI/CD pipelines for data using Git-based workflows and tools such as Jenkins or GitLab CI/CD
- Enforce automated testing strategies for data (unit, integration, data quality checks)
- Eliminate manual deployments and fragile handoffs across environments
Cross-Functional & Strategic Influence
- Translate ambiguous business requirements into clear, scalable data architectures
- Partner deeply with ML engineers, analysts, product leaders, and executives to design data assets that directly enable business outcomes
- Act as the final technical authority in data architecture discussions, tradeoffs, and escalations
Team Enablement (Without Micromanagement)
- Mentor senior data engineers and technical leads, pushing them toward architectural thinking and ownership
- Set expectations for engineering rigor, documentation, and decision-making clarity
- Raise the technical maturity of the organization, not just deliver projects
Hard Requirements (Non-Negotiable)
- 7+ years in Data Engineering / Data Architecture, with proven ownership of large-scale production data platforms
- 3+ years making architectural decisions, not just implementing someone’s design
- Deep, hands-on expertise with Databricks + PySpark in real-world, high-volume environments
- Strong command of Microsoft Azure data services and cloud-native architecture patterns
- Expert-level Python and strong Spark optimization skills (partitioning, joins, caching, tuning)
- Proven experience designing fault-tolerant, highly available, cost-efficient data systems
- Strong Git-based development practices and experience enforcing engineering standards
- Demonstrated success implementing CI/CD for data pipelines
- Ability to explain complex architectural tradeoffs clearly to both engineers and senior stakeholders
Skills: architecture, cloud lakehouse, data, Spark
2026-02-27 08:36:29