Site Reliability Engineer
Job Overview
Date Posted: March 29, 2026
Job Description
411_3426763
Core Skills & Responsibilities (Must-Have)
- ✔️ 6+ years of hands‑on experience as a Site Reliability Engineer (SRE)
- ✔️ Define and manage SLIs, SLOs, and Error Budgets across critical systems
- ✔️ Design and maintain highly reliable, scalable, and fault‑tolerant production environments
- ✔️ Drive toil reduction, automation, and self‑healing systems using Python / Bash
- ✔️ Strong Linux system administration experience in production environments
- ✔️ Hands‑on experience managing Kubernetes or OpenShift platforms
- ✔️ Implement Infrastructure as Code (Terraform / Ansible)
- ✔️ Build and optimize CI/CD pipelines using GitLab, Jenkins, or Azure DevOps
- ✔️ Implement safe deployment strategies (Blue‑Green, Canary, Rolling deployments)
- ✔️ Hands‑on experience with AWS, Azure, or Private Cloud infrastructure
- ✔️ Own incident management, on‑call rotations, RCA, and post‑incident reviews
- ✔️ Implement and manage observability and monitoring using Prometheus, Grafana, and ELK/EFK
- ✔️ Reduce alert noise and improve alert quality to avoid alert fatigue
- ✔️ Collaborate with development and platform teams to improve system reliability
- ✔️ Ensure compliance with security and governance standards and frameworks (ISO, NIST, ITIL)
- ✔️ Experience supporting high‑availability, mission‑critical systems
Preferred / Good to Have
- ➕ Experience in large‑scale government or regulated environments
- ➕ Exposure to microservices‑based architectures
- ➕ Familiarity with chaos engineering / resilience testing tools
- ➕ Strong stakeholder communication and documentation skills
Language skills: Arabic, English