Site Reliability Engineer

Job Description

Core Skills & Responsibilities (Must-Have)

  • ✔️ 6+ years of hands‑on experience as a Site Reliability Engineer (SRE)
  • ✔️ Define and manage SLIs, SLOs, and error budgets for critical systems
  • ✔️ Design and maintain highly reliable, scalable, and fault‑tolerant production environments
  • ✔️ Drive toil reduction, automation, and self‑healing systems using Python / Bash
  • ✔️ Strong Linux system administration experience in production environments
  • ✔️ Hands‑on experience managing Kubernetes or OpenShift platforms
  • ✔️ Implement Infrastructure as Code (Terraform / Ansible)
  • ✔️ Build and optimize CI/CD pipelines using GitLab, Jenkins, or Azure DevOps
  • ✔️ Implement safe deployment strategies (Blue‑Green, Canary, Rolling deployments)
  • ✔️ Hands‑on experience with AWS, Azure, or Private Cloud infrastructure
  • ✔️ Own incident management, on‑call rotations, RCA, and post‑incident reviews
  • ✔️ Implement and manage observability and monitoring using Prometheus, Grafana, and ELK/EFK
  • ✔️ Reduce alert noise and improve alert quality to avoid alert fatigue
  • ✔️ Collaborate with development and platform teams to improve system reliability
  • ✔️ Ensure compliance with security and governance standards and frameworks (ISO, NIST, ITIL)
  • ✔️ Experience supporting high‑availability, mission‑critical systems

Preferred / Good to Have

  • ➕ Experience in large‑scale government or regulated environments
  • ➕ Exposure to microservices‑based architectures
  • ➕ Familiarity with chaos engineering / resilience testing tools
  • ➕ Strong stakeholder communication and documentation skills

Language skills: Arabic, English

2026-03-28 08:10:44