Senior SRE Engineer (R3386)
Shield AI
This job is no longer accepting applications
What You'll Do:
- Design, implement, and maintain robust monitoring, logging, and alerting systems
- Define incident response procedures and participate in on-call rotations
- Identify and resolve reliability and performance issues across services
- Develop automation tools to streamline operations and reduce manual intervention
- Collaborate with engineering teams to ensure new services are production-ready
- Conduct root cause analyses and implement post-incident improvements
- Champion a culture of reliability, observability, and operational excellence
Required Qualifications:
- 5+ years of experience in Site Reliability Engineering, DevOps, or related roles
- Strong experience with AWS services (EC2, ECS/EKS, RDS, IAM, etc.)
- Deep understanding of Kubernetes and containerized deployments
- Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK)
- Strong scripting or programming skills (Python, Go, Bash, etc.)
- Experience with infrastructure-as-code (Terraform, CloudFormation, or similar)
- Solid understanding of networking, Linux systems, and distributed architectures
Preferred Qualifications:
- Experience with service meshes (e.g., Istio or Linkerd)
- Familiarity with security best practices in cloud environments
- Exposure to GitOps workflows and tools (e.g., ArgoCD or Flux)