Senior Cloud Engineer (R4222)
Shield AI
What you'll do:
- Engineering:
  - Manage and optimize multi-cloud infrastructure (Azure, AWS) for performance, reliability, and scalability.
  - Support and optimize cloud and virtual machine environments, assisting with capacity planning, performance monitoring, security compliance, and vulnerability remediation.
  - Assist in implementing and maintaining infrastructure systems, including servers, storage, backup solutions, and disaster recovery processes, for both public and private clouds.
  - Continuously learn and adapt to emerging technologies and platforms, leveraging automation wherever possible.
  - Author documentation for the systems you engineer and maintain, along with the associated processes, so that supporting teams can leverage it.
  - Assist in researching, recommending, and developing innovative solutions for complex requirements and issue resolution.
  - Collaborate cross-functionally with AI, DevOps, and Security teams to ensure compliance, observability, and resilience in mission-critical environments.
  - Follow Agile methodologies and apply sound engineering principles.
- Operations and Support:
  - Perform daily system monitoring: verify the integrity and availability of all server resources, systems, and key processes, and review system and application logs.
  - Support system maintenance and upgrades, including OS patching, software configuration, hardware updates, and performance tuning to ensure optimal cloud infrastructure performance.
  - Provide escalated support for operational issues, potentially during and after normal business hours, for systems, workloads, and Kubernetes AI infrastructure.
  - Analyze, troubleshoot, and resolve system infrastructure and software issues.
  - Participate in on-call, emergency, or maintenance roles as needed.
Required qualifications:
- Bachelor’s degree in Computer Science or a related field, or equivalent experience (4+ years) plus an engineer-level certification, such as Azure/AWS Associate or another certification of a similar level.
- 4 years’ experience supporting applications and systems in a production environment; experience in high-availability, mission-critical, or defense-grade environments preferred.
- Comfortable improving operational efficiency with Infrastructure as Code (IaC) solutions (e.g., Terraform, Ansible).
- Strong understanding of networking concepts (VPCs, VPNs, subnets, routing, firewalls).
- Experience in automating repetitive tasks using scripting languages such as PowerShell, Python, or Bash.
- Experience with deployment and systems administration of at least one Linux distribution (e.g., RHEL, Ubuntu).
- Experience with Microsoft Windows Server administration concepts and with Azure and Active Directory environments.
- Possesses organizational skills, with a process-oriented mindset, attention to detail, and effective verbal and written communication abilities.
- Ability to work independently to accomplish assigned tasks.
- Solution-oriented, constructive approach to problem-solving.
- Preferred locations include San Diego or San Mateo, CA, or Dallas, TX; Washington, D.C., and Boston are also potential options.
Preferred qualifications:
- Experience deploying and maintaining workloads in Azure public cloud environments.
- Hands-on experience with containerization and Kubernetes-based workloads.
- Strong understanding of virtualization and private cloud platforms (e.g., VMware, Hyper-V, KVM).
- Background in DevOps, Site Reliability Engineering (SRE), or cloud infrastructure roles.
- Proficiency with configuration management and automation tools (e.g., Ansible, Chef, Puppet, Terraform).
- Experience building and optimizing CI
110,000 - 170,000 USD a year