Senior Cloud Engineer (SD/TX/DC)
Shield AI
What you'll do:
- Engineering:
- Oversee the day-to-day management and optimization of cloud-based infrastructure (e.g., Azure, AWS).
- Support and optimize cloud and virtual machine environments, assisting with capacity planning, performance monitoring, security compliance, and vulnerability remediation.
- Assist in implementing and maintaining infrastructure systems, including servers, storage, backup solutions, and disaster recovery processes, for both public and private clouds.
- Demonstrate a willingness to learn and work with both familiar and unfamiliar operating systems and workloads, with a desire to automate repeatable tasks.
- Author documentation for engineered and maintained systems and their associated processes for supporting teams to use.
- Assist in researching, recommending, and developing innovative solutions for complex requirements and issue resolution.
- Participate in Agile processes and apply sound engineering principles.
- Operations and Support:
- Perform daily system monitoring, verifying the integrity and availability of all server resources, systems, and key processes, and reviewing system and application logs.
- Support system maintenance and upgrades, including OS patching, software configuration, hardware updates, and performance tuning to ensure optimal cloud infrastructure performance.
- Provide escalated support for operational issues, during and possibly after normal business hours, for systems, workloads, and Kubernetes AI infrastructure.
- Analyze, troubleshoot, and resolve system infrastructure and software issues.
- Be able to participate in on-call, emergency, or maintenance roles.
Required Qualifications:
- Bachelor’s degree in a technical discipline, or at least 4 years of experience plus an engineer-level certification such as Azure/AWS Associate or another certification of similar level.
- 4 years’ experience supporting applications and systems in a production environment, preferably for a software and/or manufacturing development company.
- Comfortable improving operational efficiency using Infrastructure as Code (IaC) solutions (e.g., Terraform, Ansible).
- Experience in automating repetitive tasks using scripting languages such as PowerShell, Python, or Bash.
- Experience with deployment and systems administration of at least one Linux distribution (e.g., RHEL, Ubuntu).
- Experience with Microsoft Windows Server administration concepts and with Azure and Active Directory environments.
- Ability to work independently to accomplish assigned tasks.
- Organizational skills, a process-oriented mindset, attention to detail, and effective verbal and written communication abilities.
- Solution-oriented, constructive approach to problem-solving.
- Local to San Diego, CA; Dallas, TX; or Washington, D.C.
Preferred Qualifications:
- Proven engineering experience with deploying and maintaining workloads in Azure public cloud.
- Fundamental understanding of at least one virtualization platform for private cloud (e.g., VMware, Hyper-V, KVM).
- Experience in DevOps, Site Reliability Engineering, or cloud infrastructure roles.
- Familiarity with configuration management tools like Ansible, Chef, or Puppet.
- Experience building robust monitoring and alerting systems for mission-critical applications.
- Solid understanding of CI/CD pipelines and the ability to optimize them.