Senior Software Engineer - ML Infrastructure
Plaid
Responsibilities
- Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems.
- Drive the rollout of Plaid’s next-generation feature store to improve the reliability and velocity of model development.
- Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring.
- Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency.
- Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration.
- Contribute to technical strategy and architecture discussions within the team.
- Mentor and support other engineers through code reviews, design discussions, and technical guidance.
Qualifications
- 5+ years of industry experience as a software engineer, with a strong focus on ML/AI infrastructure or large-scale distributed systems.
- Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks).
- Proven experience delivering reliable and scalable infrastructure in production.
- Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability.
- Strong communication skills and the ability to collaborate across teams.
- [Nice to have] Experience with ML Ops tools such as MLflow, SageMaker, or model registries.
- [Nice to have] Exposure to modern AI infrastructure environments (LLMs, real-time inference, agentic models).
- [Nice to have] Background in scaling ML infrastructure in fast-paced product environments.