Machine Learning Ops Engineer (SME)
Category:
Clearance:
Location:
Telecommute:
About Peraton
Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world’s leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees solve the most daunting challenges that our customers face. Visit peraton.com to learn how we’re keeping people around the world safe and secure.
About the Role
Peraton is seeking an experienced Machine Learning Ops Engineer (SME) to support U.S. Customs and Border Protection (CBP) by ensuring the secure, reliable, and scalable operation of machine learning systems within CBP’s analytics and intelligence support programs.
This role operationalizes AI solutions by building the platforms, pipelines, monitoring, and governance controls that move models from research into mission-ready production environments. The ideal candidate combines strong reliability engineering, AI/ML lifecycle expertise, security awareness, cost optimization discipline, and cross-functional collaboration skills.
Support will be provided across multiple mission locations:
- Ashburn, VA
- Sterling, VA
- Washington, D.C.
Key Responsibilities
- Design, deploy, and maintain scalable ML platforms supporting model training, batch processing, and real-time inference.
- Build and manage CI/CD pipelines for machine learning code, data, and model artifacts.
- Deploy and manage containerized workloads using Kubernetes and cloud-native infrastructure.
- Implement model lifecycle management, including versioning, retraining, and automated validation workflows.
- Develop monitoring solutions for system health, model performance, latency, drift, and reliability.
- Define and maintain SLOs/SLAs and support incident response for production ML systems.
- Collaborate with data scientists, engineers, and platform teams to productionize machine learning models.
- Ensure secure system configurations including IAM/RBAC, encryption, secrets management, and audit logging.
- Support data governance, model reproducibility, and Responsible AI practices in compliance with federal security requirements.
- Develop documentation, runbooks, and reusable workflows to improve operational efficiency and platform reliability.
**Position is contingent upon contract award**
Qualifications
Required Qualifications
- Minimum of 12 years of experience with a BS/BA, or a minimum of 10 years with an MS/MA. 16 years of experience with a high school diploma or equivalent may be considered in lieu of a degree.
- 8+ years in SRE, DevOps, Platform Engineering, or ML Engineering supporting production systems.
- Experience with Kubernetes, Docker, and cloud platforms (AWS, Azure, or GCP).
- Proficiency in Python (and/or Java/Go).
- Experience implementing CI/CD, monitoring, and secure deployment practices.
- Knowledge of model lifecycle management, drift monitoring, and data pipeline operations.
Clearance Requirements
- Ability to obtain and maintain the required CBP Background Investigation (BI) suitability.
- U.S. Citizenship required.
Preferred Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
- Experience with ML platforms (MLflow, Kubeflow, SageMaker, Azure ML, Vertex AI).
- Familiarity with distributed training, GPU optimization, or LLMOps workflows.
- Experience in regulated or federal environments.
- Relevant cloud or Kubernetes certifications.
Details
EEO: Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.