Site Reliability Engineer, Supervisor

2026-166102

Category:

Information Technology

Clearance:

Public Trust

Location:

,

Telecommute:

Remote work allowed 100%

About Peraton

Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world’s leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees solve the most daunting challenges that our customers face. Visit peraton.com to learn how we’re keeping people around the world safe and secure.

About The Role

Peraton is seeking a Site Reliability Engineer (SRE), Supervisor - an experienced technical professional responsible for ensuring the availability, performance, and scalability of complex software systems and infrastructure. This role blends software engineering and systems administration expertise to design, automate, and maintain resilient production environments that support business-critical applications.

 

The Site Reliability Engineer, Supervisor works closely with development, infrastructure, and product teams to build reliability frameworks, improve observability, and drive system health improvements. They lead efforts to automate manual processes, manage incident response, and implement SLOs to maintain a seamless user experience. This position is ideal for a technically skilled and collaborative professional eager to lead reliability initiatives in complex environments, ensuring architectural and technical excellence and high service availability.

 

This opportunity will support the modernization of a large-scale, multi-tenant cloud ecosystem, providing critical enterprise-wide support for more than 40 million users in a complex stakeholder environment. This position requires senior-level leadership skills combined with modern, industry-leading cloud and technical capabilities, including product development, strict security compliance, current cloud solutions, reliable application delivery with SaaS and artificial intelligence integrations, and rapid continuous delivery.

 

Core Responsibilities

  • System Reliability and Performance: Ensure high availability and responsiveness of services by designing and implementing monitoring, alerting, and automated remediation tools. Analyze system metrics and logs to identify areas for improvement and optimize system performance.
  • Automation and Tooling: Develop and maintain scripts, configuration management, and infrastructure-as-code to automate deployment, scaling, and management of infrastructure. Lead efforts to reduce toil through automation and reliability engineering best practices.
  • Incident Management: Participate in on-call rotations to respond to incidents promptly. Conduct thorough root cause analysis and collaborate with engineering teams to implement preventive measures.
  • Collaboration and Consulting: Partner with software developers, product managers, and infrastructure teams to embed reliability into the software development lifecycle. Provide guidance on system architecture, capacity planning, and disaster recovery strategies.

Leadership and Mentorship

  • Technical Leadership: Mentor junior SREs and engineers on reliability engineering principles, tools, and technical excellence. Lead by example in coding standards, system design, and incident response.
  • Cross-functional Communication: Articulate technical issues and reliability impacts to non-technical stakeholders. Drive alignment on priorities and continuous improvements across teams.
  • Project Management: Lead reliability-related projects and initiatives, managing timelines, resources, and stakeholder communication to deliver impactful results.
  • Agile Practices and Continuous Improvement: Promote agile practices to enhance team efficiency. Advocate for continuous learning and process refinement in system reliability.

**Position could support/work across multiple enterprise-wide efforts within Peraton.**

Qualifications

Key Skills and Qualifications:

 

  • 6 years of experience, may have lead experience
  • Strong software engineering background with proficiency in languages such as Python, Go, or similar.
  • Deep understanding of distributed systems, cloud infrastructure (AWS, Azure, GCP), container orchestration (Kubernetes), and monitoring tools (Prometheus, Grafana, OpenTelemetry).
  • Experience defining and implementing SLOs, SLIs, and error budgets to measure and maintain service reliability.
  • Excellent problem-solving skills with a proactive approach to incident prevention and resolution.
  • Strong communication skills to effectively collaborate with diverse teams and present reliability insights.
  • 5+ years of experience in site reliability engineering, systems engineering, or related roles with a proven track record of delivering scalable, reliable systems.

Clearance Requirements:

  • U.S. Citizenship required
  • Ability to obtain agency clearance (Public Trust)

Preferred Qualifications:

  • Top Secret clearance preferred

 

Details

Target Salary Range: $104,000 - $166,000. This represents the typical salary range for this position. Salary is determined by various factors, including, but not limited to, the scope and responsibilities of the position; the individual’s experience, education, knowledge, skills, and competencies; and geographic location and business and contract considerations. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.

Benefits Statement: Peraton offers eligible employees a variety of benefits including medical, dental, vision, life, health savings account, short/long term disability, EAP, parental leave, 401(k), paid time off (PTO) for vacation, and company paid holidays. A full listing of available benefits can be viewed at https://www.careers.peraton.com/benefits.

Application Duration Statement: The application period for the job is estimated to be 30 days from the job posting date. However, this timeline may be shortened or extended depending on business needs and the availability of qualified candidates.

EEO: Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.
