Senior Software Engineer - Site Reliability (Federal Operations) at Abnormal
United States
Full Time


Start Date

Immediate

Expiry Date

13 Jan, 26

Salary

$207,000 USD

Posted On

15 Oct, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

DevOps, Kubernetes, Terraform, Python, Go, AWS, Distributed Systems, Redis, Kafka, PubSub, Relational Databases, Incident Management, Operational Reliability, Collaboration, Communication, Process Improvement

Industry

Computer and Network Security

Description
About the Role

We're looking for a Senior Software Backend Engineer with strong DevOps expertise to lead and support our team managing restricted government cloud environments for federal customers. This role involves both building scalable infrastructure and supporting distributed systems, with a focus on reliability, performance, and compliance.

As a Senior Software Backend and DevOps Engineer, you'll troubleshoot production issues, enhance infrastructure, and ensure a smooth, secure experience for federal users. You'll also collaborate with cross-functional teams to drive continuous improvements in deployment, monitoring, and system design.

The ideal candidate:
Brings a combination of technical depth, problem-solving skills, and leadership experience
Navigates ambiguity and works effectively in complex, distributed systems
Collaborates across teams and escalates issues when needed to maintain progress
Identifies and addresses inefficiencies in workflows and operations
Contributes to clarity, accountability, and process improvement
Balances individual technical work with big picture system and team needs

Key Responsibilities
Automate and build tools to eliminate repetitive operational tasks and reduce toil
Maintain and scale reliable software applications using DevOps best practices
Build and enhance CI/CD pipelines for automated testing, builds, and deployments
Optimize and maintain Kubernetes-based orchestration systems for performance and reliability
Troubleshoot complex production issues across application, infrastructure, and distributed system layers
Participate in on-call rotations and support incident response
Mentor junior engineers in software development and operational best practices
Collaborate with stakeholders and product teams on infrastructure and deployment requirements
Ensure compliance with government cloud standards across applications and infrastructure

Must-Have Qualifications
Proven ability to maintain 99.99% uptime in production environments
10+ years of overall experience, including 6+ years in software development and 3+ years in DevOps practices
3+ years of experience with Kubernetes, Terraform, Python or Go, and AWS
4+ years of experience working with distributed systems
Familiarity with Redis, Kafka/PubSub, and relational databases
Experience in fast-paced or startup-like environments
Strong collaboration and communication skills across cross-functional teams and divisions
Ability to ramp up quickly and contribute in complex, large-scale environments
Demonstrated leadership in incident management and operational reliability

Nice-to-Have Qualifications
Experience with FedRAMP compliance and government security requirements
Track record of implementing secure CI/CD pipelines in restricted or regulated environments

At Abnormal AI, certain roles are eligible for a bonus, restricted stock units (RSUs), and benefits. Individual compensation packages are based on factors unique to each candidate, including their skills, experience, qualifications and other job-related reasons.

Base pay range: $176,000-$207,000 USD
San Francisco/New York Base pay range: $195,000-$230,000 USD

Abnormal AI is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability, protected veteran status or other characteristics protected by law. For our EEO policy statement please click here. If you would like more information on your EEO rights under the law, please click here.
Responsibilities
The role involves automating and building tools to eliminate repetitive operational tasks, maintaining and scaling reliable software applications, and troubleshooting complex production issues. Additionally, the engineer will mentor junior engineers and collaborate with stakeholders on infrastructure and deployment requirements.