Senior Site Reliability Engineer at AMBA

Austin, Texas, United States

Full Time

Start Date

Immediate

Expiry Date

17 Feb, 26

Salary

Not specified

Posted On

19 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Site Reliability Engineering, DevOps, Scripting, Linux Administration, Cloud Platforms, Containerization, Orchestration, Infrastructure as Code, Monitoring Tools, CI/CD Processes, On-Premises Infrastructure, Automated Testing Frameworks, Disaster Recovery Strategies, Incident Response, Performance Optimization, Security Best Practices

Industry

Insurance

Description

AMBA is seeking an experienced Senior Site Reliability Engineer to join our IT team!

About AMBA

Since 1981, AMBA has been a trusted provider of essential coverage for retired public servants nationwide. Our reach extends to diverse groups, including hardworking public employees, state retirees, educators, military personnel, trade professionals, firefighters, law enforcement, unions, alumni groups, allied healthcare, and other nonprofit associations. As a full-service marketing and membership development company, we proudly offer outstanding insurance services to our vast network of 44 million members across 450+ associations in all 50 states.

Benefits

- Comprehensive benefits package, including medical, dental, and vision insurance, spending accounts, and other voluntary benefits.
- Annual bonus program.
- Corporate 401(k) matching.
- Generous time off, including vacation days, 10 paid company holidays, and paid parental leave.
- Sick time that can be used for both physical and mental wellness days.
- Community involvement perks, including 1 paid day off each year to volunteer with a local charity of your choice, plus company volunteer events.
- Free, confidential counseling and support through our Employee Assistance Program (EAP).
- Support for professional development, including continuing education toward your professional designations.
- Business casual dress code.
- Hybrid work arrangement.

About the Role

As a Senior Site Reliability Engineer (SSRE), you will play a key role in building, scaling, and maintaining systems that are reliable, performant, and secure. You’ll bring a strong software engineering mindset to infrastructure and operations, ensuring our services remain highly available, scalable, and cost-efficient.
In this role, you’ll collaborate closely with developers, product teams, and cross-functional engineering partners to enhance system reliability through effective automation, proactive monitoring, performance optimization, and structured incident response. Your work will directly influence the stability and efficiency of the platforms that power our business.

Day to Day

- Design, implement, and manage scalable, highly available systems and infrastructure.
- Develop tools and automation to eliminate manual operations and improve system reliability.
- Monitor system performance, availability, and reliability using observability tools.
- Respond to and resolve incidents, participate in root cause analysis, and implement long-term fixes.
- Collaborate with development teams to improve application performance, CI/CD pipelines, and deployment processes.
- Maintain and enforce service level objectives (SLOs) and error budgets.
- Participate in on-call rotations and create clear runbooks for operational procedures.
- Implement security and compliance best practices across infrastructure and deployments.

Requirements

- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
- Strong scripting or programming skills (e.g., Python, Go, Bash, or similar).
- Deep understanding of Linux/Unix systems administration.
- Hands-on experience with cloud platforms (e.g., AWS, GCP, Azure).
- Experience with containerization and orchestration (e.g., Docker, Kubernetes).
- Proficiency with infrastructure-as-code tools (e.g., Terraform, Ansible, Pulumi).
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK, Datadog).
- Solid understanding of CI/CD processes and tools (e.g., Jenkins, GitLab CI, GitHub Actions).
- Solid understanding of on-premises hyperconverged infrastructure (HCI) (e.g., VMware, Nutanix, Azure Local).
- Demonstrated ability to design and implement automated testing frameworks that validate network configurations, service availability, and end-to-end system behavior.
- Demonstrated ability to develop, document, and execute comprehensive disaster recovery (DR) strategies and playbooks aligned with organizational recovery time and recovery point objectives (RTO/RPO).

AMBA is an equal opportunity employer committed to providing a workplace free from harassment and discrimination. We celebrate the unique differences of our employees because they drive curiosity, innovation, and the success of our business. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, gender identity or expression, age, marital status, veteran status, disability status, pregnancy, parental status, genetic information, political affiliation, or any other status protected by the laws or regulations in the locations where we operate. We value diversity and the skills, knowledge, and experience that difference brings to our culture; it helps us attract top talent with shared values and forms the foundation for a great place to work.

Responsibilities

As a Senior Site Reliability Engineer, you will design, implement, and manage scalable systems while collaborating with cross-functional teams to enhance system reliability. Your role will involve developing automation tools, monitoring system performance, and responding to incidents.