Senior Site Reliability Engineer at Stability AI
United States
Full Time


Start Date

Immediate

Expiry Date

23 Dec, 25

Salary

0.0

Posted On

24 Sep, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Site Reliability Engineering, Cloud Infrastructure, AWS, Infrastructure as Code, Terraform, Monitoring, Logging, Alerting, Incident Management, Root Cause Analysis, Kubernetes, Container Scaling, Software Development, Automation Scripting, Grafana, Cloud Security

Industry

Research Services

Description
Remote - United States

Job Description:
Stability AI’s Engineering Operations team is looking for a Senior Site Reliability Engineer (SRE) to join our growing team and play a pivotal role in improving and shaping our cloud infrastructure. The person will work closely with engineering, IT, security, and product teams to drive innovation and reliability in an evolving environment. Candidates should have the initiative to build and improve a maturing cloud landscape.

Responsibilities:
Developing and enforcing SRE best practices and standards across the organization.
Architecting and managing scalable systems in AWS and other cloud environments, focusing on high availability and resilience.
Implementing and maintaining infrastructure as code using Terraform.
Setting up and refining monitoring, logging, and alerting systems.
Driving incident management and root cause analysis to improve system reliability.
Championing SRE principles and mentoring junior team members.

Qualifications:
Collaborating with development teams to enhance CI/CD pipelines.
Experience scaling resource-intensive systems, be it storage, networking, or compute.
Knowledge and experience with Kubernetes or other container scaling solutions.
Background in software development or automation scripting.
Knowledge and experience with Grafana, ELK stack, or similar tools.
Cloud security experience.

Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.

How To Apply:

If you would like to apply to this job directly from the source, please click here

Responsibilities
The Senior Site Reliability Engineer will develop and enforce SRE best practices and standards while architecting and managing scalable systems in cloud environments. They will also drive incident management and root cause analysis to enhance system reliability.