Senior Site Reliability Engineer at Latitude.sh

Full Time

Start Date

Immediate

Expiry Date

31 Jan, 26

Salary

0.0

Posted On

03 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Linux/Unix Systems, Kubernetes, Container Orchestration, Infrastructure Automation, Terraform, Ansible, Observability Stacks, Prometheus, Grafana, Loki, ELK, Scripting, Programming Languages, Bash, Python, Go, Ruby

Industry

IT Services and IT Consulting

Description

Summary

At Latitude.sh, the Reliability team is responsible for the health and resilience of the infrastructure that powers our global bare metal cloud. As a Senior Site Reliability Engineer (SRE), you’ll focus on building reliable, observable, and self-healing systems at scale.

SREs at Latitude.sh work at the intersection of software engineering and infrastructure. You’ll design and implement tools that automate operations, improve incident response, and enhance system observability, ensuring our platform is always ready for our customers' workloads.

This might be a good opportunity if you’re passionate about reliability, automation, and creating cloud-like experiences for bare metal infrastructure.

Key Responsibilities

Continuously improve Latitude.sh’s platform reliability and performance
Design, build, and maintain tools to automate operational tasks and incident response
Implement and improve observability solutions, including monitoring, alerting, and tracing
Collaborate with engineering and platform teams to design scalable and resilient systems
Participate in on-call rotations and lead post-incident reviews with a focus on learning
Develop and document processes and runbooks that ensure operational excellence
Contribute to the definition of SLOs/SLIs and the adoption of reliability metrics across teams

Skills and Qualifications

Strong verbal and written English communication skills
Advanced knowledge of Linux/Unix systems in production environments
Experience with Kubernetes and container orchestration
Proficiency with infrastructure automation tools (e.g., Terraform, Ansible)
Experience with observability stacks (e.g., Prometheus, Grafana, Loki, ELK)
Familiarity with scripting and programming languages such as Bash, Python, Go, or Ruby
Working knowledge of Git and CI/CD pipelines
Solid understanding of incident management and root cause analysis processes
Knowledge of cloud-native reliability and security best practices

What do we offer?

Contractor (PJ)
Paid Time Off
Competitive Compensation
Wellhub (formerly Gympass)
Annual Bonus based on company and team performance
Flexible work hours
Opportunities for professional growth and development

Why Latitude.sh?

We're a lean, agile team of passionate professionals who believe in the power of innovation and creative problem-solving. As part of our team, you won't be lost in the crowd: you'll be an essential contributor, making a real impact from day one.

Our values at Latitude.sh guide us in all our work and partnerships. We're proud to be an inclusive company, and we welcome all applicants for our open positions, regardless of their background, religion, sexual orientation, gender identity, age, nationality, or disability. If these values speak to you, we'd love for you to become a part of our team.

Responsibilities

The Senior Site Reliability Engineer will focus on building reliable, observable, and self-healing systems at scale. Responsibilities include improving platform reliability and performance, designing tools that automate operational tasks and incident response, and enhancing system observability through monitoring, alerting, and tracing.