Service Reliability Engineer - Catalyst

at IO Global

Remote, Scotland, United Kingdom

Start Date: Immediate
Expiry Date: 24 Aug, 2024
Salary: Not Specified
Posted On: 25 May, 2024
Experience: N/A
Skills: DevOps, Infrastructure, Rust, AWS, Computer Science, Ansible, Code
Telecommute: No
Sponsor Visa: No

Description:

SUMMARY

The Service Reliability Engineer (SRE) at Project Catalyst plays a crucial role in ensuring the reliability, availability, and performance of the production systems that support our open-source projects. Reporting to the Senior Service Reliability Engineer, the SRE works closely with development teams and key stakeholders to integrate software engineering principles with systems engineering. Responsibilities include creating and maintaining tools, automation, and infrastructure code that make the platform more efficient and resilient. The successful candidate will contribute directly to our mission by improving service scalability and performance while fostering a culture of collaboration and continuous improvement.

EDUCATION / EXPERIENCE

  • BS degree in Computer Science or related technical field, or equivalent practical experience.
  • Extensive experience in DevOps, SysAdmin, or a similar role, with a strong background in Infrastructure as Code (using Terraform and Ansible).
  • Prior experience with Rust and with cloud providers (AWS preferred; GCP or Azure also relevant) is advantageous. Cloud certifications are a plus.

SPECIALIST SKILLS

  • Deep knowledge of Infrastructure as Code (IaC) principles.
  • Practical experience in designing and implementing cloud-based solutions.
  • Familiarity with Rust as a software development tool is a plus.

Responsibilities:

  • Design, write, and deliver tools and software using Go, Python, and Bash to enhance the availability, scalability, and efficiency of our services.
  • Manage the entire lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Conduct sustainable incident response and lead blameless postmortems.
  • Participate in on-call rotations, addressing service interruptions and technical challenges promptly.
  • Collaborate with development teams to design solutions that prioritize customer experience, scalability, and performance.
  • Analyze system performance and reliability to provide enhancement recommendations.
  • Establish and maintain service-level objectives (SLOs), service-level indicators (SLIs), and error budgets.
  • Implement and advocate for security best practices.

The list of responsibilities above is not exhaustive; you will be expected to perform other tasks as your role within the organisation evolves.


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Application Programming / Maintenance

Software Engineering

BSc

Computer Science

Proficient

1

Remote, United Kingdom