Technical Duty Officer / Sr. Site Reliability Engineer at Xero
San Mateo, California, USA - Full Time


Start Date

Immediate

Expiry Date

13 Aug, 25

Salary

230000.0

Posted On

14 May, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Good communication skills

Industry

Information Technology/IT

Description

HOW YOU’LL MAKE IMPACT



    • As Xero grows, there is a continued need for a keen focus on reliability to ensure customers receive service that exceeds their expectations. Xero’s Incident and Problem Management team is part of the Site Reliability Engineering (SRE) organization and is responsible for building, delivering and maintaining robust process and tooling around incident management. The team drives enduring reliability at Xero through robust, consistent and fast response to high severity incidents. It is also responsible for building a world-class process and ensuring that process matures as the demands of the business grow.

    • This position requires an experienced SRE professional with a strong technical background, a passion for building and delivering robust processes, and extensive experience leading the technical response to high severity cloud issues. As a seasoned and relentless professional, they will drive best practice across the business and contribute to the ongoing transformation of the Xero SRE culture. As an expert communicator, they will lead technical discussions to identify and track the actions that arise during incidents.
    Responsibilities

    OUR PURPOSE

    At Xero, we’re here to help you supercharge your business. We do this by automating routine tasks, surfacing actionable insights and connecting businesses with the right data, advisors and apps. When that happens, we’re not only making life better for small business, we’ll be building a stronger economy that can change the world.
    At Xero, we’re here to make running a business beautiful. By making small businesses more efficient every day, connecting them with big business technology and empowering a community behind them, their potential is limitless. When that happens, we’re not only helping small businesses, we’ll be building a stronger economy that can change the world.

    WHAT YOU’LL DO:



      • Own the incident management process and ensure it drives enduring reliability across all products and services within Xero.

      • Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
      • Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
      • Take a customer-focused approach by addressing and mitigating global customer environment issues, and foster a culture of continuous learning and technical excellence within the SRE team.
      • Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability.
      • Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.
      • Provide ongoing training across the business to ensure the process is well understood and adhered to. This includes training the engineers who will own Incident Commander actions for lower priority issues.
      • Conduct deep dives into the causes of incidents, proactively examine the potential causes of future incidents, and work with engineering teams to remove the risk of those failure scenarios.
      • Build playbooks and automated responses for business continuity and DR situations to ensure response is quick and effective.