Site Reliability Manager; Observability Team at Okta
Toronto, ON, Canada
Full Time


Start Date

Immediate

Expiry Date

21 Sep, 25

Salary

$147,000 to $221,000 CAD

Posted On

21 Jun, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Automation, Kubernetes, Splunk, AWS, DevOps, Authentication, Microservices, Reliability Engineering, Communication Skills

Industry

Information Technology/IT

Description

GET TO KNOW OKTA

Okta is The World’s Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth.
At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we’re looking for lifelong learners and people who can make us better with their unique experiences.
Join our team! We’re building a world where Identity belongs to you.

REQUIRED QUALIFICATIONS

  • 5+ years of experience in site reliability engineering, DevOps, or infrastructure roles.
  • 3+ years of experience in a leadership or management role.
  • Hands-on experience with modern observability tools and standards (e.g. OpenTelemetry, Prometheus, Datadog, Splunk, ELK).
  • Strong understanding of alerting strategies, telemetry pipelines, and distributed tracing.
  • Familiarity with cloud-native architectures (AWS, Kubernetes, microservices).
  • Excellent communication skills.

PREFERRED QUALIFICATIONS

  • Experience with high-security computing environments.

Responsibilities

ABOUT THE ROLE

We are seeking an experienced and motivated SRE Manager (Observability) to lead our observability team. This role is pivotal in shaping the strategy and execution of our monitoring, alerting, and observability stack, ensuring that we deliver highly available, performant, and resilient systems.
The ideal candidate is passionate about SRE principles, has a deep understanding of modern observability tools and practices, and is experienced in leading an engineering team through change and adaptation.

KEY RESPONSIBILITIES

Leadership & Strategy

  • Lead a global team of SREs focused on monitoring, alerting, and telemetry.
  • Define and drive the observability roadmap in alignment with business goals and best practices.
  • Collaborate with engineering, infrastructure, product, and customer teams to ensure visibility into system health, performance, and user experience.

Observability & Monitoring

  • Oversee the design, implementation, and maintenance of our observability platforms (e.g. Splunk, Grafana, Datadog, pganalyze, ThousandEyes).
  • Drive improvements in alerting strategies to reduce noise and ensure actionable signals.
  • Continuously improve service offerings and drive self-service capabilities for our internal customers.

Incident Management

  • Support incident detection and response.
  • Participate in post-incident reviews and drive long-term improvements.
  • Guide the adoption of chaos engineering, synthetic monitoring, and proactive reliability practices.

People Management

  • Hire, mentor, and develop a high-performing team of engineers.
  • Promote a culture of ownership, collaboration, and operational excellence.
  • Set clear goals and performance expectations.

THIS ROLE REQUIRES IN-PERSON ONBOARDING AND TRAVEL TO OUR TORONTO OFFICE DURING THE FIRST WEEK OF EMPLOYMENT.

Below is the annual salary range for candidates located in Canada. Your actual salary will depend on factors such as your skills, qualifications, and experience. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, RRSP with a match, healthcare spending, telemedicine, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program, please visit: https://rewards.okta.com/can.
The annual base salary range for this position for candidates located in Canada is between $147,000 and $221,000 CAD.
