Senior Site Reliability Engineer 4 at PagerDuty
Atlanta, Georgia, USA
Full Time


Start Date

Immediate

Expiry Date

04 Nov, 25

Salary

252000.0

Posted On

05 Aug, 25

Experience

8 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Reliability Engineering, Emerging Trends, Azure, New Relic, Capacity Planning, Infrastructure, Commission, Devops, Code, Splunk, Documentation, Architecture

Industry

Information Technology/IT

Description

PagerDuty, Inc. (NYSE:PD) is a global leader in digital operations management. Half of the Fortune 500 and nearly 70% of the Fortune 100 trust PagerDuty as essential infrastructure.
Join us. At PagerDuty, you’ll tackle complex problems, collaborate with kind and ambitious people, and help build a more equitable world—all in a flexible, award-winning workplace.
We are seeking a Senior Site Reliability Engineer 4 to join our Release Engineering team. As a senior member of the Release Engineering team, you will drive our release and deployment developer experience, observability, and platform engineering initiatives to enable teams across PagerDuty engineering. In this role, you will lead technical decisions and mentor team members while building robust, scalable infrastructure solutions that enhance our developer experience and platform reliability.

BASIC QUALIFICATIONS

  • 8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Deep expertise in Kubernetes administration and architecture
  • Strong track record of leading CI/CD and platform engineering initiatives
  • Demonstrated experience leading technical projects and mentoring engineers
  • Advanced experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
  • Experience with monitoring, observability, and logging platforms (e.g. Datadog, New Relic, Sumo Logic, Splunk, Grafana)
  • Advanced experience with Infrastructure as Code (e.g. Terraform, CloudFormation)
  • Proficiency in at least one programming language (e.g. Python, Ruby, Go)

PREFERRED QUALIFICATIONS

  • Experience with GitOps practices and tools like ArgoCD
  • Experience building and maintaining platform engineering solutions at scale
  • Experience implementing and managing observability solutions
  • Experience with cost optimization and capacity planning
  • Knowledge of emerging trends in platform engineering and DevOps practices
  • Strong technical writing skills for documentation and knowledge sharing
  • Experience with developer portals and internal platform products
PagerDuty is a flexible, hybrid workplace. We embrace and encourage in-person working as an integral part of our culture. Both our employees and external research tell us that co-located collaboration strengthens connections, drives innovation, and accelerates learning.
This role is expected to come into our Atlanta office 1 day per month, so you can thrive in your new role and fully embrace being a Dutonian!
The base salary range for this position is 150,000 - 252,000 USD. This role may also be eligible for bonus, commission, equity, and/or benefits.
Our base salary ranges are determined by role, level, and location. The range, which is subject to change based on primary work location, reflects the minimum and maximum base salary we expect to pay newly hired employees for the position. Within the range, we determine pay for an individual based on a number of factors, including market location, job-related knowledge, skills/competencies, and experience.
Your recruiter can share more about the specific offerings for this role, as well as the salary range for your primary work location, during the hiring process.

How To Apply:

In case you would like to apply for this job directly from the source, please click here

Responsibilities
  • Lead the design and implementation of complex platform engineering solutions
  • Drive architectural decisions for our CI/CD infrastructure and Kubernetes platform
  • Mentor junior team members and provide technical leadership in platform engineering practices
  • Develop and implement strategic initiatives to improve developer experience and platform reliability
  • Design and implement scalable solutions for infrastructure automation using Terraform and other IaC tools
  • Lead post-incident reviews and drive systematic improvements to prevent recurring issues
  • Collaborate with other engineering teams globally to define and implement platform standards
  • Champion observability and monitoring best practices across the organization
  • Participate in a 24/7 on-call rotation (and yes, we use PagerDuty to manage our on-call schedules)