Senior Site Reliability Engineer 4 at PagerDuty

Atlanta, Georgia, USA

Full Time

Start Date

Immediate

Expiry Date

04 Nov, 25

Salary

Up to 252,000 USD

Posted On

05 Aug, 25

Experience

8 years or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Reliability Engineering, Emerging Trends, Azure, New Relic, Capacity Planning, Infrastructure, Commission, Devops, Code, Splunk, Documentation, Architecture

Industry

Information Technology/IT

Description

PagerDuty, Inc. (NYSE:PD) is a global leader in digital operations management. Half of the Fortune 500 and nearly 70% of the Fortune 100 trust PagerDuty as essential infrastructure.
Join us. At PagerDuty, you’ll tackle complex problems, collaborate with kind and ambitious people, and help build a more equitable world—all in a flexible, award-winning workplace.
We are seeking a Senior Site Reliability Engineer 4 to join our Release Engineering team. As a senior member of the team, you will drive our release and deployment developer experience, observability, and platform engineering initiatives to enable teams across PagerDuty engineering. In this role you will lead technical decisions and mentor team members while building robust, scalable infrastructure solutions that enhance our developer experience and platform reliability.

BASIC QUALIFICATIONS

8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
Deep expertise in Kubernetes administration and architecture
Strong track record of leading CI/CD and platform engineering initiatives
Demonstrated experience leading technical projects and mentoring engineers
Advanced experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
Experience with monitoring, observability, and logging platforms (e.g. Datadog, New Relic, Sumo Logic, Splunk, Grafana)
Advanced experience with Infrastructure as Code (e.g. Terraform, CloudFormation)
Proficiency in at least one programming language (e.g. Python, Ruby, Go)

PREFERRED QUALIFICATIONS

Experience with GitOps practices and tools like Argo CD
Experience building and maintaining platform engineering solutions at scale
Experience implementing and managing observability solutions
Experience with cost optimization and capacity planning
Knowledge of emerging trends in platform engineering and DevOps practices
Strong technical writing skills for documentation and knowledge sharing
Experience with developer portals and internal platform products
PagerDuty is a flexible, hybrid workplace. We embrace and encourage in-person working as an integral part of our culture. Both our employees and external research tell us that co-located collaboration strengthens connections, drives innovation, and accelerates learning.
The person in this role is expected to come into our Atlanta office one day per month, so you can thrive in your new role and fully embrace being a Dutonian!
The base salary range for this position is 150,000 - 252,000 USD. This role may also be eligible for bonus, commission, equity, and/or benefits.
Our base salary ranges are determined by role, level, and location. The range, which is subject to change based on primary work location, reflects the minimum and maximum base salary we expect to pay newly hired employees for the position. Within the range, we determine pay for an individual based on a number of factors including market location, job-related knowledge, skills/competencies and experience.
Your recruiter can share more about the specific offerings for this role, as well as the salary range for your primary work location during the hiring process.

How To Apply:

If you would like to apply to this job directly from the source, please click here.

Responsibilities

Lead the design and implementation of complex platform engineering solutions
Drive architectural decisions for our CI/CD infrastructure and Kubernetes platform
Mentor junior team members and provide technical leadership in platform engineering practices
Develop and implement strategic initiatives to improve developer experience and platform reliability
Design and implement scalable solutions for infrastructure automation using Terraform and other IaC tools
Lead post-incident reviews and drive systematic improvements to prevent recurring issues
Collaborate with other engineering teams globally to define and implement platform standards
Champion observability and monitoring best practices across the organization
Participate in a 24/7 on-call rotation (and yes, we use PagerDuty to manage our on-call schedules)