Staff Technical Program Manager (Reliability and Quality) - Remote at PayNearMe

Santa Clara, California, USA

Full Time

Start Date

Immediate

Expiry Date

04 Dec, 25

Salary

220,000

Posted On

05 Sep, 25

Experience

8 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Infrastructure, System Architecture, Distributed Systems, Decision Making, Operations, Atlas, Reporting, Computer Science, Fintech

Industry

Information Technology/IT

Description

Company Description
PayNearMe develops technology to facilitate the end-to-end customer payment experience, making it easy for businesses to accept, disburse and manage payments. Our modern and reliable platform lowers the total cost of payments by increasing acceptance rates, driving self-service and simplifying exceptions. We future-proof our clients’ payments roadmap by including all payment types and channels through a single contract and integration.
With PayNearMe, businesses can transform the outdated systems holding them back from achieving progress.
PayNearMe has over 200 employees, raised a $45M Series D round in June 2023, and processes billions in payments annually. Headquartered in Silicon Valley, our team is distributed across the U.S. Join us in solving our clients’ biggest payment challenges.
Job Description
We’re looking for an experienced Staff-level Technical Program Manager (TPM) to lead our Quality and Reliability efforts across critical systems and services. This is a high-impact individual contributor role for someone who has done this before and who can build structure where needed, navigate ambiguity, and drive outcomes across multiple teams and domains.
You’ll be responsible for leading cross-functional programs to improve system reliability, scalability, and operational quality, from improving incident response and production readiness to redefining the ways we test and deploy software. This is not a generalist role: we’re looking for a TPM with deep technical fluency and a track record of shaping system-level quality and delivery culture at scale.

Qualifications

8+ years of program management experience, with at least 3 years in technical, reliability, or quality-focused domains.
Strong understanding of system architecture, distributed systems, and reliability engineering principles.
Familiarity with SDLC models, CI/CD pipelines, deployment automation, observability, and incident management tooling.
Demonstrated success defining and improving SLOs, SLIs, and production readiness processes.
Proven ability to lead large-scale, cross-functional programs across Engineering, Product, Operations, and Customer Success.
Skilled at translating complex technical goals into clear, actionable, and measurable outcomes.
Experienced in using Atlassian tools (e.g., Jira, Atlas) for program tracking, reporting, and executive communication.
Adept at navigating ambiguity, building alignment, and driving decision-making without formal authority.
Comfortable balancing technical depth with business priorities to influence outcomes.
Bachelor’s degree in Computer Science, Engineering, or related technical field, or equivalent practical experience.
Bonus: Experience in regulated or high-availability industries such as fintech, healthcare, or infrastructure.
Additional Information

Responsibilities

Own the Quality & Reliability Program: Define and drive the vision for quality across proactive practices (testing, deployment, observability), reactive processes (incident response, external communications), and cultural expectations (quality ownership, readiness).
Lead Cross-Functional Programs: Drive reliability and quality initiatives across Engineering, Product, Operations, and Customer Success.
Production Readiness: Own the Production Readiness Review (PRR) process; ensure all releases meet reliability standards before they go live.
Define and Drive SLOs: Establish and track Service Level Objectives (SLOs). Build visibility into reliability metrics and lead efforts to meet or exceed targets.
Improve Incident Management: Streamline incident response and postmortems. Drive structural improvements in tooling, communication, and ownership.
Scale Tooling & Automation: Collaborate across teams to enhance observability, alerting, testing automation, and response tooling.
Mitigate System Risk: Identify risk vectors early, build mitigation plans, and drive resolution with urgency.
Drive Alignment: Influence Engineering, Product, Operations, and go-to-market (GTM) teams to prioritize reliability and integrate quality into every initiative.
Track Progress: Use tools like Atlas, Jira, and internal dashboards to maintain clarity on goals, risks, and outcomes.
Embed Continuous Learning: Build programs that ensure we learn from every incident, test edge cases, and continuously harden our systems.