Staff Site Reliability Engineer at Addepar

New York, New York, USA

Full Time

Start Date

Immediate

Expiry Date

28 Oct, 25

Salary

225000.0

Posted On

29 Jul, 25

Experience

7 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Good communication skills

Industry

Information Technology/IT

Description

WHO WE ARE

Addepar is a global technology and data company that helps investment professionals provide the most informed, precise guidance for their clients. Hundreds of thousands of users have entrusted Addepar to empower smarter investment decisions and better advice over the last decade. With client presence in more than 50 countries, Addepar’s platform aggregates portfolio, market and client data for over $7 trillion in assets. Addepar’s open platform integrates with more than 100 software, data and services partners to deliver a complete solution for a wide range of firms and use cases. Addepar embraces a global flexible workforce model with offices in New York City, Salt Lake City, Chicago, London, Edinburgh, Pune, and Dubai.

Responsibilities

THE ROLE

We are looking to add a highly experienced and impactful colleague to the organization to drive the transformation of Addepar’s Production Engineering and SRE team. This role focuses on evolving our platform towards enabling high-level declarative infrastructure orchestration and its operations. This platform closely integrates our Compute, Network, and Storage control planes, allowing us to develop highly efficient and fast-to-iterate-on services tailored to various product areas within the company, abstracting our developers from the nuances of underlying infrastructure.
The ideal candidate will play a staff-level leadership role in implementing, maintaining, and strategically evolving Addepar’s Production Infrastructure. You will bring a robust combination of experience leading innovative solutions across functional teams and extensive hands-on development experience in AWS/cloud, Linux/Unix, networking, advanced scripting, containerization, Kubernetes, Terraform, information security, deep debugging, and comprehensive monitoring/observability. This includes designing, deploying, monitoring, automating, and optimizing all operational aspects of Addepar’s platform with a focus on reliability, scalability, and efficiency.
Addepar takes a market-based approach to pay. A successful candidate’s starting pay will be determined based on the role, job-related skills, experience, qualifications, work location, and market conditions. The range displayed on each job posting reflects the minimum and maximum target base salary for roles in Colorado, California, and New York.
The current range for this role is $144,000 - $225,000 (base salary) + bonus + equity + benefits.
Your recruiter can share more about the specific salary range for your preferred location during the hiring process. Additionally, these ranges reflect the base salary only, and do not include bonus, equity, or benefits.

WHAT YOU’LL DO

Lead the design, implementation, and operationalization of container infrastructure using Kubernetes (k8s), ensuring high availability, performance, and security.
Architect, build, and maintain advanced, automated CI/CD pipelines using Jenkins, ArgoCD, AWS CodeBuild/Pipeline, GitHub Actions, or similar, establishing best practices for deployment strategies (e.g., blue/green, canary).
Drive the adoption and evangelism of Infrastructure as Code (IaC) principles using Terraform to scale the Addepar Platform across regions, with a focus on cost optimization and operational efficiency.
Develop deep application-level knowledge to proactively inform and influence infrastructure requirements and constraints for Developers, QA, and Management, including implementing sophisticated dashboards for Cost and Inventory management, performance analysis, and capacity planning.
Perform advanced monitoring and troubleshooting of our infrastructure and application stack using a wide array of logging/monitoring tools, driving root cause analysis and implementing preventative measures.
Initiate and lead collaborations with cross-functional teams to identify and resolve complex application or infrastructure issues, serving as a technical subject matter expert.
Serve as a primary on-call responder for critical incidents, demonstrating strong problem-solving skills under pressure and contributing to post-incident reviews to improve system resilience.