Senior Director, Site Reliability & Platform Engineering at Infoblox

Tacoma, WA 98467, USA

Full Time

Start Date

Immediate

Expiry Date

28 May, 25

Salary

329,670

Posted On

28 Feb, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Scripting, ISO, Business Analysis, Python, Azure, AWS, Interpersonal Skills

Industry

Information Technology/IT

Description

It’s an exciting time to be at Infoblox. Named a Top 25 Cyber Security Company by The Software Report and one of Inc. magazine’s Best Workplaces for 2020, Infoblox is the leader in cloud-first networking and security services. Our solutions empower organizations to take full advantage of the cloud to deliver network experiences that are inherently simple, scalable, and reliable for everyone. Infoblox customers are among the largest enterprises in the world and include 70% of the Fortune 500, and our success depends on bright, energetic, talented people who share a passion for building the next generation of networking technologies—and having fun along the way.
We are looking for a Senior Director, Site Reliability Engineering (SRE) and Platform Engineering to lead our SRE and DevOps teams globally, reporting to the Vice President of Engineering. In this role, you will foster a culture of product reliability across all of Engineering, drive and support the SRE team in conducting risk analyses, and work with Engineering leadership to ensure operational excellence of cloud-scale, high-availability systems. You will manage the SRE and DevOps teams and incorporate roadmap objectives from Product Management, Engineering, IT, and Product Security Engineering into their work. This is an essential position in the Engineering organization with executive-level visibility, driving change with other senior leaders to achieve departmental and corporate goals.
You are the ideal candidate if you are a visionary who lives and breathes reliability at scale.

Responsibilities

Lead and mentor a team of reliability and platform engineers, championing a culture of reliability, scalability, and continuous improvement across all Infoblox customer products, both on-prem and SaaS
Establish a charter for best-in-class site reliability engineering, and drive Engineering teams toward achieving these best practices
Institute a set of tools and processes that ensure monitoring, observability, capacity planning, disaster recovery, and incident management systems can support 99.999% availability for critical services
Manage large-scale infrastructure and applications across multiple cloud providers using a mix of native cloud, open-source, and commercial off-the-shelf tools
Work with stakeholders, including Engineering, IT, Product Management, and Customer Support, to define customer-driven SLIs/SLOs and ensure they exist for both new and existing functionality
Communicate progress by highlighting accomplishments, risks, mitigations, and other pertinent key performance indicators that feed into Infoblox’s overarching business strategy
Facilitate continuous training programs for Engineering that reduce risk, including completion of annual reliability training for Engineering staff
Drive product reliability, operational, and efficiency metrics with automation, allowing management to understand the maturity and risk levels in various product areas