Senior Site Reliability Engineering Manager
at General Motors

Remote, Scotland, United Kingdom

Start Date	Expiry Date	Salary	Posted On	Experience	Skills	Telecommute	Sponsor Visa
Immediate	06 Feb, 2025	Not Specified	06 Nov, 2024	N/A	Data Structures,Algorithms,Distributed Systems,Mitigation,Communication Skills,Collaboration,Kubernetes,Databases,Go,Java,Code,Root,Operating Systems,Azure	No	No

Description:

JOB DESCRIPTION

The rapid adoption of advanced software in vehicles marks a new era for automakers and consumers, bringing both advantages and challenges.
As part of Site Reliability Engineering (SRE) at General Motors, you’ll join a dedicated team focused on enhancing the reliability, efficiency, and scalability of our distributed systems. We apply engineering principles to manage operations effectively and build solutions that let us grow without sacrificing performance or quality. Our SREs work closely with software development teams, acting as specialists in reliability and production engineering, with a focus on automation, observability, and shared responsibility.
We are looking for individuals who are passionate about maintaining the health of our infrastructure while optimising for reliability and cost-efficiency. This role involves a blend of software engineering and systems engineering skills to keep our services resilient, robust, and scalable.
As an SRE Engineering Manager, you will be expected not only to lead your team in setting priorities and ensuring alignment with organizational goals but also to be deeply technical. We expect our managers to contribute directly through coding, reviewing code, and mentoring engineers. While it’s unlikely that you’ll spend the majority of your time coding, the capability and willingness to dive into technical details, solve problems hands-on, and support your team’s technical decisions is crucial. You’ll be a mentor, guide, and partner, helping engineers grow and ensuring the reliability and efficiency of the systems they work on. We set a high bar for engineering managers, who lead by example in both technical expertise and people leadership.

Key Responsibilities:

Automation and Reliability Improvements: Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
Observability and Monitoring: Lead, implement, and improve monitoring and observability frameworks, enabling proactive detection and resolution of incidents.
Incident Response: Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime and swift resolution.
Collaboration with Development Teams: Work alongside developers to ensure the quality, scalability, and reliability of our services. Practice shared ownership of services in production, fostering a “You build it, you run it” culture.
Service Level Management: Use Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to manage reliability expectations effectively.
Engineering for Reliability: Apply a strong understanding of common application reliability patterns, drawing on hands-on experience implementing them.
Failure Analysis and Post-Incident Reviews: Conduct deep-dive analyses of incidents and collaborate on post-incident reviews to derive learnings and prevent recurrence. Champion a culture of continuous improvement.
Cost Efficiency: Evaluate system performance and advocate for optimisations that reduce infrastructure costs while maintaining service reliability.

SKILLS AND QUALIFICATIONS:

Programming Skills: Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems.
Systems Knowledge: Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures.
Strong Understanding of System Fundamentals: Deep understanding of how code runs on underlying hardware, including operating systems, algorithms, and data structures. Ability to optimize or troubleshoot code by understanding its execution and the impact on system resources.
Incident Management: Experience handling production incidents, including root cause analysis, mitigation, and working through complex system failures.
Communication and Collaboration: Strong communication skills, with an ability to explain technical concepts to both engineering and business stakeholders. Commitment to collaborative problem-solving and shared ownership of services.
Automation Focus: Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems.

PREFERRED EXPERIENCE:

Experience with cloud platforms (AWS, GCP, Azure).
Familiarity with container orchestration systems like Kubernetes.
A track record of managing or developing distributed systems.
Prior experience with Java in production.

REQUIREMENT SUMMARY

Experience: Min: N/A, Max: 5.0 year(s)

Industry: Information Technology/IT

Functional area of job: IT Software - Other

Domain: Other

Qualifications: Graduate

English Proficiency: Proficient

Number of posts: 1

Address of job: Remote, United Kingdom
