Senior Manager, Site Reliability

at Cambium Learning Group

Remote, Oregon, USA

Start Date: Immediate
Expiry Date: 09 Jul, 2024
Salary: Not Specified
Posted On: 10 Apr, 2024
Experience: 5 year(s) or above
Skills: Management Skills, Information Technology, Computer Science, AWS, Reliability Engineering
Telecommute: No
Sponsor Visa: No

Description:

Overview:
As the Manager of Site Reliability, you will play a crucial role in ensuring the stability, performance, and security of our SaaS applications. You will lead a team of skilled professionals responsible for maintaining and enhancing the reliability of our systems through robust observability, monitoring, threat detection, and mitigation strategies. The ideal candidate will bring extensive experience in managing complex SaaS environments and a deep understanding of best practices in site reliability engineering.
Job Responsibilities
Team Leadership:
- Lead and mentor a team of site reliability engineers to ensure a high level of expertise and efficiency.
- Drive initiatives to enhance the technical skills and efficiency of the team.
- Foster a culture of collaboration, innovation, and continuous improvement.
Hands-On Technical Leadership:
- Actively contribute to the design, implementation, and maintenance of observability, monitoring, and security systems.
- Lead by example, working hands-on to troubleshoot issues and optimize system performance.
Observability and Monitoring:
- Develop and implement comprehensive observability and monitoring strategies to proactively identify and address potential issues before they impact system performance.
- Collaborate with development leadership to improve the performance and scalability of services by providing relevant, actionable metrics in the early stages of development.
- Utilize industry-leading tools and practices to maintain visibility into the health and performance of our systems.
Threat Detection and Mitigation:
- Design and implement robust security measures to detect and mitigate potential threats to our SaaS infrastructure.
- Stay informed about the latest cybersecurity threats and trends, and implement proactive measures to safeguard our systems.
Incident Response:
- Actively participate in incident response activities, leading the team to quickly resolve and learn from incidents.
- Develop and maintain incident response plans to ensure a rapid and effective response to any service interruptions or security incidents.
- Conduct post-incident analyses to identify root causes and implement preventive measures.
Infrastructure Optimization:
- Collaborate with cross-functional teams to optimize the performance and scalability of our infrastructure.
- Implement automation and efficiency improvements to enhance overall system reliability.

Job Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Proven hands-on experience (5+ years) in a site reliability engineering or similar role.
  • Leadership experience (3+ years) with a focus on technical mentorship and skill development.
  • In-depth knowledge of observability tools, monitoring systems, and security best practices.
  • Proven leadership and team management skills.
  • Excellent problem-solving and communication abilities.
  • In-depth experience with AWS.

To learn more about our organization and the exciting work we do, visit https://www.lexialearning.com/
An Equal Opportunity Employer
We are dedicated to fostering a culture that celebrates unique backgrounds, ideas, and experiences. All qualified applicants will receive consideration for employment without discrimination on the basis of race, color, age, religion, sex, gender, gender identity/expression, sexual orientation, national origin, protected veteran status, or disability.

Responsibilities:

Please refer to the job description for details.


REQUIREMENT SUMMARY

Min: 5.0, Max: 10.0 year(s)

Information Technology/IT

IT Software - Network Administration / Security

Other

Graduate

Computer science, information technology, or a related field

Proficient

1

Remote, USA