Start Date
Immediate
Expiry Date
09 Jul, 24
Salary
0.0
Posted On
10 Apr, 24
Experience
5 year(s) or above
Remote Job
No
Telecommute
No
Sponsor Visa
No
Skills
Management Skills, Information Technology, Computer Science, AWS, Reliability Engineering
Industry
Information Technology/IT
Overview:
As the Manager of Site Reliability, you will play a crucial role in ensuring the stability, performance, and security of our SaaS applications. You will lead a team of skilled professionals responsible for maintaining and enhancing the reliability of our systems through robust observability, monitoring, threat detection, and mitigation strategies. The ideal candidate will bring extensive experience in managing complex SaaS environments and a deep understanding of best practices in site reliability engineering.
Job Responsibilities
Team Leadership:
- Lead and mentor a team of site reliability engineers to ensure a high level of expertise and efficiency.
- Drive initiatives to enhance the technical skills and efficiency of the team.
- Foster a culture of collaboration, innovation, and continuous improvement.
Hands-On Technical Leadership:
- Actively contribute to the design, implementation, and maintenance of observability, monitoring, and security systems.
- Lead by example, working hands-on to troubleshoot issues and optimize system performance.
Observability and Monitoring:
- Develop and implement comprehensive observability and monitoring strategies to proactively identify and address potential issues before they impact system performance.
- Collaborate with development leadership to improve the performance and scalability of services by providing relevant, actionable metrics in the early stages of development.
- Utilize industry-leading tools and practices to maintain visibility into the health and performance of our systems.
Threat Detection and Mitigation:
- Design and implement robust security measures to detect and mitigate potential threats to our SaaS infrastructure.
- Stay informed about the latest cybersecurity threats and trends, and implement proactive measures to safeguard our systems.
Incident Response:
- Actively participate in incident response activities, leading the team to quickly resolve and learn from incidents.
- Develop and maintain incident response plans to ensure a rapid and effective response to any service interruptions or security incidents.
- Conduct post-incident analyses to identify root causes and implement preventive measures.
Infrastructure Optimization:
- Collaborate with cross-functional teams to optimize the performance and scalability of our infrastructure.
- Implement automation and efficiency improvements to enhance overall system reliability.
Job Requirements
To learn more about our organization and the exciting work we do, visit https://www.lexialearning.com/
An Equal Opportunity Employer
We are dedicated to fostering a culture that celebrates unique backgrounds, ideas, and experiences. All qualified applicants will receive consideration for employment without discrimination on the basis of race, color, age, religion, sex, gender, gender identity/expression, sexual orientation, national origin, protected veteran status, or disability.
Please refer to the job description for details.