Lead Site Reliability Engineer
at Dexory
Wallingford, England, United Kingdom
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 28 Dec, 2024 | Not Specified | 29 Sep, 2024 | N/A | Good communication skills | No | No |
Description:
BENEFITS
Joining our team and company isn’t just about expertise: it’s about an attitude that embraces uncertainty, a desire to solve challenging problems, and an opportunity to contribute to a technology platform that is genuinely state of the art. As a company, we’re still in the scale-up phase, so you’ll have a significant role in shaping the future of our products, culture, and engineering team. We offer a fun, flexible, and fast-paced environment that’s a great match for people looking for something out of the norm.
Responsibilities:
WHAT DOES THIS ROLE INVOLVE?
As the SRE (Site Reliability Engineering) Lead at Dexory, you will lead efforts to support the safety and reliability of our overall platform. This position involves providing SRE support for a globally distributed, hardware-oriented product that integrates autonomous robot systems and data insights. Your role will be pivotal in developing and maintaining company-wide monitoring, alerting, and management systems. You will work across teams to implement robust incident management strategies and support engineering teams in collecting and publishing critical metrics and alerts. You will also prepare comprehensive documentation and runbooks so that changes and incidents are handled efficiently.
YOUR KEY RESPONSIBILITIES WILL INCLUDE:
- Monitoring and maintaining our systems for metrics collection, alerting, and incident management.
- Working across teams to ensure a robust incident management strategy is in place.
- Preparing documentation and runbooks for handling changes and incidents.
- Providing support for all engineering teams to collect and publish useful metrics and alerts.
- Creating an infrastructure to report on key operational and uptime metrics and integrating these into the company’s OKR process.
- Preparing and maintaining a robust security posture and working with internal and external stakeholders to explain and validate this.
REQUIREMENT SUMMARY
Min: N/A, Max: 5.0 year(s)
Information Technology/IT
IT Software - Other
Software Engineering
Graduate
Proficient
1
Wallingford, United Kingdom