Site Reliability Engineer (SRE)

at WorkNest Technologies LTD

Remote, Silesian Voivodeship, Poland

Start Date: Immediate
Expiry Date: 05 Aug, 2024
Salary: Not Specified
Posted On: 08 May, 2024
Experience: N/A
Skills: ITIL, Distributed Systems, Database Systems, Kibana, Windows Administration, Azure, Elasticsearch
Telecommute: No
Sponsor Visa: No

Description:

REQUIREMENTS

  • Experience with public cloud infrastructure (e.g., AWS, Azure) and related technologies (e.g., Docker, Kubernetes, CloudFormation);
  • Good understanding of storage and database systems, caching and queueing, networking;
  • Experience leading technical recoveries;
  • Working knowledge of Service Management practices (ITIL);
  • Experience designing, analyzing, and troubleshooting distributed systems;
  • Ability to debug, optimize code, and automate routine operational tasks;
  • Solid foundation in Linux or Windows administration and troubleshooting;
  • Monitoring / observability technologies like Prometheus, Grafana, Kibana, Elasticsearch are a plus;
  • Understanding of service level agreements (SLAs) and service level objectives (SLOs);
  • Excellent command of the English language, both written and spoken;
  • Solid understanding of programming principles and good command of at least one programming language relevant for infrastructure work.

Responsibilities:

- Design, develop and implement systems software that improves the stability, scalability, availability and robustness of Odido’s products and services, now and for years to come;
- Develop patterns for automation, instrumentation etc., that can be reused across teams and products;
- Take ownership of several services and products;
- Automate instead of fixing operational issues manually;
- Develop and implement strategies for effective and proactive monitoring and observability of our systems;
- Provide senior technical leadership on Major Incident calls. Take technical ownership of service outage recoveries. Drive internal and partner resources to rapidly restore service by implementing best-practice technical fixes and workarounds. Utilize technical expertise to shape and implement recovery plans;
- Manage cross-functional technical resources following Major Incidents to ensure the root cause is fully understood and documented, and that robust service protection measures are in place. Provide technical expertise at Incident Wash-ups, ensuring that all appropriate actions are in place to prevent repeat Incidents and to improve recovery times;
- Triage and fix system issues in a complicated distributed landscape;
- Participate in an on-call rotation, including weekend or after-hours coverage;
- Oversee and continuously improve incident-response processes at Odido;
- Advocate engineering best practices across the company and mentor more junior engineers on automation and operational best practices;
- Contribute to Odido’s growth through interviewing and onboarding.


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

Remote, Poland