DevOps Track Sr. Engineer at HEXAWARE
India
Full Time


Start Date

Immediate

Expiry Date

07 Sep, 26

Salary

0.0

Posted On

09 Jun, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

EKS, AWS, DataDog, Grafana, Prometheus, Incident Management, Problem Management, Microservices Architecture, Terraform, Ansible, Jenkins, Python, Shell Scripting, Kafka, Redis, NoSQL

Industry

IT Services and IT Consulting

Description
Job Role - Lead SRE (Site Reliability Engineer)

Job Description: Technical / Behavioral
You have extensive knowledge of production on-call support for cloud infrastructure running on the EKS platform.
You have extensive experience in change, incident, and problem management and on-call support.
You have extensive knowledge of observability tools (preferably DataDog), Grafana, and Prometheus.
You have experience monitoring logs, metrics, APM, events, and infrastructure, including dashboard creation.
You have experience with multiple AWS services such as EC2, EBS, S3, NLB, IAM, Lambda, CloudWatch, CloudTrail, and VPC.
You have knowledge of rehydration or patching in cloud infrastructure.
You have experience with microservices architecture, such as API Gateway or Apigee.
You should be able to triage, execute root cause analysis, and be decisive under pressure.
You have strong communication skills, with the ability to put forth concepts and ideas clearly and concisely.
You are able to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner to build and maintain effective relationships.

Good-to-Have Skills
You can automate tasks using Shell or Python.
You have exposure to CFM/IaaC tools like Ansible and Terraform.
You know a CI/CD tool like Jenkins.
You have Kafka/MQ administration skills.
Familiarity with Redis is a plus.
You have exposure to database administration (especially NoSQL, such as MongoDB, MariaDB, or CrDB).
Exposure to document creation for knowledge and process is an added advantage.
Responsibilities
Lead the Site Reliability Engineering efforts focusing on production on-call support for cloud infrastructure running on EKS. Responsible for monitoring, observability, and executing root cause analysis to maintain system stability.