Chief AWS Site Reliability Engineer (SRE)

at EPAM Systems

Remote (work from home), Río Negro, Argentina

Start Date: Immediate
Expiry Date: 29 Jan, 2025
Salary: Not Specified
Posted On: 30 Oct, 2024
Experience: 2 year(s) or above
Skills: Communication Skills, Python, Software Development
Telecommute: No
Sponsor Visa: No
Required Visa Status:
Citizen, GC, US Citizen, Student Visa, H1B, CPT, OPT, H4 Spouse of H1B, GC Green Card
Employment Type:
Full Time, Part Time, Permanent, Independent - 1099, Contract - W2, C2H Independent, C2H W2, Contract - Corp 2 Corp, Contract to Hire - Corp 2 Corp

Description:

EPAM Systems is looking for a Chief AWS Site Reliability Engineer who fully understands and practices SRE principles to join the global engineering team that ensures the reliability and availability of fleet services under the SRE model.
If you’re passionate about innovation, we invite you to apply and become part of our team!
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Requirements:

  • 7+ years of cloud engineering experience, with a strong track record on highly scalable distributed-systems projects over the past 5 years
  • Previous experience working as an SRE engaged with active development teams is a must, and the candidate should have a good understanding of SRE methodologies and philosophies
  • AWS cloud expertise
  • Ideally, experience running multi-region workloads and in-depth knowledge of the most commonly used AWS services
  • Observability experience with distributed services, such as distributed tracing and related concepts
  • Ability to work independently and in a self-directed way alongside client engineering teams with minimal supervision
  • Strong programming and automation experience: Python, Golang
  • Understanding of the software development lifecycle
  • Fluent English communication skills at a B2+ level

Responsibilities:

  • Collaborate with service teams to improve the reliability and efficiency of workloads and services using SRE practices
  • Develop and improve CI/CD processes to enhance release cadence and success
  • Build and work through the toil backlog, automating toilsome tasks
  • Document knowledge and processes
  • Practice and promote sustainable incident response and blameless postmortems
  • Write code that improves scalability, performance, maintainability, and security
  • Implement distributed monitoring practices
  • Refine monitoring processes, configurations, and thresholds
  • Contribute towards the identification and implementation of service level indicators and objectives for workloads and services


REQUIREMENT SUMMARY

Min: 2.0, Max: 7.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

Remote (work from home), Argentina