Site Reliability / DevOps Engineer at eClerx Career Site

Raleigh, North Carolina, United States

Full Time

Start Date

Immediate

Expiry Date

21 Sep, 26

Salary

137500.0

Posted On

23 Jun, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Observability, Azure, AWS, Python, Bash, Powershell, Docker, Kubernetes, CI/CD, Terraform, Azure Bicep, CloudFormation, Gremlin, Chaos Mesh, Kafka, Linux

Industry

IT Services and IT Consulting

Description

Site Reliability / DevOps Engineer

Location: Raleigh, North Carolina, US
Type: Full-time
Department: BFSI

Job Summary

eClerx is seeking a motivated SRE/DevOps Engineer with strong observability experience to join our growing Platform Engineering team. This team is responsible for managing cloud infrastructure, advancing DevOps practices, improving platform reliability, and supporting highly available enterprise applications. The ideal candidate will have a deep understanding of cloud-native architectures, distributed systems, CI/CD automation, observability frameworks, and site reliability engineering principles. This individual will play a key role in improving platform resilience, operational efficiency, and system performance across a modern cloud-based technology ecosystem.

Responsibilities

* Design, implement, and enhance system observability and monitoring solutions.
* Monitor system performance, create incident response plans, and implement observability practices to gain deeper insights into system behavior.
* Define, implement, and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
* Improve platform reliability, scalability, and resiliency.
* Conduct post-incident reviews and implement corrective actions to prevent recurring issues.
* Partner with engineering teams to implement observability tooling and leverage telemetry data to troubleshoot and resolve incidents.
* Utilize observability and event management capabilities to improve key operational metrics, including Mean Time to Detect (MTTD) and Mean Time to Restore (MTTR).
* Continuously optimize infrastructure, architecture, automation, CI/CD processes, and operational workflows.
* Collaborate closely with software engineers to ensure applications are designed and deployed following DevOps and reliability best practices.
* Participate in a rotating on-call schedule, including support for production releases and critical incidents outside normal business hours when required.

Eligibility Requirements

* 5+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role.
* 5+ years of work experience with public cloud platforms (Azure preferred, or AWS).
* 3+ years of hands-on experience with observability platforms such as Datadog, Elasticsearch, Grafana, or similar solutions.
* 5+ years of experience with scripting languages such as Python, Bash, or PowerShell.
* 3+ years of experience with containerization and orchestration technologies, including Docker and Kubernetes.
* 2+ years of experience developing and managing CI/CD pipelines using tools such as Azure DevOps, GitLab CI/CD, GitHub Actions, Jenkins, or similar.
* 2+ years of experience with Infrastructure-as-Code (IaC) tools such as Terraform, Azure Bicep, AWS CloudFormation, or equivalent technologies.
* 1+ years of experience using site reliability and resilience testing tools such as Gremlin, Chaos Mesh, or similar platforms.
* Proven experience leveraging observability best practices, end-user monitoring, application performance monitoring, and infrastructure monitoring solutions.
* Experience with event streaming and messaging platforms such as Kafka or Azure Event Hubs.
* Strong understanding of Linux operating systems and administration.

Preferred Qualifications

* Kubernetes certification.
* Cloud platform certifications (Azure, AWS, or GCP).
* Experience working in Azure environments and/or Azure DevOps.
* Experience implementing and managing Datadog or other modern observability platforms.
* Experience supporting enterprise-scale applications within financial services, capital markets, fintech, or other highly regulated industries.

In the US, the target base salary for this role is $120,000-$137,500. Compensation is based on a range of factors that include relevant experience, knowledge, skills, other job-related qualifications, and geography.
We expect the majority of candidates who are offered roles at our company to fall throughout the range based on these factors.

How to Apply

* Click "Apply Now" to submit your resume through our career site.
* Be sure to include any relevant experience that aligns with the role.
* Qualified candidates will be contacted by a member of our recruitment team for next steps.

About eClerx

eClerx is a leading provider of productized services, bringing together people, technology and domain expertise to amplify business results. The firm provides business process management, automation, and analytics services to a number of Fortune 2000 enterprises, including some of the world’s leading financial services, communications, retail, fashion, media & entertainment, manufacturing, travel & leisure, and technology companies. Incorporated in 2000, eClerx is traded on both the Bombay and National Stock Exchanges of India. The firm employs more than 19,000 people across Australia, Canada, France, Germany, Switzerland, Egypt, India, Italy, Netherlands, Peru, Philippines, Singapore, Thailand, the UK, and the USA. For more information, visit www.eclerx.com.

You can also find us on:

https://www.linkedin.com/company/eclerx/
https://www.indeed.com/cmp/Eclerx/about
https://www.glassdoor.com/Overview/Working-at-eClerx-EI_IE230253.11,17.htm

eClerx is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law. We are also committed to protecting and safeguarding your personal data. Please find our policy here: https://eclerx.com/privacy-policy/

Responsibilities

Design and implement system observability and monitoring solutions to improve platform reliability and resilience. Collaborate with engineering teams to optimize CI/CD processes and manage incident response through SLOs and SLIs.