Site Reliability Engineer, India at Jobgether

India

Full Time

Start Date

Immediate

Expiry Date

22 Feb, 26

Salary

0.0

Posted On

24 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Site Reliability Engineering, AWS, Python, Java, Rust, Observability Tools, Linux, Networking, CI/CD, Automation, CloudFormation, DynamoDB, Lambda, SQS, SNS, Security Principles

Industry

Internet Marketplace Platforms

Description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in India. In this role, you will contribute to the stability, scalability, and resilience of a large cloud-native SaaS platform used by major global players in the media and broadcast sector. You will collaborate with high-performing engineering teams to enhance system reliability, improve observability, and automate workflows across a modern serverless environment. Working with cutting-edge AWS technologies, you will troubleshoot complex issues, optimize performance, and proactively strengthen platform health. The position offers the opportunity to innovate, experiment with new tools, and influence best practices across a rapidly evolving technical ecosystem. You will thrive in an environment that values creativity, ownership, and continuous learning.

Accountabilities

Strengthen the reliability, performance, and scalability of a multi-tenant SaaS platform hosted in AWS with a serverless-first architecture.
Collaborate closely with engineering teams to diagnose incidents, conduct root-cause analysis, and implement sustainable long-term solutions.
Enhance observability by leveraging monitoring, logging, and tracing tools to identify performance bottlenecks and prevent failures.
Automate repetitive tasks and operational processes through tools, scripts, and well-designed software components.
Contribute to defining, measuring, and improving SLOs and SLIs to drive operational excellence.
Support CI/CD practices to ensure smooth, high-velocity releases in a distributed engineering environment.
Participate in system improvements, platform modernization initiatives, and ongoing reliability-focused engineering efforts.

Requirements

Minimum 3 years of experience managing highly available, mission-critical production systems with a strong track record in reliability and uptime.
Proficiency in at least one programming language such as Python, Java, or Rust, with experience building automation tools or software libraries.
At least 3 years working with observability tools such as Datadog, CloudWatch, Honeycomb, Splunk, or New Relic, using metrics and logs to drive decisions.
Strong analytical and debugging abilities, with a deep understanding of system flows, architecture, and potential failure modes.
Hands-on experience translating SLOs and SLIs into platform improvements.
Minimum 3 years of practical experience with AWS services including CloudFormation, Lambda, DynamoDB, SQS, SNS, EC2, S3, AWS CLI, and Boto3.
Solid grounding in Linux systems, networking fundamentals, and security principles.
Familiarity with CI/CD systems such as Jenkins or AWS CodePipeline.

Nice-to-have skills

Experience architecting and deploying serverless cloud applications.
Knowledge of IaC tools such as Terraform or CloudFormation.
Previous participation in production on-call rotations and incident management processes.
Expertise optimizing AWS services like Lambda, DynamoDB, API Gateway, SQS, EventBridge, and EC2.
Experience supporting systems with frequent deployment cycles in fast-paced environments.
Familiarity with security compliance frameworks such as OWASP, ISO, CSA, or PCI.
Background in threat modeling, penetration testing, or security auditing.
Knowledge of advanced deployment patterns (canary, blue/green, A/B testing, red/line).
Hands-on experience with chaos engineering practices.
Proven ability to champion reliability culture and operational excellence.

Experience: 4 to 6+ years
Education: Degree in Computer Science or Information Technology
Work mode: Remote/Hybrid
Office hours: 1 pm to 9 pm IST

Benefits

Flexible working hours supporting work-life balance.
Opportunity to innovate and experiment with new technologies and tools.
Collaborative, global, and low-bureaucracy engineering environment.
International exposure working with modern cloud-native media technologies.
Professional development opportunities including mentoring and educational support.
Competitive compensation and comprehensive benefits package.

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching. When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.

🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.
📊 It compares your profile to the job's core requirements and past success factors to determine your match score.
🎯 Based on this analysis, we automatically shortlist the three candidates with the highest match to the role.
🧠 When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.

The process is transparent, skills-based, and free of bias, focusing solely on your fit for the role. Once the shortlist is completed, we share it directly with the company that owns the job opening. The final decision and next steps (such as interviews or additional assessments) are then made by their internal hiring team.

Thank you for your interest!

Responsibilities

You will contribute to the stability, scalability, and resilience of a large cloud-native SaaS platform. Collaborate with engineering teams to enhance system reliability, improve observability, and automate workflows.