Senior Site Reliability Engineer (SRE) – AWS at Sails Software Inc

Visakhapatnam, Andhra Pradesh, India

Full Time

Start Date

Immediate

Expiry Date

13 Apr, 26

Salary

0.0

Posted On

13 Jan, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

AWS, Site Reliability Engineering, DevOps, Infrastructure, Terraform, CloudFormation, Linux, Networking, Kubernetes, Container Orchestration, Microservices, Scripting, Security, Monitoring, Automation, Performance Tuning

Industry

Software Development

Description

SRE - AWS, 7+ years, Vizag only (onsite)

We are looking for a highly experienced Senior SRE with strong expertise in AWS to help design, operate, and scale the infrastructure powering our product platforms. This is a mission-critical role in a fast-moving product development environment, where system reliability, automation, and performance are core business drivers.

Key Responsibilities

Reliability & Operations:
Own reliability, availability, and performance of large-scale production systems.
Establish SLOs, SLAs, and error budgets for mission-critical services.
Lead incident response, root cause analysis, and continuous improvement initiatives.
Design fault-tolerant architectures and disaster recovery strategies.

Cloud & Infrastructure Engineering:
Architect, deploy, and manage infrastructure on AWS using IaC (Terraform / CloudFormation).
Optimize cloud costs while maintaining performance and reliability.
Implement multi-region, highly available architectures.
Manage container platforms (Docker, Kubernetes, EKS).

Automation & DevOps:
Build automation pipelines for infrastructure provisioning, deployment, and scaling.
Improve CI/CD pipelines and release engineering processes.
Develop tools and scripts to reduce operational toil.

Observability & Performance:
Implement comprehensive monitoring, logging, and alerting systems.
Drive performance tuning and capacity planning.
Lead chaos engineering and resilience testing practices.

Leadership & Mentorship:
Mentor SREs and DevOps engineers.
Partner with Engineering and Product teams to embed reliability into product design.

Required Skills & Experience

7+ years in Site Reliability Engineering / DevOps / Infrastructure roles.
Deep hands-on experience with AWS services (EC2, EKS, RDS, S3, Lambda, VPC, IAM, etc.).
Expertise in infrastructure as code: Terraform, CloudFormation.
Strong experience with Linux systems, networking, and distributed systems.
Experience with Kubernetes, container orchestration, and microservices environments.
Strong scripting skills (Python, Bash, Go).
Knowledge of security best practices and compliance requirements.

Soft Skills

Strong problem-solving and decision-making ability under pressure.
Excellent communication and stakeholder collaboration.
High ownership and accountability mindset.
Ability to thrive in an aggressively paced product development culture.

Education

Bachelor’s degree in Computer Science, Engineering, or related field (preferred).

Responsibilities

The Senior SRE will own the reliability, availability, and performance of large-scale production systems while leading incident response and continuous improvement initiatives. They will also design fault-tolerant architectures and manage infrastructure on AWS.