Backend Ops Engineer at Weekday AI

India

Full Time

Start Date

Immediate

Expiry Date

16 Jun, 26

Salary

3500000.0

Posted On

18 Mar, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

AWS, Terraform, Docker, CI/CD, Observability, DevOps, SRE, IaC, GitHub Actions, Prometheus, Grafana, OpenTelemetry, Sentry, ECS/Fargate, RDS, IAM

Industry

Technology; Information and Internet

Description

This role is for one of Weekday's clients.

Salary range: Rs 2000000 - Rs 3500000 (i.e., INR 20 - 35 LPA)
Min Experience: 3 years
Location: Remote (India)
JobType: full-time

We are looking for a Backend Ops Engineer to take ownership of infrastructure and operations, ensuring fast, reliable, and cost-efficient deployments. This role is critical in reducing operational overhead, improving system scalability, and enabling seamless product delivery. You will focus on building robust infrastructure, optimizing cloud costs, and integrating AI-driven automation into DevOps workflows. This position is ideal for someone who thrives in high-growth environments and enjoys solving complex infrastructure challenges.

Why This Role Matters

- Centralizes infrastructure ownership to improve delivery speed and reliability
- Enables proactive scaling for growing user demand and traffic spikes
- Optimizes cloud costs and operational efficiency
- Lays the foundation for compliance frameworks such as SOC 2 and GDPR
- Introduces AI-driven automation into infrastructure management

Key Responsibilities

Initial Focus (First Quarter)

- Implement AI-driven operations, including log analysis, automated infrastructure updates, and predictive scaling alerts
- Benchmark cloud and edge services to improve performance and scalability
- Build self-healing infrastructure pipelines that demonstrate advanced AI capabilities

Ongoing Responsibilities

- Design, automate, and manage infrastructure using Terraform and AWS services (ECS/Fargate, RDS, S3, IAM)
- Build and maintain CI/CD pipelines using GitHub Actions for efficient deployments
- Implement and manage observability tools such as Prometheus, Grafana, OpenTelemetry, and Sentry
- Handle containerization using Docker and troubleshoot performance issues under load
- Collaborate with backend teams to ensure low-latency, scalable, and cost-effective services

Long-Term Growth

- Progress into a Staff Platform Engineer or Lead SRE role with end-to-end platform ownership
- Contribute to building scalable deployment frameworks for enterprise use cases
- Help define best practices for AI-driven DevOps and infrastructure management

Requirements

Must-Have Skills & Experience

- 2–3+ years of experience in DevOps or Site Reliability Engineering (SRE) roles, preferably in high-growth environments
- Strong expertise with AWS services (ECS/Fargate, RDS, S3, CloudWatch, IAM)
- Hands-on experience with Terraform and Infrastructure as Code (IaC)
- Proficiency in CI/CD pipelines (GitHub Actions) and Docker
- Experience with observability tools including Prometheus, Grafana, OpenTelemetry, and Sentry
- Proven ability to troubleshoot and optimize infrastructure under high load conditions

AI-Driven Mindset

- Interest or experience in integrating AI into DevOps workflows
- Exposure to LLM APIs (e.g., OpenAI, Anthropic, Hugging Face) is a plus

Nice-to-Have

- Familiarity with SOC 2 and GDPR compliance requirements
- Experience working with multi-cloud environments (GCP, Azure)
- Scripting skills in Python for automation
- Knowledge of infrastructure security best practices
- Experience with DigitalOcean services or cloud migration strategies

Soft Skills

- Strong ownership mindset with the ability to independently drive outcomes
- Clear communication skills to explain technical trade-offs to cross-functional teams
- Proactive approach with a strong focus on automation and continuous improvement

Core Skills

AWS | Terraform | Docker | CI/CD | Observability | DevOps | SRE

Responsibilities

The engineer will focus initially on implementing AI-driven operations, benchmarking services for performance, and building self-healing infrastructure pipelines. Ongoing duties include automating and managing infrastructure with Terraform and AWS, maintaining CI/CD pipelines, and implementing observability tools.