Backend Ops Engineer at Weekday AI

Remote, India

Full Time

Start Date

Immediate

Expiry Date

28 Jul, 26

Salary

0.0

Posted On

29 Apr, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

AWS, Terraform, CI/CD Pipelines, Docker, Observability, Monitoring, ECS, Fargate, GitHub Actions, Prometheus, Grafana, OpenTelemetry, Sentry, Python, Infrastructure as Code, DevOps

Industry

Technology; Information and Internet

Description

This role is for one of Weekday's clients.

Min Experience: 2 years
Location: Remote (India)
Job Type: Full-time

We are building a next-generation AI-driven platform that enables developers, creators, and enterprises to design scalable, secure, and human-centered AI experiences. As adoption grows, we are looking for a Backend Ops Engineer to take full ownership of infrastructure, ensuring high reliability, performance, and cost efficiency.

This role is critical in centralizing DevOps and infrastructure responsibilities: enabling faster deployments, reducing operational risks, optimizing cloud costs, and building a strong foundation for scalability and compliance. You will also play a key role in integrating AI into infrastructure operations, shaping the future of intelligent DevOps systems.

Why This Role Matters

- Drive faster, safer, and more reliable deployments
- Proactively scale systems to handle traffic growth
- Optimize cloud costs and improve infrastructure efficiency
- Build foundations for compliance (SOC 2, GDPR)
- Introduce AI-driven automation into DevOps workflows

Key Responsibilities

Initial Focus (First Quarter)

- Implement AI-powered operations such as log analysis, automated infrastructure updates, and predictive scaling alerts
- Benchmark cloud and edge services and prototype scalability improvements
- Build self-healing infrastructure pipelines that demonstrate advanced AI capabilities

Ongoing Responsibilities

- Design, automate, and manage cloud infrastructure using Terraform and AWS services (ECS/Fargate, RDS, S3, IAM)
- Build and maintain CI/CD pipelines using GitHub Actions for efficient and reliable deployments
- Implement observability and monitoring systems using tools like Prometheus, Grafana, OpenTelemetry, and Sentry
- Manage containerization using Docker and troubleshoot performance issues under load
- Collaborate with backend teams to ensure low-latency, cost-effective, and scalable systems
- Continuously improve system reliability, uptime, and incident response processes

Growth Path

- Progress into a Staff Platform Engineer or Lead SRE role
- Own end-to-end platform infrastructure
- Contribute to building enterprise-ready deployment frameworks
- Help define best practices for AI-driven DevOps systems

Required Qualifications

- 2–3+ years of experience in DevOps or Site Reliability Engineering roles
- Strong expertise in AWS (ECS/Fargate, RDS, S3, CloudWatch, IAM)
- Hands-on experience with Terraform and infrastructure-as-code practices
- Proven experience with CI/CD pipelines (GitHub Actions preferred)
- Strong knowledge of Docker and containerized environments
- Experience with observability tools such as Prometheus, Grafana, OpenTelemetry, or Sentry
- Demonstrated ability to debug and resolve infrastructure issues under load

AI & Technical Mindset

- Interest or experience in integrating AI into DevOps workflows
- Exposure to LLM APIs or AI-driven automation tools is a plus

Nice to Have

- Familiarity with compliance frameworks such as SOC 2 or GDPR
- Experience with multi-cloud environments (AWS, GCP, Azure)
- Proficiency in Python for scripting and automation
- Knowledge of infrastructure security best practices
- Experience with cloud platforms like DigitalOcean or cloud migration projects

Soft Skills

- Strong ownership mindset with a proactive approach
- Clear communication skills to explain technical trade-offs
- Bias toward automation and continuous improvement
- Ability to work effectively in fast-paced, evolving environments

Key Skills

AWS ECS / Fargate, Terraform, CI/CD Pipelines, Docker, Observability & Monitoring

Responsibilities

The Backend Ops Engineer will take full ownership of infrastructure, focusing on reliability, performance, and cost efficiency. Responsibilities include designing and managing cloud infrastructure, building CI/CD pipelines, and integrating AI-driven automation into DevOps workflows.