Infrastructure Lead (DevOps & Cloud) at Weekday AI
Mumbai, Maharashtra, India
Full Time


Start Date

Immediate

Expiry Date

05 Jul, 26

Salary

0.0

Posted On

06 Apr, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Cloud Infrastructure, DevOps, SRE, AWS, Azure, Terraform, Kubernetes, Docker, CI/CD, Jenkins, GitLab CI, CircleCI, ArgoCD, Python, Bash, Observability

Industry

Technology; Information and Internet

Description
This role is for one of Weekday's clients.

Min Experience: 8 years
Location: Mumbai
Job Type: Full-time

We are looking for an experienced Infrastructure Lead to drive the design, implementation, and optimization of scalable, secure, and highly available cloud infrastructure. This role will lead DevOps/SRE initiatives, establish best practices, and ensure the reliability and performance of mission-critical systems.

Key Responsibilities

1. Cloud Infrastructure & Architecture
Design, develop, and maintain scalable cloud infrastructure on AWS and Azure.
Lead architectural decisions to ensure high availability, fault tolerance, and optimal performance.
Promote infrastructure automation through Infrastructure as Code (Terraform).

2. DevOps & CI/CD Enablement
Develop and enhance CI/CD pipelines using tools such as Jenkins, GitLab CI, CircleCI, and ArgoCD.
Adopt GitOps methodologies for consistent and dependable deployments.
Increase deployment frequency, shorten lead times, and reduce failure rates.

3. Kubernetes & Containerization
Oversee and scale Kubernetes clusters across EKS, AKS, and on-premises environments.
Implement container orchestration, service mesh solutions, and cluster optimization techniques.
Ensure platform reliability and conduct performance tuning.

4. Monitoring, Reliability & Incident Management
Establish and uphold SLOs, SLAs, and reliability benchmarks.
Deploy observability tools such as Prometheus, Grafana, Datadog, and the ELK stack.
Lead incident management processes, including root cause analysis and reducing mean time to recovery (MTTR).

5. Automation & Operational Excellence
Promote automation across infrastructure provisioning, monitoring, and recovery workflows.
Create reusable infrastructure modules and accelerators.
Minimize manual tasks through scripting in Python and Bash, along with supporting tools.

6. Security & Compliance
Apply cloud security best practices involving IAM, network security, and policy enforcement.
Maintain compliance via Kubernetes policies and governance frameworks.
Champion secure-by-design principles in infrastructure development.

7. Cost Optimization
Monitor cloud resource consumption and implement cost-saving strategies.
Apply right-sizing, auto-scaling, and efficient resource utilization methods.

8. Leadership & Stakeholder Management
Lead and mentor DevOps and SRE teams.
Collaborate effectively with engineering, product, and architecture teams.
Promote infrastructure best practices across projects and teams.

9. Innovation & AI-driven Operations (Preferred)
Explore AI and machine learning-driven infrastructure enhancements and AIOps capabilities.
Implement intelligent monitoring, anomaly detection, and automated root cause analysis.

Required Skills & Experience

At least 8 years of experience in Infrastructure, DevOps, or SRE roles.
Strong expertise in AWS (preferred).
Hands-on experience with Terraform (Infrastructure as Code).
Comprehensive knowledge of Kubernetes and containerization (Docker).
Experience working with CI/CD tools such as Jenkins, GitLab CI, CircleCI, and ArgoCD.
Strong understanding of monitoring and observability tools.
Proficiency in scripting languages, including Python and Bash.
Experience managing high-availability, large-scale systems.

Skills

Infrastructure as Code, Lead Infrastructure, DevOps, SRE, Terraform, Kubernetes, Docker, CI/CD
Responsibilities
The Infrastructure Lead will design and maintain scalable cloud infrastructure while driving DevOps and SRE initiatives to ensure system reliability. They will also mentor teams, manage incident response, and implement automation to optimize operational performance.