Lead Software Engineer at Weekday AI
Pune, Maharashtra, India
Full Time


Start Date

Immediate

Expiry Date

06 Mar, 26

Salary

5500000.0

Posted On

06 Dec, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Observability Engineering, OpenTelemetry, Prometheus, Grafana, Distributed Tracing, Metrics, Logs, SLOs, SLIs, Java, Python, Go, Node.js, Debugging, Technical Leadership, Documentation

Industry

technology;Information and Internet

Description
This role is for one of Weekday's clients.

Salary range: Rs 4000000 - Rs 5500000 (i.e. INR 40-55 LPA)
Min Experience: 14 years
Location: Pune
JobType: full-time

This role focuses on leading observability engineering for large-scale, distributed systems. You will be responsible for designing and maintaining a robust observability ecosystem using OpenTelemetry, Prometheus, Grafana, and distributed tracing frameworks. Your work will directly influence system reliability by ensuring consistent instrumentation, actionable monitoring, and well-defined SLOs/SLIs across multiple product teams. This position is based in Pune and requires strong technical depth, leadership in observability adoption, and the ability to guide teams toward best practices in metrics, logs, and tracing.

Key Responsibilities
Design, build, and maintain end-to-end observability platforms with a strong emphasis on OpenTelemetry-based instrumentation.
Ensure consistent and reliable emission of metrics, logs, and traces across multiple languages such as Java, Python, Go, and Node.js.
Develop scalable observability strategies for engineering teams, enabling seamless monitoring and alerting across distributed systems.
Lead adoption of OpenTelemetry, Prometheus, Grafana, and distributed tracing tools across the organization.
Define, refine, and operationalize SLOs and SLIs in collaboration with product and engineering teams.
Identify observability gaps and work with teams to improve monitoring coverage, alerting quality, and incident response workflows.
Mentor engineers on advanced instrumentation, observability patterns, dashboard design, and effective troubleshooting.
Evaluate emerging observability technologies and recommend enhancements to improve platform health and reliability.
Document standards, best practices, and observability guidelines to drive consistent usage across teams.

What Makes You a Great Fit
Proven experience building and scaling observability systems for distributed, high-performance environments.
Advanced knowledge of OpenTelemetry, Prometheus, Grafana, and modern tracing and logging platforms.
Strong understanding of structured logging, metrics, and tracing across multiple programming languages.
Ability to design and operationalize SLOs and SLIs using real-time observability data.
Expertise in debugging complex distributed systems using traces, metrics, and logs.
A passion for enabling engineering teams through guidance, documentation, and best-practice observability frameworks.
Strong communication and technical leadership skills, with the ability to influence cross-functional teams.
Curiosity and initiative to explore new observability tools and drive continuous improvement.

Responsibilities
Design, build, and maintain end-to-end observability platforms with a focus on OpenTelemetry-based instrumentation. Lead the adoption of observability tools and define operational SLOs and SLIs in collaboration with product and engineering teams.