Observability Lead at Micron Technology
Hyderabad, Telangana, India
Full Time


Start Date

Immediate

Expiry Date

18 Feb, 26

Salary

0.0

Posted On

20 Nov, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Observability, AIOps, SRE Principles, OpsRamp, Splunk, AppDynamics, NetBrain, ThousandEyes, BigPanda, ServiceNow AIOps, Python, PowerShell, SLIs, SLOs, ITIL Processes, Incident Management

Industry

Semiconductor Manufacturing

Description
Lead Observability Strategy: Define and execute the observability roadmap aligned with business and IT goals, integrating AIOps and SRE principles.
Tool Ownership & Integration: Manage and optimize observability tools including OpsRamp, Splunk, AppDynamics, NetBrain, ThousandEyes, and explore new platforms like BigPanda and ServiceNow AIOps.
Automation Leadership: Drive automation of L1/L2 operational tasks using Python and PowerShell, improving efficiency and reducing manual intervention.
SRE Adoption: Collaborate with cross-functional teams to implement Site Reliability Engineering (SRE) practices, including SLIs/SLOs, error budgets, and incident response automation.
Monitoring & Dashboarding: Design and maintain comprehensive dashboards and alerting mechanisms for infrastructure, applications, and network performance.
Incident & Problem Management: Partner with ITSM teams to enhance incident detection, root cause analysis, and resolution workflows.
Mentorship & Collaboration: Lead and mentor a team of observability engineers, fostering a culture of innovation, ownership, and continuous improvement.

8+ years of experience in IT operations, observability, or infrastructure monitoring.
Strong hands-on experience with tools like Splunk, OpsRamp, AppDynamics, NetBrain, ThousandEyes.
Experience with AIOps platforms (BigPanda, ServiceNow AIOps preferred).
Proficiency in Python and PowerShell for automation and scripting.
Familiarity with SRE principles and implementation strategies.
Solid understanding of ITIL processes (Incident, Change, Problem Management).
Excellent communication, leadership, and stakeholder management skills.
Responsibilities
Lead the observability strategy by defining and executing the observability roadmap aligned with business and IT goals. Collaborate with cross-functional teams to implement Site Reliability Engineering (SRE) practices and drive automation of operational tasks.