Incident and Problem Management Lead - IT Infrastructure at Encora
Kuala Lumpur, Malaysia
Full Time


Start Date

Immediate

Expiry Date

24 Feb, 26

Salary

0.0

Posted On

26 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Incident Management, Problem Management, Root Cause Analysis, Continuous Improvement, Command Center Operations, Stakeholder Management, Process Governance, AI Adoption, Process Optimization, Operational Excellence, Knowledge Sharing, Automation, SLA Adherence, KPI Monitoring, Team Leadership, Communication

Industry

IT Services and IT Consulting

Description
The Incident and Problem Management Lead is responsible for ensuring the effective management of IT incidents and problems to minimize business impact and prevent recurrence. This role oversees the end-to-end process and drives timely incident resolution, root cause analysis, and continuous improvement initiatives. Additionally, the role manages a 24x7 Command Center operation with a team of 12 staff, ensuring continuous monitoring, rapid response, and operational excellence. As a leader in IT service operations supporting digital transformation, this role champions AI adoption, process optimization, and shift-left strategies to enhance service delivery and reduce manual overhead while maintaining operational stability.

Responsibilities

Incident Management: Lead the incident management process to ensure rapid restoration of services. Coordinate major incident response, including communication with stakeholders and escalation management. Ensure adherence to SLAs and KPIs for incident resolution. Maintain accurate incident records and reporting.

Problem Management: Drive root cause analysis for recurring incidents and major problems. Oversee permanent fixes and preventive measures. Maintain the knowledge base of problems and ensure effective knowledge sharing. Collaborate with engineering and operations teams to reduce recurring incident volume. Review incident trends to identify preventive measures against incident occurrence.

Command Center Operations: Manage a 24x7 Command Center with 12 staff across rotating shifts. Ensure continuous monitoring of critical systems and proactive detection of issues. Establish clear escalation protocols and ensure timely response to alerts. Optimize staffing schedules and maintain high team performance. Implement automation and tools to improve operational efficiency.

Process Governance & Continuous Improvement: Define and enforce incident and problem management policies and procedures, ensuring that an annual review is performed. Monitor process performance and identify improvement opportunities. Provide training and guidance to teams and partners on best practices. Prepare and present regular reports to senior management. Implement shift-left strategies to streamline Infra Operations responses to common alerts and incidents. Act as the point of contact for audits related to Incident and Problem Management.

Stakeholder Management: Act as the escalation point of contact for incident and problem management. Communicate effectively with business units, vendors, and leadership during critical events. Ensure transparency and timely updates throughout the incident lifecycle, including post-incident reporting to Group Risk Management.

Champion culture and conduct behavioral expectations within the Department/Division. Ensure compliance with IT policies and contribute to risk culture and audit participation.

About Encora

Encora is a global company that offers Software and Digital Engineering solutions. Our practices include Cloud Services, Product Engineering & Application Modernization, Data & Analytics, Digital Experience & Design Services, DevSecOps, Cybersecurity, Quality Engineering, and AI & LLM Engineering, among others. At Encora, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.
Responsibilities
The Incident and Problem Management Lead is responsible for managing IT incidents and problems to minimize business impact and prevent recurrence. This includes overseeing the incident management process, coordinating major incident responses, and driving root cause analysis for recurring issues.