Incident Response Manager - Data System Infrastructure (DSI) - Dublin

at ByteDance

Dublin, County Dublin, Ireland

Start Date: Immediate
Expiry Date: 03 Sep, 2024
Salary: Not Specified
Posted On: 04 Jun, 2024
Experience: 5 year(s) or above
Skills: Servers, Lenel, Program Management, Ticketing, CompTIA Server+, Data Analytics, Visualization, Sensitive Information, ITIL
Telecommute: No
Sponsor Visa: No

Description:

Founded in 2012, ByteDance’s mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
Why Join
Creation is the core of ByteDance’s purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.
Together, we inspire creativity and enrich life - a mission we aim towards achieving every day.
To us, every challenge, no matter how ambiguous, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At ByteDance, we create together and grow together. That’s how we drive impact - for ourselves, our company, and the users we serve.
Join us.
The Data Systems Infrastructure (DSI) team sits within the ByteDance global technology structure and supports the company’s fast growth by building and operating hyper-scale datacenters, managing the life cycle of the server fleet, providing cloud solutions, and developing various infrastructure services, making sure they are scalable and reliable.
The Incident Response Center (IRC) is the first layer of defense, responsible for quick detection and incident response using various monitoring and automation tools, and for thorough investigation, classification, and triage of alerts. The Incident Response Manager is responsible for delivering operations within the Incident Response Operation Center (IROC) across all ByteDance datacenter sites in the respective regions. The IRC team is expected to respond to all alarms/alerts set in the Server Automation Operations System (SAOS) and Data Center Infrastructure Management (DCIM) to quickly discover anomalies and engage Subject Matter Expert (SME) teams to start issue triage. The IRC team provides business intelligence through rigorous analysis of alerts and issues, which helps reduce and prevent recurring incidents.

Responsibilities

  • Deliver global operations within the IROC (Incident Response Operation Center) across ByteDance datacenters.
  • Act as first responder and first layer of defense, responsible for quick detection and incident response using various monitoring and automation tools, and conduct thorough investigation, classification, and triage of alerts.
  • Respond to all infrastructure, facilities, security, and safety events notified via various means, such as alarms/alerts set in Server Operations and Maintenance, Datacenter Infrastructure Management, Network & Grafana, and other functions.
  • Respond to incidents and critical situations in a calm, problem-solving manner, and conduct in-depth investigation of alerts.
  • Provide insights into the effectiveness of the incident response and recovery process through regular reports.
  • Analyze trends and patterns in events to identify opportunities for improvement and optimization.
  • Monitor the performance of incident response against the agreed-upon SLAs by alerting and notifying stakeholders.
  • Manage escalations by notifying or initiating discussions with higher-level support teams engaged in resolution processes.
  • Identify, assess, and communicate potential risks arising through event monitoring that could affect customers’ service.
  • Support program managers, facilitate project deliverables, and improve overall operational security and engineering initiatives.

Minimum Qualifications

  • 5+ years of experience in a service center or similar 24x7 operations center environment.
  • 3+ years of experience in a technology company or experience as a team lead, and experience in operations program management.
  • Strong knowledge of technical elements associated with systems such as Server Health, Datacenter Environment and IP Networks.
  • Basic working knowledge of data protection policies such as GDPR and the need to keep sensitive information secure.

Preferred Qualifications

  • 5 years of experience as an incident and problem analyst.
  • Works well under pressure and within time constraints to solve problems and complete deliverables.
  • Experience with Ticketing, Grafana, Servers and Data Center Systems.
  • Working knowledge and/or certifications in ITIL, CompTIA Server+, Schneider Electric Data Center Certified Associate (DCCA), Data Analytics and Visualization.
  • Knowledge of Cybersecurity, Lenel and Avigilon systems is a plus.

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.



REQUIREMENT SUMMARY

Min: 5.0, Max: 10.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

Dublin, County Dublin, Ireland