Site Reliability Specialist at Airbus Helicopters Polska Sp z oo
Gdańsk, Pomorskie, Poland
Full Time


Start Date

Immediate

Expiry Date

22 May, 25

Salary

0.0

Posted On

18 Apr, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Operating Systems, Scripting, Communication Skills, Root Cause, Automation, Infrastructure, Cloud

Industry

Information Technology/IT

Description

JOB SUMMARY:

The Solution, Reliability and Monitoring department’s main objective is to define, provide and support the production environments used by all NAVBLUE’s customers.
As part of this team, the Site Reliability Specialist’s purpose is to ensure that the Production Infrastructure is up and running 24x7, troubleshooting the issues the infrastructure encounters and continuously improving our ways of working.
The Site Reliability Specialist also interfaces with our Contractor, who manages the L1 monitoring and troubleshooting of our solution on a daily basis.
The Site Reliability Specialist needs solid knowledge of Operating Systems, Virtualization, Networking and cloud-based infrastructure such as Amazon Web Services (AWS).
This position is expected to use established procedures, and create new ones, to analyze and resolve problems, and to perform moderately complex and varied tasks with minimal supervision.

EXPERIENCE:

  • Solid experience (2-4 years) in a technical support team or a similar technical support environment.
  • Some experience in scripting and automation is an asset.
  • Some experience with cloud-based infrastructure is preferred.

KNOWLEDGE, SKILLS, DEMONSTRATED CAPABILITIES & COMPETENCIES:

  • Solid knowledge of Operating Systems & ability to perform troubleshooting required.
  • Solid knowledge of Cloud Technology concepts & ability to perform troubleshooting required.
  • Solid knowledge of networking for enterprise environments required.
  • Solid knowledge of Virtual Machine concepts and infrastructure management.
  • Demonstrated ability to identify the root cause of issues and to recommend permanent, long-term fixes.
  • Demonstrated ability to perform standard troubleshooting in an AWS environment and to provide guidance to other teams.
  • Proactive, confident self-starter with effective interpersonal and communication skills.

EXPERIENCE LEVEL:

Professional

Responsibilities
  • Contractor activities definition and follow-up:
      • Create / review / update the Standard Operating Procedures (SOPs) for our Contractor,
      • Train our Contractor on new SOPs, monitoring, or the onboarding of new components,
      • Follow up on the Contractor's daily activities: report any failure through RCA, improve SOPs, review good and bad escalations, and report,
      • Support escalations.
  • Support customer Disaster Recovery testing activities periodically.
  • Document and improve hosting operation processes.
  • Triage the tickets reaching the team from Monitoring, Customers or Projects.
  • Support activities:
      • Revert software deployments when an issue arises after a deployment,
      • Respond to escalations, fixing immediate issues critical for customers.
  • Monitoring tool management:
      • Adjust thresholds and alert levels,
      • Measure system performance / responsiveness,
      • Add checks/hosts in the different systems,
      • Associate the SOPs with the checks created.
  • Perform deployments and support the associated failures.
  • Automate activities related to daily operations and deployments.
  • Report on main events and activities, providing management with a summary of our system status.
  • Ensure support continuity across the different time zones.
  • Contribute to Service Level Objectives (SLO) definitions & monitoring.
  • Internal release validation, including upgrades & rollbacks.
  • Internal process exercises such as:
      • Disaster Recovery / Failover processes,
      • Backup / Restore validation.
  • COTS installation and management.