Windows Site Reliability Engineer at Luxoft

Remote, Silesian Voivodeship, Poland

Full Time

Start Date

Immediate

Expiry Date

29 Apr, 25

Salary

0.0

Posted On

30 Jan, 25

Experience

0 year(s) or above

Remote Job

Telecommute

Sponsor Visa

Skills

Git, Splunk, CyberArk, PowerShell, Reliability Engineering, Active Directory, VMware ESX, Cloud, EMC NetWorker, Confluence, Storage, Kubernetes, TeamCity, Performance Measurement, Microsoft SQL Server, Code, Jira, ServiceNow, Interpersonal Skills, IBM, Programming Languages

Industry

Information Technology/IT

Description

PROJECT DESCRIPTION

About the client:
Our client is a UK subsidiary of a global financial house working in multiple markets and asset classes.
About the project:
We are looking to expand our team with a Windows Site Reliability Engineer. The successful candidate will join the client team responsible for all aspects of the Windows Server estate across the client’s organization.
About our team:
A rapidly expanding group developing and supporting a range of client projects.
About working environment:
We work remotely in the client’s environment. Currently, the team operates fully in WFH mode.
Role Overview:
The Site Reliability Engineering team is responsible for delivering continuous improvement, automation and self-service offerings to operational teams across Bank EMEA and Securities International.
Role Purpose:
Responsible for the reliability and efficiency of infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil that operations teams must perform. A member of the L3 Engineering team, providing subject matter expertise and acting as the ultimate escalation point.

SKILLS

Must have
Essential:
Exceptional skills in Microsoft Windows Server internals and related technologies
Excellent skills in managing and maintaining Active Directory, DHCP, DNS, LDAP and Kerberos
Extensive experience in hardware performance monitoring and tuning complex low latency systems.
Agile, Site Reliability Engineering (SRE) and DevOps Principles and practices
Exceptional knowledge of scripting and programming languages such as PowerShell, Python and C#
Fluent in Backup and Recovery processes and procedures
Advanced knowledge of Clustering, High-Availability, Replication and Disaster Recovery techniques
Ability to tune Network, Storage, Server and Virtualisation layers for optimal performance and reliability
Excellent Performance Tuning skills, in-depth knowledge of system internals, performance counters and performance measurement and analysis tools.
Ability to interpret and implement CIS security hardening recommendations in a controlled manner
Acute awareness of Security and Auditing requirements in a regulated environment
“Infrastructure as Code” Principles and practices.
“Continuous Integration (CI) and Continuous Delivery (CD)” Principles and practices
Git, Ansible, Terraform and TeamCity
Serena Deployment Automation (SDA) and Jenkins
Nice to have
Highly Desirable:
Experience writing and managing plays/playbooks on AWX / Ansible Tower
Advanced working knowledge of Kubernetes and Docker container orchestration
Microsoft SQL Server, Oracle, Sybase ASE, MongoDB and Snowflake
IBM Tivoli / Netcool
Nutanix HCI and VMware ESX
Networking Protocols (TCP/IP, DNS, DHCP, VLANs)
RHEL, Oracle Linux, Oracle Solaris and related technologies
Cloud computing
IaaS, PaaS and SaaS offerings across Azure, AWS, GCP and Oracle
Knowledge of data security governance and regulations such as GDPR and SOX
Desirable:
Dell EMC PowerStore (SAN) and Isilon (NAS)
Rubrik, EMC NetWorker, Data Domain and IBM Tivoli Storage Manager
CyberArk
Splunk
Qualys
Cisco Tetration
ServiceNow
JIRA and Confluence
Personal Specifications:
Excellent communication and interpersonal skills
Ability to handle pressure during outages and systematically resolve issues
Excellent problem-solving skills
Results driven, with a strong sense of accountability
A proactive, motivated approach
The ability to operate with urgency and prioritise work accordingly
A structured and logical approach to work
Attention to detail and accuracy
Ability to perform well in a pressurised environment
Ability to manage constructive conflict effectively
The ability to manage large workloads and tight deadlines
Able to communicate complex technical concepts to non-technical persons at all levels

Responsibilities

Primary:
Develop software to make infrastructure services self-managing and self-service
Deliver continuous service improvement by developing Infrastructure as Code
Eliminate manual, repetitive, automatable, tactical tasks that are devoid of value
Improve system performance, make effective use of resources, distribute load and reduce latency
Identify SLOs (Service Level Objectives) to meet availability and latency objectives
Develop pro-active monitoring solutions that alert on symptoms and not just on outages
Perform detailed root cause analyses (RCAs) on incidents and outages to prevent recurrence
Partner with development teams to improve services via rigorous testing and release procedures
Identify technical debt and partner with application teams to build remediation plans
Develop standard operational procedures and produce effective documentation
Analyse workloads and devise suitable cloud migration strategies where appropriate
Ensure all project / investment workloads are delivered according to the defined plans and budget
Liaise with Infrastructure Control and IT Risk teams to satisfy internal and external audit requests
Deputise for team lead when required to do so and act-up accordingly
Identify cost saving and optimisation opportunities across the group
Build strong working relationships across the organisation
Adhere to the core values of the bank
Secondary:
Perform daily health and compliance checks for all systems as required
Ensure all systems are backed up successfully and any issues are promptly resolved
Validate that monitoring alerts and batch job failures are detected promptly and resolved satisfactorily
Ensure sufficient capacity is available to accommodate drive growth
Respond to emails sent to the team distribution list / mailboxes in a timely manner
Handle incidents and requests with efficiency and a “customer first” mindset
Maintain infrastructure in a highly available, reliable, secure and performant manner
General Server / Database / Virtualisation Administration maintenance activities
Provide technical support to application support and development teams
Provide consultancy to application support and development teams
Take part in the on-call and weekend work rotation, triaging and addressing production issues as they arise