Site Reliability Engineer - NS London at BAE Systems
London, England, United Kingdom
Full Time


Start Date

Immediate

Expiry Date

22 Nov, 25

Salary

0.0

Posted On

23 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Good communication skills

Industry

Information Technology/IT

Description

LOCATION(S): LONDON

BAE Systems Digital Intelligence is home to 4,500 digital, cyber and intelligence experts. We work collaboratively across 10 countries to collect, connect and understand complex data, so that governments, nation states, armed forces and commercial businesses can unlock digital advantage in the most demanding environments.
Site Reliability Engineering is a rapidly growing concept in industry, with a remit to drive the quality, reliability and performance of essential systems. As a Site Reliability Engineer you’ll be part of a team in BAE Systems at the forefront of this, delivering these benefits to a key national security customer. We are in the process of building our team and tools, and with your help will create a culture of continual improvement to revolutionise the way our customer’s systems are built and maintained. This role blends operational product support with software engineering to create applications to understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission.

Responsibilities

As an SRE, you will do work that has historically been done by an operations team, but you will use software and systems engineering expertise to replace human labour with automation. The objective is to limit traditional manual operations work (incident tickets, on-call, etc.) to no more than half of the SRE team’s time, and ideally considerably less. You will bring an enthusiasm for learning and experimenting, developing tools that show how healthy our applications are and that improve the reliability of those applications in support of the customer mission.
Role accountabilities include:
 Supporting and maintaining essential services that support core mission applications, proactively enhancing their availability, performance and stability.
 Being part of the 24/7 on-call rota, supporting critical production systems outside business hours, for which additional on-call allowances and overtime benefits will be paid.
 Finding innovative solutions to problems rather than undertaking repetitive work, automating everything you can. You will work alongside development teams, advising them on good practice in designing and building systems, drawing on what you know works well.
 Designing and deploying monitoring products, creating bespoke tools where required, to provide comprehensive and intelligent observations that meet the customer requirements and demonstrate the improvements the team is making on a daily basis. You will be well versed in the relationship between software and infrastructure, understanding the characteristics that make systems scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to.
 Participating in the wider DevOps/SRE community within the organisation.
