Start Date
Immediate
Expiry Date
22 Nov, 25
Salary
0.0
Posted On
23 Aug, 25
Experience
0 year(s) or above
Remote Job
Yes
Telecommute
Yes
Sponsor Visa
No
Skills
Good communication skills
Industry
Information Technology/IT
LOCATION(S): LONDON
BAE Systems Digital Intelligence is home to 4,500 digital, cyber and intelligence experts. We work collaboratively across 10 countries to collect, connect and understand complex data, so that governments, nation states, armed forces and commercial businesses can unlock digital advantage in the most demanding environments.
Site Reliability Engineering is a rapidly growing concept in industry, with a remit to drive the quality, reliability and performance of essential systems. As a Site Reliability Engineer you’ll be part of a team in BAE Systems at the forefront of this, delivering these benefits to a key national security customer. We are in the process of building our team and tools, and with your help will create a culture of continual improvement to revolutionise the way our customer’s systems are built and maintained. This role blends operational product support with software engineering to create applications to understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission.
As an SRE, you will fundamentally be doing work that has historically been done by an operations team, but using software and systems engineering expertise to substitute automation for human labour. The objective is to limit traditional manual operations work (incident tickets, on-call, etc.) to no more than half of the SRE team’s time, and ideally considerably less. You will have an enthusiasm to learn and experiment, and to develop tools that reveal the health of applications and improve their reliability in support of the customer mission.
Role accountabilities include:
Supporting and maintaining essential services that support core mission applications, proactively enhancing their availability, performance and stability.
Being part of the 24/7 on-call rota, supporting critical production systems outside business hours, for which additional on-call allowances and overtime benefits will be paid.
Finding innovative solutions to problems rather than undertaking repetitive work, automating everything you can. You will work alongside development teams, advising them on good practice in designing and building systems, drawing on what you know works well.
Designing and deploying monitoring products, creating bespoke tools where required, to provide comprehensive and intelligent observations that meet the customer requirements and demonstrate the improvements the team is making on a daily basis. You will be well versed in the relationship between software and infrastructure, understanding the characteristics of systems that enable them to be scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to.
Participating in the wider DevOps/SRE community within the organisation.