Senior Data Engineer (NHS) at Our Future Health

London, England, United Kingdom

Full Time

Start Date

Immediate

Expiry Date

28 Aug, 25

Salary

0.0

Posted On

29 May, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Pipelines, Primary Care, Version Control, Python, Code Review, Data Standards, Feeds, Spark, Computing

Industry

Information Technology/IT

Description

We’re looking for a Senior Data Engineer to join our supportive and mission-driven Data Team. This is an exciting opportunity to work on a nationally significant programme powered by NHS health data, helping researchers solve complex challenges on a truly industrial scale with global significance.
In this role, you’ll bring your experience working with NHS datasets to help design, build, and maintain data pipelines that enable trusted, high-quality insights. You’ll collaborate closely with colleagues across multiple disciplines — from Researchers and Epidemiologists to Software Engineers and Product Leads — contributing to a shared code base that delivers real-world health data for discovery.

Requirements

Experience building and maintaining robust, scalable and efficient data pipelines capable of processing very large amounts of data from feeds across multiple systems, using a range of different technologies.
You’re an empathetic communicator, comfortable bridging technical and non-technical perspectives.
You’re confident working with NHS health data and understand the nuances of secondary care datasets (Hospital Episode Statistics, death registry data, A&E data, etc.). Experience with primary care (GP) data would be advantageous.
Highly proficient in Python with solid command line knowledge and Unix skills.
Good understanding of cloud environments (ideally Azure), distributed computing and optimising workflows and pipelines.
Understanding of common data transformation and storage formats, e.g. Apache Parquet, Delta tables.
Understanding of containerisation (e.g. Docker) and deployment (e.g. Kubernetes).
Working knowledge of Spark, Databricks and data lakes.
You follow best practices such as code review, clean code and unit testing.
You’re comfortable working in an agile development team, familiar with version control and Git/GitHub.
Awareness of, or interest in, data standards such as GA4GH (https://www.ga4gh.org/) and FAIR (https://www.go-fair.org/fair-principles/).
You’re experienced in contributing to and navigating shared codebases within multi-person teams.

Responsibilities

Support the build of data pipelines from data providers to our primary data store and trusted research environment. Support the design, scoping and build of data flows.
Produce logic for data transformation steps as code that meets the requirements of our end users and builds well-curated, accessible and quality-controlled data for analysis.
Contribute to the code base for multiple data pipelines while ensuring best coding practices are used.
Work with Data Scientists and Epidemiologists to understand their data requirements and collaborate with them to deliver the data needed for their projects.
Keep abreast of best practice in data engineering across industry, research and Government, and facilitate the adoption of these standards.