Senior Data Engineer (NHS) at Our Future Health
London, England, United Kingdom
Full Time


Start Date

Immediate

Expiry Date

28 Aug, 25

Salary

0.0

Posted On

29 May, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Pipelines, Primary Care, Version Control, Python, Code Review, Data Standards, Feeds, Spark, Computing

Industry

Information Technology/IT

Description

We’re looking for a Senior Data Engineer to join our supportive and mission-driven Data Team. This is an exciting opportunity to work on a nationally significant programme powered by NHS health data, helping researchers solve complex challenges on a truly industrial scale, with global significance.
In this role, you’ll bring your experience working with NHS datasets to help design, build, and maintain data pipelines that enable trusted, high-quality insights. You’ll collaborate closely with colleagues across multiple disciplines — from Researchers and Epidemiologists to Software Engineers and Product Leads — contributing to a shared code base that delivers real-world health data for discovery.

REQUIREMENTS

  • Experience building and maintaining robust, scalable and efficient data pipelines capable of processing very large amounts of data from feeds across multiple systems, using a range of different technologies.
  • You’re an empathetic communicator, comfortable bridging technical and non-technical perspectives.
  • You’re confident working with NHS health data and understand the nuances of secondary care datasets (Hospital Episode Statistics, death registry data, A&E data, etc.). Experience with primary care (GP) data would be advantageous.
  • Highly proficient in Python with solid command line knowledge and Unix skills.
  • Good understanding of cloud environments (ideally Azure), distributed computing and optimising workflows and pipelines.
  • Understanding of common data transformation and storage formats, e.g. Apache Parquet, Delta tables.
  • Understanding of containerisation (e.g. Docker) and deployment (e.g. Kubernetes).
  • Working knowledge of Spark, Databricks and data lakes.
  • You follow best practices such as code review, clean code and unit testing.
  • You’re comfortable working in an agile development team and familiar with version control using Git/GitHub.
  • Awareness of, or interest in, data standards such as GA4GH (https://www.ga4gh.org/) and FAIR (https://www.go-fair.org/fair-principles/).
  • You’re experienced in contributing to and navigating shared codebases within multi-person teams.

RESPONSIBILITIES

  • Support the build of data pipelines from data providers to our primary data store and trusted research environment. Support the design, scoping and build of data flows.
  • Produce logic for data transformation steps as code that meets the requirements of our end users and builds well-curated, accessible and quality-controlled data for analysis.
  • Contribute to the codebase for multiple data pipelines while ensuring best coding practices are used.
  • Work with Data Scientists and Epidemiologists to understand their data requirements, and collaborate with them to deliver the data needed for their projects.
  • Keep abreast of best practice in data engineering across industry, research and Government, and facilitate the adoption of these standards.