Data Manager for Machine Learning Datasets
at ECMWF
Reading RG2 9AX, United Kingdom
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 20 Jan, 2025 | GBP 71451 Annual | 21 Oct, 2024 | N/A | Python, Big Data Analytics, Fundamentals, Development Projects, Numerical Weather Prediction, Languages, International Exchange, Machine Learning | No | No |
Description:
JOB SUMMARY
ECMWF is building a world-leading, machine-learning-based probabilistic weather forecasting system (AIFS) to complement our existing physics-based system (IFS). We are pioneering the operationalisation of machine learning forecasting models in this domain. ECMWF now runs both deterministic and probabilistic AIFS forecasts daily, providing open data and products to users around the world. Within the Destination Earth initiative, AIFS workflows are being expanded towards an Earth-system model capturing land, ocean, sea-ice and wave processes.
Data is the lifeblood of machine learning, with well-curated datasets being vital for learning accurate models. In this position you will play a leading role in the management of training datasets for machine learning models, including the AIFS. You will manage machine learning datasets for ECMWF activities, such as operational configurations, Destination Earth applications and ECMWF’s Member and Cooperating State undertakings. This involves liaising with users inside and outside of ECMWF, understanding the requirements for new datasets, and life-cycle management and curation of datasets between HPC systems across Europe.
This role is in the Data Archives and Dissemination Services Team of the Production Services Section. The team is responsible for archiving of operational and research data into the MARS archive and Fields DataBase (FDB) and the generation and dissemination of ECMWF’s products. The Production Services Section is responsible for the operational production services of ECMWF, including in the framework of DestinE and Copernicus services, working closely with teams across the organisation to maintain, develop and manage the operational forecasting systems and associated data services.
EDUCATION:
- An advanced university degree (EQF Level 7 or above) or equivalent professional experience is required
EXPERIENCE:
- Experience curating large datasets (terabytes to petabytes) is required.
- Programming skills in Python and/or scripting languages in a UNIX/Linux environment required.
- Experience handling scientific data formats in High Performance Computing environments is desirable.
- Experience of working collaboratively on software development projects is desirable
KNOWLEDGE AND SKILLS (INCLUDING LANGUAGE):
- Experience working in a scientific environment
- Experience working in a High Performance Computing environment
- Familiarity with big-data analytics, cloud technologies and machine learning fundamentals
- An understanding of numerical weather prediction or meteorological applications
- Knowledge of standardised data formats for international exchange
The skills and experience listed above would be an advantage. However, you are encouraged to apply even if you feel you don’t precisely meet all the requirements.
Responsibilities:
- Manage and support the data handling requirements of ML applications.
- Take responsibility for the relevant elements of creating, storing and serving datasets used for machine learning applications, in close collaboration with ECMWF ML experts and with experts in ECMWF’s Member and Cooperating States.
- Collaborate with research and technical teams at ECMWF on ML developments, including the gathering of future requirements, producing projections of usage and performing resource capacity planning.
- Act as Data Governance (DGov) facilitator for matters related to machine learning, working collaboratively with ECMWF’s existing DGov Facilitators and building expertise in the data format standards involved.
- Act as Data Curator for ML datasets, with emphasis on the maintenance of catalogues and ensuring data accuracy and completeness.
REQUIREMENT SUMMARY
Min: N/A, Max: 5.0 year(s)
Information Technology/IT
IT Software - Other
Software Engineering
Graduate
Proficient
1
Reading RG2 9AX, United Kingdom