Member of Technical Staff, Pre-Training Data Engineer at Cohere
Toronto, ON, Canada
Full Time


Start Date

Immediate

Expiry Date

25 Sep, 25

Salary

0.0

Posted On

27 Jun, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Good communication skills

Industry

Information Technology/IT

Description

Location
Toronto, Ottawa, San Francisco, New York, London, Paris
Employment Type
Full time
Location Type
Remote
Department
Modelling

Responsibilities

As a Pre-Training Data Engineer, you will play a pivotal role in developing the data infrastructure that underpins Cohere’s advanced language models. Your responsibilities will cover the end-to-end management of training data, including ingestion, cleaning, filtering, and optimization. You will also handle data modeling so that datasets are structured and formatted for optimal model performance. You will work with diverse data sources, such as web data, code data, multilingual corpora, and synthetic data, and ensure their quality, diversity, and reliability.
In this role, you will design and implement scalable, robust pipelines for data processing, conduct data ablations to evaluate quality, and experiment with data mixtures to enhance model performance. By combining research and engineering, you will bridge the gap between raw data and cutting-edge AI models, directly contributing to improvements in critical training metrics like throughput and accelerator utilization.
Your work will be essential to Cohere’s mission of delivering efficient and reliable language understanding and generation capabilities, driving innovation in natural language processing. If you are passionate about transforming data into the foundation of AI systems, this role offers a unique opportunity to make a meaningful impact.
Please note: we have offices in London, Paris, Toronto, Ottawa, San Francisco, and New York, but we also embrace being remote-friendly! There are no restrictions on where you can be located for this role.
