Data Scientist (ML, Speech, NLP & Multimodal Expertise) | Manchester at TransPerfect

Manchester, England, United Kingdom

Full Time

Start Date

Immediate

Expiry Date

01 Dec, 25

Salary

0.0

Posted On

01 Sep, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Infrastructure, Deep Learning, Evaluation Design, Glue, Machine Learning

Industry

Information Technology/IT

Description

We are looking to hire a Data Scientist with strong expertise in machine learning, speech and language processing, and multimodal systems. This role is essential to driving our product roadmap forward, particularly in building out our core machine learning systems and developing next-generation speech technologies.
The ideal candidate will be capable of working independently while effectively collaborating with cross-functional teams. In addition to deep technical knowledge, we are looking for someone who is curious, experimental, and communicative.

Responsibilities

KEY RESPONSIBILITIES:

Create maintainable, elegant code and high-quality data products that are well modeled, well documented, and simple to use.
Build, maintain, and improve the infrastructure to extract, transform, and load data from a variety of sources using SQL, Azure, GCP, and AWS technologies.
Perform statistical analysis of training datasets to identify biases, quality issues, and coverage gaps.
Implement automated evaluation pipelines that scale across multiple models and tasks.
Create interactive dashboards and visualization tools for model performance analysis.

ADDITIONAL RESPONSIBILITIES:

Design and implement robust data ingestion pipelines for massive-scale text and speech corpora, including automated data preprocessing and cleaning.
Create data validation frameworks and monitoring systems for dataset quality.
Develop sampling strategies for balanced and representative training data.
Implement comprehensive experiment tracking and hyperparameter optimization frameworks.
Conduct statistical analysis of training dynamics and convergence patterns.
Design A/B testing frameworks for comparing different training approaches.
Create automated model selection pipelines based on multiple evaluation criteria.
Develop cost-benefit analyses for different training configurations.
Design comprehensive benchmark suites with statistical significance testing.
Develop fairness metrics and bias detection systems.
Build real-time monitoring systems for model performance in production.
Implement feature drift detection and data quality monitoring.
Design feedback loops to capture user interactions and model effectiveness.
Create automated retraining pipelines based on performance degradation signals.
Develop business metrics and ROI analysis for model deployments.