Data Engineer at Bjak
Germany
Full Time


Start Date

Immediate

Expiry Date

27 Nov, 25

Salary

0.0

Posted On

27 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Sql, Scripting, Datasets, Machine Learning, Transformation, Data Cleaning, Python

Industry

Information Technology/IT

Description

TRANSFORM LANGUAGE MODELS INTO REAL-WORLD APPLICATIONS

We’re building AI systems for a global audience. We are living in an era of AI transition, and this new project team will focus on building applications that deliver real-world impact and reach the widest possible usage worldwide.
This is a global role with a hybrid work arrangement, combining flexible remote work with in-office collaboration at our HQ. You’ll work closely with regional teams across product, engineering, operations, infrastructure, and data to build and scale impactful AI solutions.

REQUIREMENTS

  • Proven experience preparing datasets for machine learning or fine-tuning large models
  • Strong skills in data cleaning, preprocessing, and transformation for both text and image data
  • Hands-on experience with data labeling workflows and quality assurance for labeled data
  • Familiarity with building and maintaining moderation datasets (safety, compliance, and filtering)
  • Proficiency in scripting (Python, SQL) and working with large-scale data pipelines

Responsibilities

WHY THIS ROLE MATTERS

You’ll fine-tune state-of-the-art models, design evaluation frameworks, and bring AI features into production. Your work ensures our models are not only intelligent, but also safe, trustworthy, and impactful at scale.

WHAT YOU’LL DO

  • Collect, clean, and preprocess user-generated text and image data for fine-tuning large models
  • Design and manage scalable data labeling pipelines, leveraging both crowdsourcing and in-house labeling teams
  • Build and maintain automated datasets for content moderation (e.g., safe vs unsafe content)
  • Collaborate with researchers and engineers to ensure datasets are high-quality, diverse, and aligned with model training needs