Data Engineer - Legal Data AI Processing at Omnilex
Zurich, Zurich, Switzerland
Full Time


Start Date

Immediate

Expiry Date

06 Jun, 26

Salary

13000.0

Posted On

08 Mar, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Data Ingestion, Data Modeling, Schema Design, Citation Handling, RAG, Data Enrichment, Tagging, Classification, Summarization, Embeddings, Entity Extraction, Graph Relationships, Indexing Strategies, Relevance Evaluation, Data Quality, TypeScript

Industry

Software Development

Description
🌟 About You

Are you excited about turning messy, multi-jurisdiction legal content into clean, structured, AI-ready data? Do you enjoy building reliable pipelines for extraction, normalization, chunking, citation handling, tagging, structuring, summarizing, and indexing, and then measuring quality and cost? Do you thrive in a fast-paced startup where your work directly powers search, AI answer quality, and analytics? If so, we'd love to hear from you!

🚀 About Omnilex

Omnilex is a young, dynamic AI legal tech startup with its roots at ETH Zurich. Our passionate, interdisciplinary team of 10+ people is dedicated to empowering legal professionals in law firms and legal teams by leveraging the power of AI for legal research and answering complex legal questions. We already stand out with our strong data engineering, including our combination of external data, customer-internal data, and our own innovative AI-first legal commentaries.

Tasks

🛠️ Your Responsibilities

As a Data Engineer focused on AI data processing and integration, your primary focus will be building and owning the data flows that make our AI features accurate, explainable, and scalable:

- Design and maintain ingestion for legal sources (APIs, scraping, bulk data) across jurisdictions with strong reliability and compliance
- Normalize and model heterogeneous sources into pragmatic, typed schemas (statutes, decisions, commentaries, citations, metadata)
- Implement citation-aware chunking, sectioning, and cross-referencing so RAG is precise, traceable, and cost-efficient
- Build enrichment pipelines for tagging, classification, summarization, embeddings, entity extraction, and graph relationships, using AI where it helps
- Improve search quality via better indexing strategies, analyzers, synonyms, ranking, and relevance evaluation
- Establish data quality, lineage, and observability (QA checks, coverage metrics, regression tests, versioning)
- Optimize performance, runtime complexity, DB query times, token usage, and overall pipeline cost
- Collaborate closely with users and customers to translate user problems and company requirements into robust data flows and SLAs
- Communicate your work and findings to the team (in English) for continuous feedback and improvement

Requirements

✅ Minimum Qualifications

- Degree in Computer Science, Data Science, or a related field, or equivalent practical experience
- Strong hands-on experience in data engineering with TypeScript
- Solid grasp of data structures, algorithms, regexes, and SQL (PostgreSQL)
- Experience using LLMs/embeddings for practical data tasks (chunking, tagging, summarization, RAG-ready pipelines)
- Ability to learn quickly and adapt to a dynamic startup environment, with strong ownership and a product mindset
- Available full-time, on-site in Zurich at least two days per week (hybrid)

🎯 Preferred Qualifications

- Swiss work permit or EU/EFTA citizenship
- Working proficiency in German (much of our legal data is in German) and proficiency in English
- Experience with Azure (incl. Azure AI/Cognitive Search), Docker, and CI/CD
- Familiarity with modern scraping/parsing stacks (Playwright/Puppeteer, PDF tooling, OCR)
- Experience with vector indexing, relevance evaluation, and search ranking
- Familiarity with our stack: Azure / NestJS / Next.js
- Knowledge of and experience with legal systems, in particular Switzerland, Germany, and the USA

🤝 Benefits

- Direct impact: your pipelines immediately improve search, answers, and user trust, transforming legal research
- Autonomy & ownership: own the full flow across ingestion, processing, enrichment, and indexing
- Team: professional growth at the intersection of legal, data, and AI with an interdisciplinary team
- Compensation: CHF 8'000–12'000 per month + ESOP (employee stock options), depending on experience and skills

We're excited to hear from candidates who are passionate about data engineering and eager to make an impact in the legal tech space. Apply today by clicking the Apply button.
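To give a flavor of the citation-aware chunking mentioned above, here is a minimal TypeScript sketch: it splits a statute on article markers so each chunk keeps its own citation anchor, which is what makes RAG answers traceable back to a source provision. All names (`LegalChunk`, `chunkByCitation`) and the `Art. N` marker pattern are illustrative assumptions, not Omnilex's actual API.

```typescript
// Hypothetical sketch of citation-aware chunking for RAG.
// Each chunk carries the citation of the article it came from,
// so retrieved passages can be attributed to a specific provision.

interface LegalChunk {
  citation: string; // e.g. "Art. 2" — the anchor used for traceability
  text: string;     // the full text of that article
}

function chunkByCitation(doc: string): LegalChunk[] {
  // Zero-width lookahead split: break *before* each "Art. N" marker
  // while keeping the marker inside the following chunk.
  const parts = doc.split(/(?=Art\.\s*\d+)/);
  const chunks: LegalChunk[] = [];
  for (const part of parts) {
    const m = part.match(/^Art\.\s*\d+[a-z]?/);
    if (!m) continue; // text before the first marker has no citation anchor
    chunks.push({ citation: m[0], text: part.trim() });
  }
  return chunks;
}

const statute = "Art. 1 Scope of the law. Art. 2 Definitions apply here.";
const chunks = chunkByCitation(statute);
// chunks[0] → { citation: "Art. 1", text: "Art. 1 Scope of the law." }
```

A real pipeline would of course also handle sections, paragraphs, cross-references, and jurisdiction-specific citation formats, but the core idea is the same: never separate a chunk from its citation.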
Responsibilities
The primary focus is building and owning data flows to ensure AI features are accurate, explainable, and scalable, which involves designing ingestion for legal sources and normalizing heterogeneous data into typed schemas. Responsibilities also include implementing citation-aware processing, building enrichment pipelines using AI, improving search quality, and establishing data quality and observability.