Senior Data Scientist, Operations (GenAI) at Argus Media

London WC1X 8NL, United Kingdom

Full Time

Start Date

Immediate

Expiry Date

07 Sep, 25

Salary

0.0

Posted On

08 Jun, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Data Extraction, Python, Models, Optimization Techniques, Communication Skills, Statistics, AWS, Academic Background, Cloud, Docker, Azure, Global Teams, Computer Science, Data Engineering, Containerization, Mathematics, Google Cloud, Collaboration, NLP

Industry

Information Technology/IT

Description

Senior Data Scientist, Operations (GenAI)
Holborn, London, UK.
Argus is where smart people belong and where they can grow. We answer the challenge of illuminating markets and shaping new futures.

WHAT WE’RE LOOKING FOR

Join our Generative AI team to work on groundbreaking projects that shape the future of AI and data science. Your contributions will directly impact the development of innovative solutions used by global industry leaders. You’ll play a pivotal role in transforming how our data are seamlessly integrated with AI systems, paving the way for the next generation of customer interactions.
We are seeking an experienced Senior Data Scientist to join our Generative AI team. This role will focus on creating and maintaining AI-ready data, leveraging the deep technical knowledge already established within the London team. You will support text and numerical data extraction, curation, and metadata enhancement to accelerate development. You will also help ensure rapid response times to data issues, minimizing potential disruptions.

SKILLS AND EXPERIENCE

Academic Background: Advanced degree in AI, statistics, mathematics, computer science, or a related field.
Programming and Frameworks: Deep experience with Python, TensorFlow or PyTorch, and NLP libraries such as spaCy and Hugging Face.
GenAI Tools: Practical experience with LangChain, Hugging Face Transformers, and embedding models for building GenAI applications.
Prompt Engineering: Deep expertise in prompt engineering, including prompt tuning, chaining, and optimization techniques.
LLM Evaluation: Experience evaluating LLM outputs, including using LLM-as-a-judge methodologies to assess quality and alignment.
RAG and Knowledge Graphs: Practical understanding of, and experience using, vector databases. Familiarity with graph-based RAG architectures and the use of knowledge graphs to enhance retrieval and reasoning would be a strong plus.
Cloud: Practical experience with Gemini/OpenAI models and familiarity with cloud platforms such as AWS, Google Cloud, or Azure. Proficiency with Docker for containerization.
Data Engineering: Strong understanding of data extraction, curation, metadata enrichment, and AI-ready dataset creation.
Collaboration and Communication: Excellent communication skills and a collaborative mindset, with experience working across global teams.

RESPONSIBILITIES

AI-Ready Data Development: Design, develop, and maintain high-quality AI-ready datasets, ensuring data integrity, usability, and scalability to support advanced Generative AI models.
Advanced Data Processing: Lead hands-on efforts in complex data extraction, cleansing, and curation for diverse text and numerical datasets. Implement sophisticated metadata enrichment strategies to enhance data utility and accessibility for AI systems.
Algorithm Implementation & Optimization: Implement and optimize state-of-the-art algorithms and pipelines for efficient data processing, feature engineering, and data transformation tailored for LLM and GenAI applications.
GenAI Application Development: Apply and integrate frameworks like LangChain and Hugging Face Transformers to build modular, scalable, and robust Generative AI data pipelines and applications.
Prompt Engineering Application: Apply advanced prompt engineering techniques to optimize LLM performance for specific data extraction, summarization, and generation tasks, working under the Lead’s guidance.
LLM Evaluation Support: Contribute to the systematic evaluation of Large Language Model (LLM) outputs, analysing quality, relevance, and accuracy, and supporting the implementation of LLM-as-a-judge frameworks.
Retrieval-Augmented Generation (RAG) Contribution: Actively contribute to the implementation and optimization of RAG systems, including working with embedding models, vector databases, and, where applicable, knowledge graphs, to enhance data retrieval for GenAI.
Technical Mentorship: Act as a technical mentor and subject matter expert for junior data scientists, providing guidance on best practices in coding, data handling, and GenAI methodologies.
Cross-Functional Collaboration: Collaborate effectively with global data science teams, engineering, and product stakeholders to integrate data solutions and ensure alignment with broader company objectives.
Operational Excellence: Troubleshoot and resolve data-related issues promptly to minimize potential disruptions, ensuring high operational efficiency and responsiveness.
Documentation & Code Quality: Produce clean, well-documented, production-grade code, adhering to best practices for version control and software engineering.