AI Data Engineer | 2026HP02005/#aaPng5f6 at Mindverse Consulting Services Limited
Hyderabad, Telangana, India
Full Time


Start Date

Immediate

Expiry Date

27 May 2026

Salary

0.0

Posted On

26 Feb 2026

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Vector Database Mastery, Advanced Python, Rust, Apache Spark, Flink, Kafka, LLM Data Tooling, Unstructured.io, LlamaIndex, LangChain, MLOps, DataOps, DVC, Airflow, Prefect, Embedding Models

Industry

IT Services and IT Consulting

Description
Job Summary

We are seeking a hardcore, hands-on AI Data Engineer to build the high-performance data infrastructure required to power autonomous AI agents. You won't just be moving data from A to B; you will be architecting Dynamic Context Windows, managing Real-time Semantic Indexes, and building Self-Cleaning Data Pipelines that feed our "Super Employee" agents.

Job Responsibilities

· Vector & Graph ETL: Design and maintain pipelines that transform unstructured data (PDFs, emails, logs, chats) into optimized embeddings for Vector Databases (Pinecone, Weaviate, Milvus). A minimal ingestion sketch follows this description.
· Semantic Data Modeling: Engineer data structures that optimize for Retrieval-Augmented Generation (RAG), ensuring agents find the "needle in the haystack" in milliseconds.
· Knowledge Graph Construction: Build and scale Knowledge Graphs (Neo4j) to represent complex relationships in our trading and support data that standard vector search misses.
· Automated Data Labeling & Synthetic Data: Implement pipelines that use LLMs to auto-label datasets or generate synthetic edge cases for agent training and evaluation.
· Stream Processing for Agents: Build real-time data "listeners" (Kafka/Flink) that feed live context to agents, allowing them to react to market or support events as they happen.
· Data Reliability & "Drift" Detection: Build monitoring for "Embedding Drift", identifying when the statistical distribution of your data changes and the agent's "knowledge" becomes stale (a simple drift-scoring sketch follows the Responsibilities summary below).

Essential Skills

· Vector Database Mastery: Expert-level configuration of HNSW indexes, scalar quantization, and metadata filtering strategies within Pinecone, Milvus, or Qdrant.
· Advanced Python & Rust: Proficiency in Python for AI logic and in Rust (or C++) for high-performance data processing and custom embedding functions.
· Big Data Ecosystem: Hands-on experience with Apache Spark, Flink, and Kafka in a high-throughput environment (Trading/FinTech preferred).
· LLM Data Tooling: Deep experience with Unstructured.io, LlamaIndex, or LangChain for document parsing and chunking strategy optimization.
· MLOps & DataOps: Mastery of DVC (Data Version Control) and Airflow/Prefect for managing complex, non-linear AI data workflows.
· Embedding Models: Understanding of how to fine-tune embedding models (e.g., BGE, Cohere, or OpenAI) to better represent domain-specific (Trading) terminology.

Additional Qualifications

· Chunking Strategy Architect: You don't just "split text"; you implement Semantic Chunking and Parent-Child retrieval strategies to maximize LLM context relevance.
· Cold/Warm/Hot Storage Strategy: Manage cost and latency by tiering data between Vector DBs (Hot), SQL/NoSQL (Warm), and S3/Data Lakes (Cold).
· Privacy & Redaction Pipelines: Build automated PII (Personally Identifiable Information) redaction into the ingestion layer to ensure agents never "see" or "leak" sensitive user data.

Background Check

Required; no criminal record.

Others

Work mode: Hybrid (3 days per week from the office)
Office location: Rai Durg, Hyderabad
Interview rounds: 3-4 rounds of interviews
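Below is a minimal sketch of the chunk-embed-upsert flow described under Vector & Graph ETL, assuming a locally reachable Qdrant instance and a recent qdrant-client. The embed() helper, the agent_context collection name, the 768-dimension vector size, and the naive fixed-size chunker are hypothetical placeholders for whatever embedding model and chunking strategy are actually adopted.

```python
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

EMBED_DIM = 768  # placeholder; must match the output size of the chosen embedding model


def embed(texts: list[str]) -> list[list[float]]:
    """Hypothetical helper: call whichever embedding model is standardised on
    (BGE, Cohere, OpenAI, ...) and return one vector per input text."""
    raise NotImplementedError


def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with overlap, used here only to keep the
    sketch self-contained."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


def ingest(doc_id: str, text: str, client: QdrantClient, collection: str = "agent_context") -> None:
    # Create the collection on first use; cosine distance is a common default for text embeddings.
    if not client.collection_exists(collection):
        client.create_collection(
            collection_name=collection,
            vectors_config=VectorParams(size=EMBED_DIM, distance=Distance.COSINE),
        )

    pieces = chunk(text)
    vectors = embed(pieces)

    # Upsert one point per chunk, carrying the source document and chunk index as payload
    # so retrieval can be filtered by metadata later.
    client.upsert(
        collection_name=collection,
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=vec,
                payload={"doc_id": doc_id, "chunk_index": i, "text": piece},
            )
            for i, (piece, vec) in enumerate(zip(pieces, vectors))
        ],
    )
```

In practice, the naive chunker would be replaced by the semantic or parent-child chunking strategies the role calls for, and the payload would carry whatever metadata the filtered-retrieval strategy needs.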
Responsibilities
The role involves architecting high-performance data infrastructure for autonomous AI agents, focusing on building Dynamic Context Windows and managing Real-time Semantic Indexes. Key tasks include designing Vector & Graph ETL pipelines, constructing Knowledge Graphs, and implementing automated data labeling.
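To make the "Embedding Drift" monitoring responsibility above concrete, here is a minimal, self-contained sketch of one simple drift signal: the cosine distance between the centroid of a reference embedding sample and the centroid of recently ingested embeddings. The 0.05 alert threshold and the synthetic data are illustrative assumptions; a real monitor would track more than one statistic and calibrate thresholds against historical baselines.

```python
import numpy as np


def drift_score(reference: np.ndarray, recent: np.ndarray) -> float:
    """Cosine distance between the centroid of a reference embedding window
    and the centroid of a recent window: values near 0 mean incoming data
    still resembles what the index was built from; values approaching 1
    suggest the agent's indexed "knowledge" is going stale."""
    ref_c = reference.mean(axis=0)
    new_c = recent.mean(axis=0)
    cosine = float(np.dot(ref_c, new_c) / (np.linalg.norm(ref_c) * np.linalg.norm(new_c)))
    return 1.0 - cosine


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 768                                  # match the embedding model's output size
    old_topic = rng.normal(size=dim)           # direction of previously indexed documents
    new_topic = rng.normal(size=dim)           # direction of newly arriving documents
    baseline = old_topic + rng.normal(scale=0.5, size=(1_000, dim))
    fresh = new_topic + rng.normal(scale=0.5, size=(1_000, dim))

    score = drift_score(baseline, fresh)
    ALERT_THRESHOLD = 0.05                     # illustrative; calibrate on real history
    if score > ALERT_THRESHOLD:
        print(f"Embedding drift detected (score={score:.3f}); consider re-indexing")
```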