Principal Platform Engineer at Weekday

Bengaluru, Karnataka, India

Full Time

Start Date

Immediate

Expiry Date

10 Apr, 26

Salary

₹55,00,000 - ₹85,00,000 a year

Posted On

10 Jan, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Backend Systems, Platform Engineering, Distributed Systems, Python, Cloud Infrastructure, Kubernetes, Kafka, Postgres, AI Observability, System Design, Reliability Engineering, API Design, Technical Leadership

Industry

Retail Apparel and Fashion

Description

This role is for one of our clients.

Industry: Software Development
Seniority level: Mid-Senior level
Min Experience: 10 years
Location: Bengaluru
Job Type: Full-time

₹55,00,000 - ₹85,00,000 a year

We are building a next-generation AI observability and trust platform that enables enterprises to safely deploy, monitor, and improve AI systems at scale, across traditional ML models, LLMs, generative AI, and agentic workflows.

As a Principal Platform Engineer, you will architect and develop the backbone of this platform: high-performance backend services, distributed systems, and cloud-native infrastructure that power AI evaluation, monitoring, and reliability at production scale.

This is a hands-on, high-ownership role where you will shape platform architecture, influence engineering standards, and help define what “trustworthy AI” looks like in real-world enterprise environments.

What You’ll Own

Platform & Backend Architecture
Design and build scalable backend services that power AI observability, evaluation, and governance workflows.
Architect distributed systems capable of ingesting, processing, and querying high-volume AI telemetry and evaluation data.
Develop APIs and services that expose AI performance, reliability, and risk signals to enterprise customers.

Distributed Systems & Data Infrastructure
Build systems that compute and store advanced AI evaluation metrics such as accuracy, relevance, drift, latency, and hallucination indicators.
Design resilient data pipelines using event-driven and streaming architectures.
Optimize storage and query layers for scale, performance, and cost efficiency.

Reliability, Scale & Operations
Define and improve operational standards across availability, latency, SLOs, observability, and incident response.
Lead efforts around performance tuning, failure handling, capacity planning, and system resiliency.
Embed best practices for testing, CI/CD, and production readiness into platform development.

AI Platform Evolution
Partner with product, ML, and customer teams to design new evaluation capabilities aligned with emerging AI risks and enterprise needs.
Support observability for modern AI workloads including LLMs, GenAI pipelines, and agent-based systems.
Contribute to the long-term technical roadmap for responsible and transparent AI systems.

Technical Leadership
Act as a technical multiplier by reviewing designs and code, raising engineering standards, and guiding architectural decisions.
Mentor senior and mid-level engineers, helping them grow in systems thinking and execution.
Influence platform direction without formal people management responsibilities.

What We’re Looking For

Core Experience
10+ years of professional experience building backend or platform systems in production environments.
Strong hands-on expertise in Python and backend service development.
Deep understanding of distributed systems, concurrency, fault tolerance, and performance optimization.
Experience designing APIs, microservices, and data-intensive systems.

Infrastructure & Cloud
Solid experience with cloud-native architectures on AWS or GCP.
Hands-on exposure to Kubernetes, containerized workloads, and modern CI/CD pipelines.
Experience with technologies such as Postgres, Redis, Kafka, RabbitMQ, Ray, or similar systems.
Familiarity with analytical data stores like ClickHouse or Druid is a strong plus.

Leadership & Ownership
Proven ability to work autonomously and drive complex initiatives from concept to production.
Strong problem decomposition and decision-making skills in ambiguous environments.
Excellent communication skills and comfort collaborating across distributed, cross-functional teams.
A mentorship-oriented mindset with a passion for building durable systems and strong engineering culture.

Bonus Points
Experience supporting ML, LLM, or GenAI systems in production.
Familiarity with modern LLM frameworks, evaluation tooling, or AI monitoring platforms.
Background in developer platforms, infra tooling, or internal platform teams.

Why This Role Stands Out
Work on a category-defining AI platform at the intersection of backend engineering and responsible AI.
High-impact, high-ownership role with architectural influence across the stack.
Exposure to cutting-edge AI workloads without requiring ML research background.
Opportunity to shape how enterprises build trust, transparency, and reliability into AI systems.

Key Skills
Backend Systems · Platform Engineering · Distributed Systems · Python · Cloud Infrastructure · Kubernetes · Kafka · Postgres · AI Observability · System Design · Reliability Engineering · API Design · Technical Leadership

Responsibilities

As a Principal Platform Engineer, you will architect and develop high-performance backend services and distributed systems for an AI observability platform. You will also define operational standards and lead efforts around performance tuning and system resiliency.