Tech Infra Engineer at Coupang, Inc.
Full Time


Start Date

Immediate

Expiry Date

22 Dec, 25

Salary

0.0

Posted On

23 Sep, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Kubernetes, Go, C/C++, Python, Java, Cloud Computing, Distributed Systems, Observability, Automation, Performance Tuning, Security, Control Plane, Data Management, Application Lifecycle, Scalability, Resilience

Industry

Software Development

Description
Please complete the attached Internal Transfer Request Form and submit. Please make sure to apply with your Coupang e-mail address.

As a Staff Systems Engineer in Developer Platform, you will partner with leaders of multiple platform teams. You will work closely with product to define and implement simple solutions to complex orchestration problems, building a highly scalable, reliable, and efficient platform for our customers. You will engineer and develop Kubernetes controllers, operators, and node-level daemons for the application runtime; drive performance tuning and scaling; and design multi-cluster control-plane capabilities that scale to millions of pods across thousands of clusters.

What You Will Do
• Engineer and develop a unified application platform for hybrid (multi-cluster, multi-region, multi-cloud) application management using Kubernetes controllers and feedback-driven control systems to meet SLOs.
• Deliver end-to-end automation for application lifecycle (deployments, rollouts, failovers, policy enforcement) to minimize manual work for users.
• Drive fleet-wide optimization for cost, performance, and latency through data-informed controls and capacity management, improving $/RPS and tail latency.
• Build resilient, multi-tenant control planes and workflows that safely scale to millions of pods across thousands of clusters.
• Ensure reliability, security, and governance with clear guardrails, safe defaults, and automated remediation.
• Partner with product and customers to turn complex orchestration problems into simple, reusable platform primitives and great developer experiences.
• Champion observability and continuous improvement with measurable, outcome-focused metrics.

Basic Qualifications
• Bachelor’s degree in Computer Science, Electrical Engineering, Math, or a closely related field (or equivalent experience)
• 10+ years in backend software development and operations
• Recent experience designing and operating large-scale distributed systems (last 3 years)
• Fluency in one or more of Go, C/C++, Python, or Java
• Proven track record of delivering mission-critical systems
• Experience with cloud computing using AWS, Azure, or GCP

Preferred Qualifications
• Kubernetes API machinery and semantics: SSA, SMP, server-side dry-run, watches/informers/listers, rate-limited workqueues, finalizers, owner references, leader election, API Priority and Fairness
• Controllers/operators and node daemons in Go: client-go/controller-runtime, reconciliation patterns, backoff and retry, idempotency, partitioned/sharded controllers, HA and failover
• CRDs and webhooks: versioning, conversion functions/webhooks, validating/mutating admission webhooks, policy frameworks and best practices
• Pod/runtime semantics: sidecars, init/ephemeral containers, probes (readiness/liveness/startup), lifecycle hooks, termination behavior, PDBs, QoS classes, ResourceQuota/LimitRange, topology spread, affinity/anti-affinity
• Scaling systems: HPA (resource/custom/external metrics), VPA, cluster autoscaler; multi-dimensional scaling, health-aware/autopilot-style policies; external metrics adapters and SLO-driven scaling
• Federated and multi-cluster: placement/propagation, failover, drift detection, reconciliation strategies; consistent hashing and partitioning for scale
• Distributed systems: CRDTs and eventual consistency paradigms; Raft/memberlist/gossip; deep familiarity with etcd, Kafka, Redis and their operational characteristics (compaction, backpressure, retention, failover)
• Observability and data: Prometheus (cardinality control, recording rules), tracing; experience with vector databases for search and diagnostics; strong time-series forecasting (classical + ML) and statistical modeling for proactive optimization
• Languages and interfaces: Go (primary), Java/Python as needed; gRPC/protobuf; JSON/YAML/Jsonnet
• Leadership: ability to handle multiple competing priorities in a fast-paced environment and lead the delivery of large-scale services for complex business offerings

Recruitment Process
Application Review - Phone Interview - Onsite (or Virtual Onsite) Interview - Offer
The exact nature of the recruitment process may vary according to the specific job and may be changed due to scheduling or other circumstances. Applicants will be informed of interview schedules and results via the e-mail address submitted at the application stage.

Details to Consider
• This job posting may be closed prior to the stated end date for application if all openings are filled.
• Coupang has the right to rescind an offer of employment if a candidate is found to have submitted false information as part of the application process.
• Those eligible for employment protection (recipients of veteran’s benefits, the disabled, etc.) may receive preferential treatment for employment in accordance with applicable laws.

Privacy Notice
Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below: https://www.coupang.jobs/privacy-policy/

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities
Engineer and develop a unified application platform for hybrid application management using Kubernetes. Deliver end-to-end automation for application lifecycle to minimize manual work for users.