Senior Software Engineer, HPC Platform Modernization

at Zoox

Foster City, California, USA

Start Date: Immediate
Expiry Date: 30 Apr, 2025
Salary: USD 275000 Annual
Posted On: 31 Jan, 2025
Experience: 7 year(s) or above
Skills: Leveling, Amazon Web Services, Health Insurance, Disability Insurance, Python, Azure, Training, AWS, Long Term Care Insurance, Life Insurance, Kubernetes
Telecommute: No
Sponsor Visa: No

Description:

Zoox is looking for an experienced Software Engineer to work on key new frameworks and infrastructure modernization for our custom High-Performance Computing (HPC) infrastructure and its supporting ecosystem of tools and services. Zoox HPC services combine industry-best scheduling and workload orchestration technologies, such as Ray.io and SLURM, with value-add workflows built specifically for Autonomous Vehicle development. These HPC services form the backbone of development workflows across all Zoox software teams, from data engineering, to training our AI models in Perception, Planner, and Prediction, to simulation, and more. You will take on a breadth of end-to-end responsibilities, including distributed system design, algorithmic job scheduling, and adaptive cloud scaling, in support of all of Zoox’s computational needs.
The position comes with a high degree of independence and the opportunity to help define Zoox’s compute scaling strategy, both technically and organizationally. You will work closely with stakeholders in Autonomy and Software teams to iterate on world-class developer experiences, incorporating the latest industry tools and best practices.

QUALIFICATIONS

• 7+ years of experience
• Experience with Ray.io, particularly Ray Core and Ray Data
• Experience with Kubernetes, particularly for heterogeneous workloads and clusters
• Experience with Ray.io and Kubernetes deployed on Amazon Web Services (AWS) or other similar cloud providers such as Azure or GCP
• Proficiency with Python

BONUS QUALIFICATIONS

• Exposure to machine learning workloads (training, inference, data generation, etc.) from a compute infra service provider perspective
• Experience with Kubernetes or SLURM at scale (>10k nodes)
• Experience with the SLURM workload manager

Compensation

There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. The salary range for this position is $210,000 to $275,000. A sign-on bonus may be offered as part of the compensation package. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate’s relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.

Zoox also offers a comprehensive package of benefits including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

Responsibilities:

• Evaluate new distributed system paradigms and technologies to meet Zoox’s ever-growing computational and storage needs.
• Strike a balance between incremental improvements to Zoox’s existing in-house HPC infrastructure and greenfield services and abstractions.
• Create production-grade web service APIs, SDKs, and other tools to provide a world-class developer experience for all of Zoox’s software teams.


REQUIREMENT SUMMARY

Min: 7.0, Max: 12.0 year(s)
Information Technology/IT
IT Software - Application Programming / Maintenance
Software Engineering
Graduate
Proficient
1
Foster City, CA, USA