Senior DevOps Engineer, Platform Engineering at DRIVENETS
Tel Aviv, Tel-Aviv District, Israel
Full Time


Start Date

Immediate

Expiry Date

27 Sep, 26

Salary

0.0

Posted On

29 Jun, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, Kubernetes, GitHub Actions, Terraform, Helm, AWS, Docker, CI/CD, DevSecOps, Linux, EKS, Infrastructure as Code, Shell Scripting, Platform Engineering, Cloud Architecture, Observability

Industry

Software Development

Description
About the Company
DriveNets is a leader in large-scale networking solutions for AI infrastructure and service providers. The company's disaggregated networking architecture transforms the economics of large-scale infrastructures while maximizing performance, utilization, and operational efficiency. Its high-performance AI fabric maximizes GPU utilization and accelerates deployments by optimizing the AI stack end-to-end, resulting in higher tokens-per-second and lower cost-per-token. DriveNets' solutions power production networks for global tier-1 operators like AT&T and Comcast, and scale multi-vendor AI infrastructures at foundation model labs, NeoClouds, and enterprises.

Responsibilities
- Design, build, and operate the internal engineering platform powering DriveNets' build, test, deployment, and security validation workflows at scale
- Write and maintain production-grade Python and shell tooling that drives platform automation; this is a hands-on coding role, not just pipeline configuration
- Architect and manage hybrid cloud/on-prem execution infrastructure, including large-scale Kubernetes runner pools across multiple AWS regions
- Own and evolve CI/CD pipelines at scale using GitHub Actions, including reusable workflows, ARC-based runner orchestration, and build caching strategies (BuildKit, sccache, Valkey)
- Operate and tune DinD environments (Sysbox, EBS/NVMe, overlay storage, MTU/networking) for build, test, and release workloads
- Connect and manage self-hosted and on-prem runners, routing physical device (wbox) test jobs by site and device type
- Implement DevSecOps controls including least-privilege IAM, OIDC, isolated runner groups, container signing, and automated security scans
- Drive platform observability, cost optimization, and reliability improvements across the engineering infrastructure
- Collaborate cross-functionally with hundreds of engineers to improve engineering velocity and release confidence
- Take end-to-end ownership of complex infrastructure problems and drive them to resolution

Requirements

Technical Skills
- 5+ years of hands-on DevOps experience with a strong software development background (prior development experience is a must)
- B.Sc. in Computer Science or equivalent practical experience
- Strong programming skills in Python (or a similar high-level language); ability to write and own production tooling
- Proven experience designing and building scalable systems, automation frameworks, and infrastructure as code using Terraform and Helm
- Solid understanding of Linux, containers (Docker), and Git-based workflows
- Hands-on experience with CI/CD at scale using GitHub Actions or similar, including reusable actions, workflow design, and automation frameworks
- Deep experience with hybrid cloud infrastructure (AWS and on-prem), including EKS, ARC, Karpenter, ECR, S3, Direct Connect, VPC endpoints, IAM/OIDC, and Secrets Manager
- Experience operating spot and on-demand runner pools for builds, DinD tests, releases, and security scans across multiple AWS regions
- Experience with DinD environments (Sysbox, EBS/NVMe, memory limits, overlay storage, MTU/networking) and build caching (BuildKit, sccache, Valkey)
- Experience connecting on-prem/self-hosted runners and routing physical device (wbox) test jobs by site and device type
- Experience implementing DevSecOps controls and improving platform observability, cost efficiency, and reliability
- Platform & tooling familiarity: Kubernetes (EKS, on-prem) · GitHub Actions · ARC · Karpenter · Terraform · Helm · Docker/DinD · Sysbox · containerd · BuildKit · ECR · S3 · ElastiCache (Valkey) · sccache · Direct Connect · VPC endpoints · IAM/OIDC · Secrets Manager · self-hosted runners

Soft Skills
- Strong system-level thinking and troubleshooting skills; able to diagnose and resolve complex infrastructure issues independently
- Takes end-to-end ownership and drives problems to resolution without hand-holding
- Excellent communication and cross-team collaboration skills; comfortable working alongside large engineering organizations

Nice to Have / Advantage
- Experience with Jenkins
- Familiarity with GitHub merge queue
- Experience with MinIO or on-prem S3 caching
- Hardware-in-the-loop CI experience
- MTU/VPC networking tuning expertise
- Monorepo CI optimization experience
Responsibilities
Design and operate a scalable internal engineering platform for build, test, and deployment workflows. Develop production-grade Python tooling and manage hybrid cloud/on-prem infrastructure including Kubernetes runner pools.