Senior Linux & Infrastructure IT Engineer at Retym

Ramat Gan, Tel-Aviv District, Israel

Full Time

Start Date

Immediate

Expiry Date

09 May, 26

Salary

0.0

Posted On

08 Feb, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

AWS, Terraform, Ansible, Linux, HPC, Slurm, LSF, Grid Engine, Python, Bash, Prometheus, Grafana, GitHub Actions, GitLab CI, Networking, Storage

Industry

Semiconductor Manufacturing

Description

About the Role

We are a fast-growing semiconductor startup building next-generation silicon. Our design and verification pipelines rely on large-scale Linux compute infrastructure spanning AWS and on-prem environments. We are seeking a senior, hands-on Cloud & Infrastructure IT Engineer to own the reliability, performance, and automation of our mission-critical EDA platforms. You will work directly with chip design teams to ensure our compute environments are fast, stable, secure, and ready to scale.

What You’ll Do

Operate and scale hybrid AWS + on-prem Linux compute infrastructure for chip design and verification workloads.
Own day-to-day reliability, performance tuning, capacity planning, and incident response.
Build and maintain AWS environments using Terraform and Ansible.
Automate provisioning of VPCs, IAM, EC2, FSx, EBS, S3, VPNs, and security controls.
Tune Linux systems for CPU-, memory-, and I/O-intensive EDA workloads.
Operate and optimize grid / job scheduling platforms such as Slurm, LSF, or Grid Engine.
Design and manage high-throughput storage solutions for simulation pipelines.
Develop automation and self-service tooling using Python and Bash.
Implement observability and alerting using Prometheus and Grafana.
Participate in on-call rotation and lead root-cause analysis for production incidents.

Required Qualifications

AWS: VPC, EC2, IAM, FSx, EBS, S3, VPN, security controls
Infrastructure as Code: Terraform, Ansible
Linux / HPC: Kernel, filesystem, and network performance tuning
Schedulers: Slurm / LSF / Grid Engine
Automation: Python, Bash
Observability: Prometheus, Grafana
CI/CD: GitHub Actions / GitLab CI

Requirements

7+ years of hands-on experience operating large-scale Linux infrastructure.
Strong experience managing AWS production environments.
Advanced proficiency with Terraform, Ansible, Python, and Bash.
Deep understanding of networking, storage, and Linux internals.
Comfortable owning business-critical systems in a fast-moving startup.
Experience supporting semiconductor / EDA / HPC workloads.

Preferred

Exposure to Azure or GCP.
Experience with cloud cost optimization / FinOps.
