High Performance Computing (HPC) Linux Administrator

at Terrestrial Energy

Oakville, ON L6H 0C3, Canada

Start Date: Immediate
Expiry Date: 01 May, 2025
Salary: Not Specified
Posted On: 01 Feb, 2025
Experience: 3 year(s) or above
Skills: Mathematics, RHEL, Interconnects, Storage Systems, Virtualization, Python, C, Computer Science, File Systems, C++, Operating Systems, VMware, Programming Languages, Fortran
Telecommute: No
Sponsor Visa: No

Description:

REQUIREMENTS

  • Degree or diploma in Computer Science, Mathematics, or Engineering, or equivalent experience gained through more than 5 years of systems administration in a UNIX/Linux or HPC environment
  • Red Hat Certified System Administrator (RHCSA) certification, an entry-level certification for Red Hat Enterprise Linux (RHEL), and/or Red Hat Certified Engineer (RHCE) certification
  • 3+ years of proven, hands-on experience in Linux/UNIX systems administration, preferably in a large-scale computing environment
  • Proven experience managing an HPC grid with Slurm or an equivalent scheduler
  • Experience with programming languages commonly used in HPC such as C, C++, Python, Fortran, or similar.
  • Expert knowledge of RHEL and other Linux-based operating systems
  • Knowledge of network protocols and interconnects (e.g., InfiniBand, Ethernet).
  • Knowledge of enterprise storage systems, including parallel file systems
  • Good understanding of virtualization and virtual machine technologies such as VMware vSphere and Microsoft Hyper-V

Responsibilities:

  • Deploy, configure, and maintain RHEL-based HPC systems, including clusters, servers, storage systems, and networking hardware
  • Analyze and optimize system performance, including job scheduling, resource allocation, and load balancing
  • Implement and maintain high-performance software libraries, tools, and frameworks such as MPI, OpenMP, CUDA, and/or other parallel programming environments.
  • Monitor server health, resource utilization, and system logs to troubleshoot and resolve issues related to system performance, hardware/software failures, and network problems
  • Work closely with engineering teams to optimize computational workflows and advise on best practices for scaling simulations and computations.
  • Maintain detailed documentation for systems, configurations, and workflows. Provide training to end-users to help optimize usage of HPC resources.
  • Keep up to date with the latest industry trends, tools, and technologies in high-performance computing and parallel processing
  • Participate in on-call and in-office support rotations, and provide 24/7 support for critical issues


REQUIREMENT SUMMARY

Experience: Min 3.0, Max 5.0 year(s)

Information Technology/IT

IT Software - Network Administration / Security

Software Engineering

Diploma

Computer Science

Proficient

1
