High Performance Computing (HPC) Linux Administrator
at Terrestrial Energy
Oakville, ON L6H 0C3, Canada
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 01 May, 2025 | Not Specified | 01 Feb, 2025 | 3 years or above | Mathematics, RHEL, Interconnects, Storage Systems, Virtualization, Python, C, Computer Science, File Systems, C++, Operating Systems, VMware, Programming Languages, Fortran | No | No |
Description:
REQUIREMENTS
- Degree or diploma in Computer Science, Mathematics, or Engineering, or equivalent experience of more than 5 years of systems administration in a UNIX/Linux or HPC environment
- Red Hat Certified System Administrator (RHCSA) certification, the entry-level certification for Red Hat Enterprise Linux (RHEL), and/or Red Hat Certified Engineer (RHCE) certification
- 3+ years of proven, hands-on experience in Linux/UNIX systems administration, preferably in a large-scale computing environment
- Proven experience managing an HPC grid with Slurm or an equivalent scheduler
- Experience with programming languages commonly used in HPC such as C, C++, Python, Fortran, or similar.
- Expert knowledge of RHEL and other Linux-based operating systems
- Knowledge of network protocols and interconnects (e.g., InfiniBand, Ethernet).
- Knowledge of enterprise storage systems, including parallel file systems
- Good understanding of virtualization and virtual machine technologies such as VMware vSphere and Hyper-V
RESPONSIBILITIES
- Deploy, configure, and maintain HPC systems (RHEL), including clusters, servers, storage systems, and networking hardware
- Analyze and optimize system performance, including job scheduling, resource allocation, and load balancing
- Implement and maintain high-performance software libraries, tools, and frameworks such as MPI, OpenMP, CUDA, and/or other parallel programming environments.
- Monitor server health, resource utilization, and system logs to troubleshoot and resolve issues related to system performance, hardware/software failures and network problems
- Work closely with engineering teams to optimize computational workflows and advise on best practices for scaling simulations and computations.
- Maintain detailed documentation for systems, configurations, and workflows. Provide training to end-users to help optimize usage of HPC resources.
- Keep up to date with the latest industry trends, tools, and technologies in high-performance computing and parallel processing
- Participate in on-call and in-office support rotations, and provide 24/7 support for critical issues
REQUIREMENT SUMMARY
Min: 3.0, Max: 5.0 year(s)
Information Technology/IT
IT Software - Network Administration / Security
Software Engineering
Diploma
Computer Science
Proficient
1