Senior Linux Engineer (HPC Specialist) at StoneX Group

Bogotá, Cundinamarca, Colombia

Full Time

Start Date

Immediate

Expiry Date

08 May, 25

Posted On

09 Feb, 25

Experience

0 year(s) or above

Remote Job

Telecommute

Sponsor Visa

Skills

Python, Infiniband, Ansible, Software, Communication Skills, Storage Systems, Collaborative Environment, Linux System Administration

Industry

Information Technology/IT

Description

Overview:
We are looking for a highly skilled Linux HPC (High-Performance Computing) Engineer to join our team. You will be responsible for the design, development, and management of our high-performance computing infrastructure. This role requires expertise in Linux systems, large-scale storage solutions, networking, and various HPC technologies such as InfiniBand, Nvidia Bright Cluster Manager, and the Lustre filesystem.

Responsibilities:

Design, configure, and maintain Linux-based high-performance computing clusters.
Manage and optimize storage systems, including parallel filesystems such as Lustre.
Support and troubleshoot InfiniBand networks, ensuring low-latency communication across HPC nodes.
Deploy and administer Nvidia Bright Cluster Manager for managing and monitoring cluster performance.
Work closely with research teams to understand and support computational workflows and needs.
Provide support for software installations, updates, and the tuning of HPC applications.
Ensure high availability and scalability of the HPC environment.
Monitor cluster health, diagnose performance bottlenecks, and implement solutions to ensure optimal system operation.
Collaborate with internal and external stakeholders to ensure system integrity, security, and compliance with data policies.
Maintain documentation for system configuration, procedures, and user support.

Qualifications:

Proven experience as an HPC Engineer or in a similar role.
Strong expertise in Linux system administration (preferably RHEL/CentOS).
In-depth knowledge of storage systems and parallel filesystems, specifically Lustre.
Experience with InfiniBand networking and related technologies.
Hands-on experience with Nvidia Bright Cluster Manager or similar tools.
Scripting and automation skills (e.g., Bash, Python, Ansible, Chef).
Experience in troubleshooting hardware and software in HPC environments.
Ability to work in a collaborative environment with excellent communication skills.
