Senior Linux Engineer (HPC Specialist) at StoneX Group
Bogotá, Cundinamarca, Colombia
Full Time


Start Date

Immediate

Expiry Date

08 May, 25

Salary

0.0

Posted On

09 Feb, 25

Experience

0 year(s) or above

Remote Job

No

Telecommute

No

Sponsor Visa

No

Skills

Python, Infiniband, Ansible, Software, Communication Skills, Storage Systems, Collaborative Environment, Linux System Administration

Industry

Information Technology/IT

Description

Overview:
We are looking for a highly skilled Linux HPC (High-Performance Computing) Engineer to join our team. You will be responsible for the design, development, and management of our high-performance computing infrastructure. This role requires expertise in Linux systems, large-scale storage solutions, networking, and various HPC technologies such as InfiniBand, Nvidia Bright Cluster Manager, and the Lustre filesystem.

Responsibilities:

  • Design, configure, and maintain Linux-based high-performance computing clusters.
  • Manage and optimize storage systems, including parallel filesystems such as Lustre.
  • Support and troubleshoot InfiniBand networks, ensuring low-latency communication across HPC nodes.
  • Deploy and administer Nvidia Bright Cluster Manager for managing and monitoring cluster performance.
  • Work closely with research teams to understand and support computational workflows and needs.
  • Provide support for software installations, updates, and the tuning of HPC applications.
  • Ensure high availability and scalability of the HPC environment.
  • Monitor cluster health, diagnose performance bottlenecks, and implement solutions to ensure optimal system operation.
  • Collaborate with internal and external stakeholders to ensure system integrity, security, and compliance with data policies.
  • Maintain documentation for system configuration, procedures, and user support.

Qualifications:

  • Proven experience as an HPC Engineer or in a similar role.
  • Strong expertise in Linux system administration (preferably RHEL/CentOS).
  • In-depth knowledge of storage systems and parallel filesystems, specifically Lustre.
  • Experience with InfiniBand networking and related technologies.
  • Hands-on experience with Nvidia Bright Cluster Manager or similar tools.
  • Scripting and automation skills (e.g., Bash, Python, Ansible, Chef).
  • Experience in troubleshooting hardware and software in HPC environments.
  • Excellent communication skills and the ability to work in a collaborative environment.