HPC Support Engineer

at Atos

Timișoara, Timiș, Romania

Start Date: Immediate
Expiry Date: 28 Jan, 2025
Salary: Not Specified
Posted On: 29 Oct, 2024
Experience: N/A
Skills: Bash, Lustre, Perl, Scripting Languages, Puppet, System Monitoring, GPFS, Python, Configuration Management, Ansible, Storage Systems, NFS, InfiniBand, Storage Management
Telecommute: No
Sponsor Visa: No

Description:

HPC SUPPORT ENGINEER

Publication Date: Oct 22, 2024
Ref. No: 522680
Location: Timisoara, RO
Eviden, part of the Atos Group, with an annual revenue of circa €5 billion, is a global leader in data-driven, trusted and sustainable digital transformation. As a next-generation digital business with worldwide leading positions in digital, cloud, data, advanced computing and security, it brings deep expertise for all industries in more than 47 countries. By uniting unique high-end technologies across the full digital continuum with 47,000 world-class talents, Eviden expands the possibilities of data and technology, now and for generations to come.

CAPABILITIES AND EXPERTISE:

  • System Administration: Red Hat expertise.
  • Networking: expertise in configuring and troubleshooting networking setups within HPC clusters, including an understanding of low-latency interconnects such as InfiniBand or Omni-Path.
  • Scripting Proficiency: use of scripting languages such as Bash, Python, or Perl to automate routine tasks like cluster monitoring, user onboarding, or job submissions.
  • Configuration Management: familiarity with tools like Ansible, Puppet, or Chef to automate the deployment and configuration of cluster nodes and services.
  • System Monitoring: ability to implement and manage monitoring tools (e.g., Prometheus, Grafana) to track system health, detect performance bottlenecks, and identify potential hardware or software failures.
  • Storage Management: familiarity with large-scale storage systems such as GPFS, Lustre, or NFS, and the ability to troubleshoot file system issues.

Responsibilities:

  • HPC systems are often clusters of interconnected servers. The engineer is responsible for the administration of these clusters, which includes installation, configuration, and maintenance of hardware and software.
  • Linux is the dominant OS in HPC environments. The engineer ensures that the OS is updated, secure, and optimized for high-performance workloads.
  • HPC environments use job schedulers (e.g., SLURM, PBS, or LSF) to allocate resources efficiently. The engineer manages these schedulers to ensure optimal job performance, queue management, and fair distribution of resources among users.
  • Design and manage backup solutions for large volumes of data, ensuring minimal data loss in case of hardware failures or other disasters.
  • Interact with SMC (Smart Management Center), which is the foundation for hosting the infrastructure and application micro-services dedicated to managing an HPC supercomputer.
  • Support and maintain technology standards, processes, and policies related to on-prem/cloud Infrastructure in scope.
  • Contribute to international projects by providing consultancy regarding HPC infrastructure architectures (on-premises and cloud).
  • Suggest system changes in accordance with documented SOPs.
  • Produce and maintain appropriate documentation and diagrams describing system setups and overall inventory.


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Other

IT

Graduate

Proficient

1

Timișoara, Romania