HPC Support Engineer
at Atos
Timișoara, Timiș, Romania
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 28 Jan, 2025 | Not Specified | 29 Oct, 2024 | N/A | Bash, Lustre, Perl, Scripting Languages, Puppet, System Monitoring, GPFS, Python, Configuration Management, Ansible, Storage Systems, NFS, InfiniBand, Storage Management | No | No |
Description:
HPC SUPPORT ENGINEER
Publication Date: Oct 22, 2024
Ref. No: 522680
Location: Timișoara, RO
Eviden, part of the Atos Group, with an annual revenue of circa €5 billion, is a global leader in data-driven, trusted and sustainable digital transformation. As a next-generation digital business with worldwide leading positions in digital, cloud, data, advanced computing and security, it brings deep expertise to all industries in more than 47 countries. By uniting unique high-end technologies across the full digital continuum with 47,000 world-class talents, Eviden expands the possibilities of data and technology, now and for generations to come.
CAPABILITIES AND EXPERTISE:
- System Administration: Red Hat expertise.
- Networking: expertise in configuring and troubleshooting networking setups within HPC clusters, including an understanding of low-latency interconnects such as InfiniBand or Omni-Path.
- Scripting Proficiency: use of scripting languages such as Bash, Python, or Perl to automate routine tasks like cluster monitoring, user onboarding, or job submissions.
- Configuration Management: familiarity with tools like Ansible, Puppet, or Chef to automate the deployment and configuration of cluster nodes and services.
- System Monitoring: ability to implement and manage monitoring tools (e.g., Prometheus, Grafana) to track system health, detect performance bottlenecks, and identify potential hardware or software failures.
- Storage Management: familiarity with large-scale storage systems such as GPFS, Lustre, or NFS, and the ability to troubleshoot file system issues.
RESPONSIBILITIES:
- HPC systems are often clusters of interconnected servers. The engineer is responsible for the administration of these clusters, which includes installation, configuration, and maintenance of hardware and software.
- Linux is the dominant OS in HPC environments. The engineer ensures that the OS is updated, secure, and optimized for high-performance workloads.
- HPC environments use job schedulers (e.g., SLURM, PBS, or LSF) to allocate resources efficiently. The engineer manages these schedulers to ensure optimal job performance, queue management, and fair distribution of resources among users.
- Design and manage backup solutions for large volumes of data, ensuring minimal data loss in case of hardware failures or other disasters.
- Interact with SMC (Smart Management Center), the foundation for hosting the infrastructure and application microservices dedicated to managing an HPC supercomputer.
- Support and maintain technology standards, processes, and policies related to the on-premises and cloud infrastructure in scope.
- Contribute to international projects by providing consultancy regarding HPC infrastructure architectures (on-premises and cloud).
- Suggest system changes in accordance with documented SOPs.
- Produce and maintain appropriate documentation and diagrams describing system setups and overall inventory.
REQUIREMENT SUMMARY
Min: N/A, Max: 5.0 year(s)
Information Technology/IT
IT Software - Other
IT
Graduate
Proficient
1
Timișoara, Romania