Senior Systems Engineer - HPC
at Core42
Abu Dhabi, United Arab Emirates
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 20 Oct, 2024 | Not Specified | 20 Jul, 2024 | N/A | Good communication skills | No | No |
Description:
Qualifications:
TO QUALIFY FOR THE ROLE YOU MUST HAVE
- Bachelor’s degree in Information Technology, Computer Science or relevant field.
- Minimum 7 years of hands-on experience in High-Performance Computing (HPC) systems administration and infrastructure management.
- Advanced knowledge and expertise in configuring, optimizing, and maintaining complex HPC environments, including hardware, software, and storage systems.
- Proficiency in parallel computing principles, distributed computing, and cluster management.
- Comprehensive knowledge and hands-on experience in the system administration of Linux environments.
- Experience with job schedulers, resource managers, and workflow orchestration tools commonly used in HPC environments (Slurm, LSF, PBS, Kubernetes).
- Advanced knowledge of data center network design and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
- Competence in network design and configuration of switches/routers, including InfiniBand and RoCE.
- Experience with large-scale data storage solutions, particularly Ceph, NFS, and Lustre.
- Proficiency in one or more parallel programming libraries or models, such as MPI, OpenMP, oneAPI, and CUDA.
- Competence in configuration management and infrastructure-as-code tools such as Ansible, Puppet, and Terraform, and in integrating them with Git.
- Strong scripting and automation skills (e.g., Python, Bash) for system administration tasks.
- Excellent problem-solving skills and the ability to troubleshoot complex HPC issues effectively.
- In-depth knowledge of performance tuning and optimization techniques for HPC systems.
- Familiarity with containerization and orchestration (Docker, Kubernetes).
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Nagios, Zabbix, Ganglia, ELK).
- Effective communication and collaboration skills to work with cross-functional teams.
- Relevant certifications in cloud computing, virtualization, container technologies, and systems architecture are advantageous.
Responsibilities:
- Oversee the design, deployment, and optimization of the HPC infrastructure, including hardware, platform, software, networking, and storage components.
- Participate in the preparation and review of HLD and LLD documents, scopes of work, RFIs, RFPs, and RFQs.
- Lead efforts to maximize the efficiency and performance of HPC systems, ensuring optimal resource utilization and minimal downtime.
- Collaborate closely with product and architecture teams to understand customers' computational needs and requirements, and provide tailored technical solutions that align with the company's strategic goals.
- Develop and implement automation solutions and tools for deployment and management.
- Set up monitoring, logging, and alerting systems.
- Act as L3 support for complex technical issues, perform root cause analysis, and implement solutions to ensure the reliability and availability of HPC systems.
- Maintain comprehensive documentation of HPC configurations, procedures, and best practices to facilitate knowledge sharing and future reference.
- Ensure the security and compliance of the HPC infrastructure by implementing necessary safeguards and adhering to company standards and regulations.
- Collaborate with HPC vendors and suppliers for hardware and software procurement, support, and delivery.
- Assist in budget planning and management for HPC-related expenditures, ensuring cost-effective solutions.
- Stay at the forefront of HPC technology trends, evaluating and recommending new technologies and practices to enhance HPC capabilities.
REQUIREMENT SUMMARY
Min: N/A | Max: 5.0 year(s)
Information Technology/IT
IT Software - Other
Information Technology
Graduate
Information technology computer science or relevant field
Proficient
1
Abu Dhabi, United Arab Emirates