Senior Systems Architect - HPC

at Core42

Abu Dhabi, United Arab Emirates

Start Date: Immediate
Expiry Date: 20 Oct, 2024
Salary: Not Specified
Posted On: 20 Jul, 2024
Experience: 5 year(s) or above
Skills: Good communication skills
Telecommute: No
Sponsor Visa: No

TO QUALIFY FOR THE ROLE YOU MUST HAVE

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Software Engineering, or a related technology discipline.
  • 7+ years of experience and deep expertise in designing, implementing, and managing private cloud stacks, and 5+ years of experience in designing large-scale HPC environments.
  • Proven track record of successfully completing large-scale infrastructure projects with a focus on HPC.
  • Advanced knowledge and expertise in configuring, optimizing, and maintaining complex HPC environments, including hardware, software, and storage systems.
  • Proficiency in parallel computing principles, distributed computing, and cluster management.
  • Comprehensive knowledge of and hands-on experience with Linux environments.
  • Experience with job schedulers, resource managers, and workflow orchestration tools commonly used in HPC environments (Slurm, LSF or PBS, Kubernetes).
  • Advanced knowledge of data center network design and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
  • Competence in network design and configuration of switches/routers, including InfiniBand and RoCE.
  • Experience with large-scale data storage solutions, particularly Ceph, NFS, and Lustre.
  • Proficiency in one or more parallel libraries or programming models such as MPI, OpenMP, oneAPI, and CUDA.
  • Competence in configuration management tools such as Ansible, Puppet, Terraform, and integration with Git.
  • Excellent problem-solving skills and the ability to troubleshoot complex HPC issues effectively.
  • In-depth knowledge of performance tuning and optimization techniques for HPC systems.
  • Solid understanding of cloud computing principles (IaaS, PaaS, SaaS).
  • Experience with Kubernetes and OpenShift, including designing, deploying, and managing clusters.
  • Knowledge of AI/ML platforms (e.g. OpenShift AI, Kubeflow, MLflow) is highly desirable.
  • Familiarity with AI/ML environments and the specific requirements for deploying AI/ML workloads on Kubernetes and OpenShift is highly desirable.
  • Experience with monitoring and observability tools (e.g. Prometheus, Grafana, Nagios, Zabbix, Ganglia, ELK).
  • Understanding of both SQL and NoSQL database management and optimization.
  • Knowledge of and experience in using architectural frameworks and methodologies such as TOGAF and Zachman.
  • Familiarity with Agile methodologies (Scrum or Kanban), and an understanding of DevOps principles.
  • Strong attention to detail in problem-solving and troubleshooting.

Responsibilities:

  • Collaborate with stakeholders to understand business requirements and translate them into technical solutions.
  • Communicate architectural decisions and strategies to both technical and non-technical audiences. Prepare and deliver presentations on the proposed solutions.
  • Prepare, review, and maintain high-level and low-level design documents, scope of work, RFIs, RFPs and RFQs.
  • Ensure alignment of solutions with organizational goals and industry best practices.
  • Create architectural blueprints and technical documentation for proposed solutions.
  • Provide requirements for equipment specifications, estimate project labor efforts, and liaise with vendors on technical issues.
  • Lead the deployment and configuration of HPC clusters, ensuring scalability, reliability, and performance according to project documentation and design specifications.
  • Oversee the integration of HPC with existing systems and infrastructure.
  • Ensure the solutions and environments adhere to security best practices and organizational policies.
  • Stay up to date with the latest trends and advancements in HPC technologies. Engage with vendors and the community to stay informed about new features, updates, and best practices.
  • Identify opportunities for process improvements and implement enhancements to the architecture.
  • Evaluate and recommend new tools and technologies to enhance the HPC ecosystem.
  • Engage in pilot testing and commissioning activities, and design and conduct various types of tests (functional, load, and others).
  • Maintain comprehensive documentation for the new and live HPC environments.
  • Develop and deliver training sessions to engineering teams on HPC best practices and usage.


REQUIREMENT SUMMARY

Min: 5.0, Max: 7.0 year(s)
