Senior HPC Cloud Engineer

at Stanford University

Stanford, California, USA

Start Date: Immediate
Expiry Date: 29 Dec, 2024
Salary: USD 231391 Annual
Posted On: 03 Oct, 2024
Experience: 10 year(s) or above
Skills: Security, Design, Technical Requirements, Budget Management, MPI, Cost Management, OpenMP, Base Pay, Profiling Tools, Containerization, Addition, Parallel Processing, Docker, Orchestration, Storage, CUDA, Storage Solutions, Infrastructure, Demand, Security Protocols
Telecommute: No
Sponsor Visa: No

Description:

This position is based on Stanford’s main campus with consideration given to the option for a hybrid work schedule (partially onsite and offsite), subject to operational need. Interested candidates must include a resume and cover letter to be considered for this position.
Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa.

ABOUT US

The Stanford Doerr School of Sustainability strives to create a future in which humans and nature thrive in concert and in perpetuity. The school has a three-part structure to drive global impact. Our academic departments and programs educate students and create new knowledge across areas of research that are crucial for advancing the long-term prosperity of the planet and people. Institutes bridge scholarship at Stanford and beyond, bringing multiple viewpoints to bear on urgent challenges. The Sustainability Accelerator drives new policy and technology solutions through a worldwide network of partners who work with our teams to develop solutions at a global scale. The school is dedicated to creating and supporting a diverse, equitable, and inclusive environment, and to creating solutions that benefit all people, particularly those most affected by environmental damage and climate change.
The Stanford Doerr School of Sustainability (SDSS) is seeking to fill a technical leadership role to help our school expand and scale our research computing (HPC) resource portfolio in the cloud, both to meet the dynamic growth in computational demand and to provide agility in aligning resources with the variety of workloads our research community requires. This position requires strategic leadership in directing the cloud-based research computing resources, combined with hands-on experience architecting and delivering cloud-based High Performance Research Computing and Data Management services. Reporting to the CIO at Doerr, this position will partner closely with the University IT Research Computing CTO and work with technical leaders across other Schools and units.
This position will require a unique combination of leadership and technical skills, including subject matter expertise in High Performance Computing and in Cloud Platform Architecture and Design; deep knowledge of various HPC workloads, applications and their requirements; and an understanding of parallel processing, data-intensive tasks, system performance and benchmarks.

IN ADDITION, OUR PREFERRED QUALIFICATIONS INCLUDE:

  • Collaboration and Communication Skills: Strong interpersonal skills to work effectively with cross-functional teams including faculty, researchers, developers and other team members and stakeholders, to understand their needs and challenges, and translate complex functional and technical requirements into detailed architecture, design and high performing solutions.
  • Cloud Architecture Design: Expertise in designing scalable and reliable cloud architectures tailored for HPC workloads, considering factors like compute, storage, networking, security, performance and cost.
  • Understanding of HPC Workloads: Deep knowledge of various HPC workloads, applications, and their requirements, including understanding parallel processing, data-intensive tasks, and performance benchmarks.
  • High-Performance Computing Frameworks: Proficiency with HPC frameworks and programming models such as MPI, OpenMP, and CUDA, as well as understanding how to implement them in a cloud environment. Expertise in utilizing Google’s Compute Engine to provision and manage GPU and TPU resources for demanding computational tasks, including training machine learning models.
  • Performance Optimization: Skills in analyzing and tuning the performance of cloud-based HPC applications, including profiling tools and techniques to identify bottlenecks, including I/O.
  • Data Management Solutions: Knowledge of storage solutions suitable for HPC including distributed filesystems and an emphasis on object storage systems and data lifecycle management.
  • Infrastructure as Code (IaC): Experience with IaC tools (e.g., Terraform, CloudFormation and Helm charts) to automate the deployment and management of HPC resources efficiently.
  • Security Architecture: Understanding security protocols and best practices for securing HPC environments in the cloud, including IAM, Service Accounts, network security (VPN), and data encryption.
  • Containerization and Orchestration: Proficiency in using container technologies such as Docker, Singularity and Kubernetes for deploying and scaling HPC applications, along with orchestration practices.
  • Experience orchestrating automated solution deployment from the OS (Linux) to the application stack, using tools such as Kubernetes, Docker, Jupyter, Open OnDemand, Airflow, Dataflow, Cloud Composer and others.
  • Capacity Planning and Cost Management: Skills in planning resource capacity to optimize performance and cost, ensuring effective budget management for HPC resources in the cloud.
  • Compliance and Governance: Knowledge of compliance standards relevant to HPC workloads (e.g., HIPAA, GDPR) and implementing governance frameworks to ensure adherence.
The expected pay range for this position is $203,499 to $231,391 per annum.
Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location, and external market pay for comparable jobs.
At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website (https://cardinalatwork.stanford.edu/benefits-rewards) provides detailed information on Stanford’s extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.

Responsibilities:

  • Engage consultatively with faculty and their research team members to gather requirements about their research workloads, applications and data needs, and align technical resources with those needs.
  • Cloud infrastructure design, operational management and support, with emphasis on Google Cloud Platform (GCP).
  • Develop long-term technology plans that align resources with the input and requirements of the SDSS research community members.
  • Manage the HPC cloud budget, cost and capacity, including planning resource capacity to optimize performance and cost. Leverage GCP’s budget and quota management tools to optimize resource consumption and control costs, especially for large-scale HPC operations.
  • Monitor and Report Utilization Metrics, including implementation of monitoring solutions to track resource usage, job performance, and system health.
  • Explore, test and recommend the adoption of emerging technologies and the development of new methods and workflows to gain efficiencies in computational research.
  • Lead workshops on Research Computing in the cloud, collaborating with University IT partners as well as other technical colleagues across campus.
  • Provide leadership and IT solutions for complex problems.
  • Identify applicable new technologies through research, collaboration with peers, and participation in standards organizations, industry groups and panels. May present at conferences such as Supercomputing and GoogleNext.
  • May work on University-wide task forces and committees related to strategic planning efforts for information technologies.


REQUIREMENT SUMMARY

Min: 10.0, Max: 15.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

Stanford, CA, USA