HPC AI/ML Software Engineer (Scientist 2/3)
at Los Alamos National Laboratory
Los Alamos, New Mexico, USA
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 18 Jun, 2024 | Not Specified | 18 Mar, 2024 | 1 year(s) or above | Docker, Algorithms, Triad, Visualization, Access, Analytics, Programming Languages, Integration, Python, ML, Federal Government, Addition, Data Analysis, DevOps, Pipelines, Job Scheduling, Unsupervised Learning, Deep Learning, Reinforcement Learning, Leadership, Kubernetes | No | No |
Description:
SCIENTIST 2 ($99,200 - $164,100)
MINIMUM JOB REQUIREMENTS:
This requires both breadth and depth of expertise to create, recommend, and approve designs
- Proficiency in high- and low-level programming languages such as Python, C/C++, or equivalent languages.
- Experience developing, supporting, and using AI/ML solutions and pipelines, together with a strong foundation in algorithms and techniques such as supervised and unsupervised learning, deep learning, and reinforcement learning.
- Excellent problem-solving skills and attention to detail.
- Strong communication and collaboration skills.
- Ability to work effectively in a fast-paced, dynamic multi-disciplinary environment.
ADDITIONAL JOB REQUIREMENTS FOR SCIENTIST 3:
In addition to the requirements outlined above, qualification at the higher level requires:
- Proven experience (2+ years) working as an AI and Machine Learning Engineer or in a similar role
- Leadership: Experience as the technical lead on technical projects
- HPC Computing Experience: Experience working in a production computing environment, preferably with HPC systems or at large scale. Working knowledge of networking concepts and practices.
- Hands-on experience with popular ML frameworks and libraries such as TensorFlow, PyTorch, scikit-learn, etc.
EDUCATION/EXPERIENCE AT LOWER LEVEL
Position requires a Bachelor’s degree in a STEM field from an accredited college or university and 4 years of related experience or equivalent experience directly related to the occupation.
EDUCATION/EXPERIENCE AT HIGHER LEVEL
Position requires a Master’s degree in a STEM field from an accredited college or university and 6 years of relevant experience or an equivalent combination of education and experience directly related to the occupation.
DESIRED QUALIFICATIONS:
- Experience with DevOps including CI/CD pipelines
- Ability to develop and create solutions to difficult problems often requiring integration of conflicting or incomplete data on a fast-paced schedule
- Familiarity with conducting data analysis, data preprocessing, and feature engineering to prepare data for model training.
- Skills to train, validate, and fine-tune machine learning models using various techniques and algorithms.
- Experience with containerization and orchestration tools such as Docker, Kubernetes, Charliecloud, etc.
- Familiarity with GPU optimizations in a scientific computing environment working with large multi-physics, molecular dynamics, material, climate, or genomics models
Work Location: The work location for this position is hybrid and is located in Los Alamos, NM. Hybrid is defined as working partially onsite/partially offsite but within 2 hours ground commute of this location. All work locations are at the discretion of management and can change at any time with appropriate notice.
Position commitment: Regular appointment employees are required to serve a period of continuous service in their current position in order to be eligible to apply for posted jobs throughout the Laboratory. If an employee has not served the time required, they may only apply for Laboratory jobs with the documented approval of their Division Leader. The position commitment for this position is 1 year.
Responsibilities:
WHAT YOU WILL DO
The High Performance Computing Environments group (HPC-ENV) is seeking driven HPC data scientists in the very broad areas of HPC and AI/ML overlap, as a Scientist 2 or 3. This position may require engagement in every phase of the system development lifecycle including: requirements generation, system and software design, implementation, integration & test, and verification & validation.
Examples of experience and research areas include, but are not limited to:
- Computational performance of AI/ML algorithms, including the ability to use modern computing hardware and software frameworks efficiently.
- Data parallelism, model parallelism, collective communication patterns, and strategies for large-scale, distributed ML using Python frameworks (e.g., PyTorch, TensorFlow).
- Visualization of large-scale HPC/AI discovery campaigns.
- Operational data (power, energy, CPU/GPU utilization, job scheduling, large scale storage and I/O traces, system logs) analytics to enable data-driven intelligence and facility innovation.
- Deployment and use of large language models and other foundation models.
The HPC Division supports the Los Alamos National Laboratory (LANL) mission by managing a world-class supercomputing center. We support stockpile stewardship for NNSA/DOE and accelerate scientific discovery for scientists. We integrate and support some of the world’s largest supercomputers during an exciting time in computing with the focus on traditional large scale simulations, data science, artificial intelligence, and machine learning.
HPC-ENV manages how users interact with the HPC systems at LANL, which helps secure the nation and pushes the boundaries of science and innovation. Several teams within HPC-ENV are responsible for the broad range of HPC platforms, programming and runtime environments, software, application optimization and readiness, software engineering, and user support and services for a large and diverse customer base. We provide support and services to many production platforms at a world-class computing facility to ensure customers can accomplish their research and mission at extreme scale.
This position will be filled at either the Scientist 2 or Scientist 3 level, depending on the skills of the selected candidate. Additional job responsibilities (outlined below) will be assigned if the candidate is hired at the higher level.
The successful candidate will perform the full spectrum of tasks, including but not limited to:
- Research, evaluate and recommend AI/ML software for use on LANL systems.
- Work closely with HPC users to integrate AI/ML models into production HPC platforms. Provide support and guidance to the HPC user community running AI/ML workflows on high performance computing systems.
- Help establish best working practices for users and HPC around AI/ML workflows
- Together with subject matter experts, help develop and implement a plan for AI/ML data management (including but not limited to I/O optimization, transfer, and archival)
- Collaborate with stakeholders to understand requirements and translate them into technical solutions
- Work closely with system administrators to troubleshoot problems encountered by applications running on HPC systems
- Contribute to the development of technical presentations, papers, technical reports, etc.
- Stay updated on the latest advancements in AI/ML technologies and best practices
REQUIREMENT SUMMARY
Min: 1.0, Max: 6.0 year(s)
Information Technology/IT
IT Software - Other
Software Engineering
Graduate
STEM
Proficient
1
Los Alamos, NM, USA