Staff MLOps Engineer at HCA Healthcare
Nashville, TN 37203, USA
Full Time


Start Date

Immediate

Expiry Date

04 Dec, 25

Salary

0.0

Posted On

04 Sep, 25

Experience

7 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Components, Code, Computer Science, Security, Google Cloud Platform, Color, Design, Patient Care, Scalability, Product Management, Dashboards, Testing, Project Management Skills, Stakeholder Management, Platform Development, Data Science, Python, High Availability

Industry

Information Technology/IT

Description

INTRODUCTION

Last year our HCA Healthcare colleagues invested over 156,000 hours volunteering in our communities. As a Staff MLOps Engineer with HCA Healthcare you can be a part of an organization that is devoted to giving back!

NOTE: ELIGIBILITY FOR BENEFITS MAY VARY BY LOCATION.

Would you like to unlock your potential with a leading healthcare provider dedicated to the growth and development of our colleagues? Join the HCA Healthcare family! We will give you the tools and resources you need to succeed in our organization. We are looking for an enthusiastic Staff MLOps Engineer to help us reach our goals. Unlock your potential!

JOB SUMMARY AND QUALIFICATIONS

Position Summary
The Staff MLOps Engineer plays a pivotal role in shaping our MLOps practice within ITG by building and enhancing a scalable, reliable, and cutting-edge Machine Learning Operations (MLOps) platform. This role combines deep cloud architecture expertise with advanced AI/ML knowledge to develop solutions that streamline workflows, enable seamless collaboration, and drive innovation.
As a key contributor to the organization’s AI/ML strategy, you will partner with cross-functional teams, including data scientists, product managers, and cloud engineers, to align platform development with business objectives. Your work will directly support the deployment of Responsible AI solutions that prioritize transparency, fairness, and ethical practices.

Major Responsibilities:

  • Platform Development: Lead the enhancement of the AI platform to improve the developer experience for data and ML engineers. Optimize workflows by integrating state-of-the-art tools and technologies, ensuring scalability and efficiency.
  • Cloud Infrastructure Design and Management: Architect and manage the cloud infrastructure supporting the MLOps platform, leveraging infrastructure-as-code (IaC) tools like Terraform. Optimize for scalability, security, cost-effectiveness, and high availability.
  • Cross-Functional Collaboration and Stakeholder Management: Partner with data science, product management, engineering, and business teams to understand their requirements and ensure the MLOps platform effectively supports their needs. Effectively communicate technical concepts and strategies to both technical and non-technical audiences.
  • AI/ML Reliability and Observability: Collaborate with the AI/ML reliability engineering team to design and implement components that ensure the platform’s operational reliability, observability, and fault tolerance.
  • Cross-Disciplinary Knowledge: Apply knowledge from related disciplines, such as data science and health/biology sciences, to design holistic MLOps solutions that meet the unique needs of the organization.
  • DevOps for Machine Learning Workloads: Build and maintain robust DevOps pipelines tailored for ML workflows, enabling automated model training, testing, deployment, and monitoring.
  • Tool Development and System Reliability: Design and manage tools to enhance platform reliability, including dashboards, logging systems, and alerting frameworks, to ensure seamless operations.

Qualifications:

  • Advanced proficiency in cloud platforms, especially Google Cloud Platform (GCP). Experience with on-premises and edge deployments is a plus.
  • Solid understanding of AI/ML concepts, technologies, and best practices, with hands-on experience deploying ML solutions at scale.
  • Proven ability to work closely with peer teams, data scientists, and product managers to align platform development with strategic goals.
  • Proficiency in Python and other scripting tools for automation and platform optimization.
  • Strong analytical and troubleshooting skills, with a track record of solving complex problems under pressure.
  • Proven experience managing and leading cloud architecture and engineering teams.
  • Strong background in AI/ML or data science technologies and platform development.
  • Demonstrated expertise in leading Responsible AI initiatives, with a focus on ethical AI practices.
  • Excellent communication, leadership, and project management skills.

Education & Experience:

  • Bachelor’s degree preferred
  • Master’s degree in Computer Science, Data Science, AI, or related field preferred
  • 7+ years of experience in MLOps, DevOps, or a related role required

HCA Healthcare has been recognized as one of the World’s Most Ethical Companies® by the Ethisphere Institute more than ten times. In recent years, HCA Healthcare spent an estimated $3.7 billion in cost for the delivery of charitable care, uninsured discounts, and other uncompensated expenses.
“There is so much good to do in the world and so many different ways to do it.” - Dr. Thomas Frist, Sr., HCA Healthcare Co-Founder
Be a part of an organization that invests in you! We are reviewing applications for our Staff MLOps Engineer opening. Qualified candidates will be contacted for interviews. Submit your application and help us raise the bar in patient care!
We are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
