Backend Engineer - AI/ML Infrastructure at CloudGeometry
Remote (Oregon, USA)
Full Time


Start Date

Immediate

Expiry Date

14 Nov 2025

Salary

Not specified

Posted On

14 Aug 2025

Experience

7 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Infrastructure, Containerization, Python, Amazon Web Services (AWS), Communication Skills, Software, TypeScript

Industry

Information Technology/IT

Description

At CloudGeometry, we’re redefining how modern data and AI systems are built. As a leading cloud-native engineering firm, we work with pioneering technology companies to deliver high-impact solutions across infrastructure, machine learning, and intelligent applications.
We are looking for five highly skilled AI Infrastructure Engineers to join our growing team supporting large-scale AI/ML systems. This is a hands-on engineering role focused on building scalable, secure, and production-ready infrastructure that powers ML workflows end to end, from experimentation to deployment and monitoring.

What You’ll Do

  • Design, implement, and maintain robust infrastructure for ML workflows across real-time and batch environments.
  • Build and support production-grade model lifecycle systems, including registration, versioning, and deployment workflows.
  • Develop APIs and backend services in TypeScript and Python to support model integration and orchestration.
  • Manage and optimize infrastructure using AWS and infrastructure-as-code (CDK preferred).
  • Work with Databricks MLflow for end-to-end model management, including asset bundling and serving pipelines.
  • Collaborate with cross-functional teams including ML scientists, backend engineers, and DevOps to deliver high-impact features.
  • Monitor and improve infrastructure reliability, security, and performance across diverse deployment targets.
  • Contribute to CI/CD workflows, container orchestration (Docker, ECS), and automation for ML pipelines.

Why Join CloudGeometry?
You’ll work alongside top-tier engineers across the US, LATAM, and Europe on cutting-edge projects in AI, cloud, and enterprise SaaS. We value deep technical curiosity, strong collaboration, and a bias for action in solving meaningful problems.

SKILLS

  • Large Language Models (LLM)
  • Software as a Service (SaaS)
  • Databricks Products
  • Python (Programming Language)
  • Infrastructure
  • TypeScript
  • MLflow
  • MLOps
  • Amazon Web Services (AWS)

REQUIREMENTS

What We’re Looking For

  • 7+ years in software or infrastructure engineering with proven experience supporting AI/ML systems.
  • Deep hands-on experience with AWS services and modern IaC practices (Terraform/CDK).
  • Strong backend programming skills in TypeScript and Python.
  • Production-level use of MLflow for model management and deployment.
  • Expertise in containerization (Docker), CI/CD automation, and orchestration tools.
  • Solid understanding of designing scalable and secure systems in cloud-native environments.
  • Strong communication skills, with the ability to bridge gaps between engineering and product stakeholders.
  • Comfortable in fast-paced, collaborative environments working across time zones.

Nice to Have

  • Exposure to LLM infrastructure and frameworks (e.g., DSPy, LangChain).
  • Knowledge of LLM performance metrics: latency, cost monitoring, and usage optimization.
  • Familiarity with semantic search tools and vector stores (e.g., OpenSearch, Pinecone).