Senior Machine Learning Engineer - Machine Learning Infrastructure at Flip
Remote, Oregon, USA
Full Time


Start Date

Immediate

Expiry Date

13 Jul, 25

Salary

0.0

Posted On

14 Apr, 25

Experience

3 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Performance Analysis, Distributed Systems, Technical Documentation, Hardware Architecture, Programming Languages, Hadoop, Open Source, Business Logic, Tuning, Communication Skills, Resource Management, Computer Science

Industry

Information Technology/IT

Description

SENIOR MACHINE LEARNING ENGINEER - MACHINE LEARNING INFRASTRUCTURE

Location: NYC or remote (US)

About Flip.shop:
Welcome to Flip.shop, where innovation meets the social commerce revolution! Fresh off our Series C funding round, we’ve raised $144 million, propelling our valuation to an impressive $1.05 billion. We’re redefining the shopping experience by giving consumers a voice in a space dominated by tech giants. Join us on this exhilarating journey where your technical skills will play a pivotal role in shaping the future of social commerce!

REQUIREMENTS:

  • Education: Bachelor’s degree or higher in Computer Science or a related field, with 3+ years of experience in building scalable systems.
  • Technical Skills: Proficiency in one or two programming languages (C/C++, Golang) within a Linux environment.
  • Solid understanding of GPU hardware architecture, GPU software stack (CUDA, cuDNN), and experience in GPU performance analysis.
  • Experience in deep model inference/training, debugging, and tuning.
  • ML Workflow Knowledge: Familiarity with mainstream machine learning frameworks (e.g., TensorFlow, PyTorch, MXNet).
  • Familiarity with MLOps practices.
  • Experience with big data frameworks (e.g., Spark, Hadoop, Flink) and resource management and task scheduling for large-scale distributed systems.
  • Open-source: Experience in using or designing open-source machine learning lifecycle management systems like TFX.

KEY SKILLS

  • Excellent logical analysis and problem-solving skills with the ability to abstract and decompose complex business logic.
  • Strong sense of responsibility, good learning ability, communication skills, and self-motivation, with the ability to respond and act quickly.
  • Good documentation habits, with workflow and technical documentation written and updated promptly.

Responsibilities

ROLE OVERVIEW:

We are seeking a Senior Machine Learning Engineer - Machine Learning Infrastructure to design, build, and optimize the infrastructure that powers our machine learning systems. You’ll ensure the efficient deployment, scaling, and monitoring of machine learning models, and will help streamline the development lifecycle. This role offers the opportunity to create scalable, production-level systems that support real-time recommendations and drive business growth.

RESPONSIBILITIES:

  • Infrastructure Development: Design and implement scalable infrastructure for deploying, monitoring, and maintaining machine learning models in production environments. Design and implement machine learning systems for feeds, ads, and search ranking models.
  • Training Infrastructure: Optimize the serving and training infrastructure of machine learning models.
  • Model Training: Enhance the workflow for model training and serving, data pipelines, storage systems, and resource management within multi-tenant machine learning systems.
  • Tooling & Automation: Build tools to automate workflows for model training, testing, and deployment, ensuring that machine learning models can move quickly from development to production.
  • Performance Optimization: Ensure the infrastructure supports high-performance model inference at scale, with a focus on minimizing latency and maximizing throughput.
  • Collaboration: Work closely with data scientists, machine learning engineers, and DevOps teams to create seamless integration between development and production environments.
  • Monitoring & Maintenance: Build robust monitoring systems to track model performance and infrastructure health, ensuring reliability and uptime of machine learning services.
  • Security & Compliance: Implement best practices in infrastructure security, data privacy, and compliance, particularly when handling sensitive user data.