ML Engineer - Austin, TX at Baasi
Austin, Texas, United States - Full Time


Start Date

Immediate

Expiry Date

29 Jun, 26

Salary

0.0

Posted On

31 Mar, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, PyTorch, Hugging Face Transformers, PEFT, LoRA, QLoRA, Tokenization, REST, gRPC, CUDA, ROCm, Linux, Testing, Monitoring, Fine-tuning, Adapter Export

Industry

Media and Telecommunications

Description
About Us We are a stealth-mode startup building next-generation infrastructure for the AI industry. Our mission is to make advanced language models portable, efficient, and customizable for real-world deployments. We’re building tools that allow vendors to fine-tune models easily and deploy them securely on diverse hardware. Role We are seeking a AI ML Engineer (Python) to help design and implement our AI Pipelines. This is not an academic research role — you will be productizing and automating existing fine-tuning techniques (LoRA/QLoRA) so vendors can train and manage their own adapters with minimal effort. You’ll work closely with backend engineers (Node.js) who orchestrate jobs and dashboards, while you focus on the training pipelines and adapter export logic. Responsibilities Implement and maintain LoRA/QLoRA fine-tuning pipelines using PyTorch + Hugging Face Transformers + PEFT. Develop logic for incremental training and adapter stacking, producing clean, versioned “delta packs.” Automate data preprocessing (tokenization, formatting, filtering) for user-supplied datasets. Build training scripts/workflows that integrate with orchestration backends (Node.js, REST/gRPC, or job queues). Implement monitoring hooks (loss curves, checkpoints, eval metrics) to feed into dashboards. Collaborate with DevOps to ensure reproducible, portable training environments. Write tests to guarantee reproducibility and correctness of adapter outputs. Willingness to occasionally be present in the office for discussions and team collaboration. Requirements Strong programming skills in Python. Hands-on experience with PyTorch and the Hugging Face ecosystem (Transformers, Datasets, PEFT). Familiarity with LoRA/QLoRA or parameter-efficient fine-tuning methods. Understanding of mixed precision training (FP16/BF16) and memory optimization techniques. Experience building training scripts that are production-ready (reproducibility, logging, error handling). 
Comfortable working in Linux GPU environments (CUDA, ROCm). Ability to collaborate with backend/frontend engineers who are not ML specialists. Nice to Have Experience with bitsandbytes, xformers, or flash-attention. Familiarity with distributed training (multi-GPU, NCCL, DeepSpeed, or Accelerate). Prior work in MLOps or packaging ML pipelines for deployment. Contributions to open-source ML libraries. Why Join Build the core training product that lets vendors adapt models safely and efficiently. Focus on product engineering, not open-ended research. Collaborate with a lean, highly technical team at the intersection of AI and systems. Competitive compensation, equity potential, and flexible remote work.
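To give a concrete flavor of the "clean, versioned delta packs" mentioned above, here is a minimal illustrative sketch of what a versioned adapter manifest could look like. This is not Baasi's actual format or API; every name (`DeltaPackManifest`, `build_manifest`) is hypothetical, and only the ideas (a base-model reference, a version chain for incremental training, and a checksum over the adapter weights) come from the posting.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical sketch: all names here are illustrative, not a real product API.
@dataclass
class DeltaPackManifest:
    base_model: str            # identifier of the frozen base model
    adapter_name: str          # vendor-chosen adapter name
    version: int               # monotonically increasing per adapter
    parent_version: Optional[int]  # supports incremental training / stacking
    sha256: str                # checksum of the serialized adapter weights

def build_manifest(base_model: str, adapter_name: str,
                   weights_bytes: bytes,
                   parent: Optional[DeltaPackManifest] = None) -> DeltaPackManifest:
    """Create a manifest for a new delta pack, chained to an optional parent."""
    version = 1 if parent is None else parent.version + 1
    digest = hashlib.sha256(weights_bytes).hexdigest()
    return DeltaPackManifest(base_model, adapter_name, version,
                             parent.version if parent else None, digest)

# Two successive training runs for the same vendor adapter:
v1 = build_manifest("base-model", "vendor-a", b"fake-weights-v1")
v2 = build_manifest("base-model", "vendor-a", b"fake-weights-v2", parent=v1)
print(json.dumps(asdict(v2), indent=2))
```

A manifest like this is what lets an orchestration backend verify an adapter's integrity and reconstruct the stacking order without touching the weights themselves.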
Responsibilities
The engineer will implement and maintain LoRA/QLoRA fine-tuning pipelines using PyTorch and Hugging Face, developing logic for incremental training and versioned adapter exports. Responsibilities also include automating data preprocessing and building training scripts that integrate with backend orchestration systems.
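The monitoring hooks described above (loss curves, checkpoints, eval metrics feeding dashboards) could be sketched as follows. This is an illustration only, not the actual Baasi interface; the class and method names (`MetricsLogger`, `on_step`, `on_eval`) are hypothetical, and JSON Lines is just one plausible interchange format a Node.js backend could consume.

```python
import json

# Hypothetical sketch of a training-loop monitoring hook; names are illustrative.
class MetricsLogger:
    def __init__(self):
        self.records = []

    def on_step(self, step: int, loss: float, lr: float = None):
        # Called once per optimizer step by the training loop.
        self.records.append({"event": "step", "step": step,
                             "loss": loss, "lr": lr})

    def on_eval(self, step: int, metrics: dict):
        # Called after each evaluation pass with arbitrary metric names.
        self.records.append({"event": "eval", "step": step, **metrics})

    def to_jsonl(self) -> str:
        # One JSON object per line: easy for a backend service to tail.
        return "\n".join(json.dumps(r) for r in self.records)

log = MetricsLogger()
log.on_step(1, 2.31, lr=2e-4)
log.on_eval(1, {"perplexity": 10.1})
print(log.to_jsonl())
```

Emitting append-only structured records like this keeps the ML side decoupled from the dashboard: the training code never needs to know how the backend renders loss curves.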