Senior AI Engineer at TradeBeyond

Taipei, Taiwan

Full Time

Start Date

Immediate

Expiry Date

02 Jun, 26

Salary

0.0

Posted On

04 Mar, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Python, Kubernetes, Linux, LLM, ML/DL, Production Debugging, Monitoring, Rollback Strategy, Performance Profiling, Containerization, Deployment, Evaluation Frameworks, Observability, Incident Response, RAG Systems, Workflow Orchestration

Industry

Software Development

Description

Location: Taipei

For more than 20 years, TradeBeyond has been advancing a more efficient, responsible supply chain. Responding to retail sourcing's need for smarter, automated workflows and data transparency, we developed the industry's leading supply chain platform, CBX, which is relied on by many of the largest brands and retailers around the world. As consumers, businesses, and governments alike have increased their commitment to sustainability, Fortune 500 companies such as The Home Depot, REI, Safeway, Lidl, and Lululemon have turned to TradeBeyond to help optimize product development, manage suppliers, reduce waste, and improve quality and compliance. Learn more at TradeBeyond.com.

About the Role

Join our AI team building real AI features used in business workflows. You'll work closely with the AI Team Lead and prioritize delivery reliability (60/40 delivery-to-exploration): monitoring, rollbacks, performance tuning, and repeatable improvements. Success means features that operators trust, engineers can reproduce, and stakeholders can measure. If you enjoy turning AI capabilities into reliable products, not just demos, you'll fit well here.

Responsibilities:

Keep LLM and ML/DL features healthy in production: monitor, troubleshoot, roll back, and improve operational reliability for day-to-day usage.
Improve performance with numbers: benchmark latency, throughput, and cost; iterate on changes and report measurable wins.
Ship confidently: package releases with containers, configs, and artifacts; ensure reproducible deployments and safe rollbacks every time.
Build a practical evaluation and iteration loop: datasets (as needed), test cases, acceptance criteria, and regression checks tied to clear quality and cost targets.
Collaborate frequently with the AI Team Lead; explain trade-offs clearly and maintain lightweight decision notes for shared alignment.

Requirements (Must-Have)

Strong Python, Kubernetes, Linux, and production debugging experience; able to deliver and operate LLM and ML/DL features in production environments.
Proven track record of shipping LLM and ML/DL services or workflows with monitoring, rollback strategy, and quantified outcomes.
Hands-on performance profiling skills; able to resolve timeouts, OOMs where applicable, performance bottlenecks, and stability issues through evidence-based iteration.
Able to write clear runbooks and postmortems and to communicate trade-offs crisply to managers, engineers, and non-ML stakeholders.

Nice-to-Have

Experience designing evaluation frameworks for LLM features: offline test sets, online metrics, etc., including acceptance criteria and regression checks.
Experience instrumenting AI features with tracing and observability: request IDs, prompt and context logging policy, latency breakdown, failure taxonomy.
Hands-on incident response for production services: postmortems, runbooks, on-call, etc.
Familiarity with prompt injection and data exfiltration risks in RAG systems and with practical mitigations: input sanitization, allowlists, tool permissions, content filters, etc.
Experience with workflow orchestration or task queues for LLM and ML/DL pipelines: Airflow, DVC, dbt, Prefect, Celery, etc.
Practical fine-tuning experience with SFT, PEFT, and LoRA is a plus; not required now but helpful for future growth.

TradeBeyond Offers

You will work in a flat and open team environment where your experience and expertise are valued. You will work in partnership with a leadership team who have profound domain knowledge in their functional areas and are keen to work with you to continuously make positive impacts for our customers and employees. Externally, you will engage with a global client network. We offer competitive compensation in a dynamic, high-growth, global environment.

At TradeBeyond, we value the diversity of our employees and partners. We believe that our company thrives when we support and celebrate our differences.

Interested parties, please apply via APPLY NOW with your resume, stating your current and expected salary.

We are an equal opportunity employer and welcome applications from all qualified candidates. All information provided by applicants will be treated in strictest confidence and used for recruitment-related purposes within the company and our associated company. Applicants may be considered for other suitable positions within the company over a one-year period, after which their personal data will be destroyed.

Responsibilities

The role focuses on keeping LLM and ML/DL features healthy in production through monitoring, troubleshooting, and improving operational reliability. Responsibilities also include improving performance metrics, shipping features confidently with reproducible deployments, and building practical evaluation and iteration loops.