Senior AI/ML Engineer II at DigitalOcean

Boston, Massachusetts, USA

Full Time

Start Date

Immediate

Expiry Date

30 Nov, 25

Salary

183300.0

Posted On

31 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Good communication skills

Industry

Computer Software/Engineering

Description

Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you’ll find your place here. We value winning together—while learning, having fun, and making a profound difference for the dreamers and builders in the world.
We’re building the next generation of agentic applications on the GradientAI platform—where multi-agent systems of LLM-powered agents collaborate, make decisions, and adapt at scale. You’ll be part of the team designing robust, scalable, and safe agent workflows that empower developers to build sophisticated AI-driven systems with confidence.
We’re looking for someone with a strong software engineering background and deep expertise in generative AI, multi-agent system design, guardrails, monitoring, and evaluation methodologies. Your work will directly shape how thousands of developers create and scale AI agents on our platform.

Responsibilities

Architect and deliver production-grade agentic systems: multi-agent orchestration, workflow management, state/memory handling, and runtime governance.
Design and orchestrate modular, LLM-powered agents (e.g., Planner, Tool Executor, QA, Validator) using scalable orchestration patterns (sequential, router, parallel, map-reduce), with clear handoff protocols, shared memory, and structured communication.
Define and enforce guardrails and governance: prompt sanitization, access control, audit trails, threat modeling, and strategies for injection defense, hallucination control, misuse prevention, and compliance.
Establish evaluation and monitoring methods for multi-agent systems: accuracy, safety, cost, and latency—leveraging observability practices (logs, telemetry, tracing, capturing intermediate outputs) and feedback loops to continuously refine performance.
Build fine-tuning and deployment pipelines: supervised fine-tuning, inference optimization, post-deployment updates, and scaling hardened systems with retries, error handling, and fairness checks.
Rapidly define and deliver MCPs: identify minimal agent roles and orchestration logic, validate quickly, and expand iteratively into robust multi-agent applications.
Integrate seamlessly with the GradientAI platform, ensuring agents leverage DigitalOcean services (inference, knowledge bases, Functions, storage, networking) for scale, reliability, and cost-efficiency.
Apply strong software engineering practices: testing, CI/CD, code quality, scalable architectures, and distributed system design.
Collaborate cross-functionally with product managers, infra teams, design and UX, and other engineers to ship features that developers adopt and trust.
Participate in and support operational excellence.
Independently ship product features from planning to launch to maintenance with high autonomy.
Collaborate with other engineers to find elegant architectures and solutions.