Member of Technical Staff - Platform at Vals AI

San Francisco, California, United States

Full Time

Start Date

Immediate

Expiry Date

13 Apr, 26

Salary

0.0

Posted On

13 Jan, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Python, System Design, LLM APIs, React, Django, AWS, Cloud Infrastructure, Distributed Systems, Full-Stack Development, Team Collaboration, Communication, Problem Solving, Iteration Speed, Engineering Best Practices, Feedback, Ambiguity

Industry

Description

About the Role

We are looking for an exceptional mid-level to senior engineer to join our team. You, alongside the team, will own the platform that runs our benchmarks. This spans everything needed to evaluate LLMs at scale: Python libraries, a web platform, distributed systems, cloud infrastructure, and tooling. You'll work across the stack, building whatever is needed to run benchmarks reliably and efficiently.

At Vals, we believe in autonomy. You will be given a high degree of independence to make decisions on tech stacks, system architecture, and code structure. You will also provide guidance to others on the team, both through informal feedback and formal processes like architecture reviews and code reviews.

Our platform serves startups, enterprises, and research labs measuring model performance. We work with all the major foundation model labs and some of the largest financial institutions and hospital systems in the world. Our work has been featured by the Wall Street Journal, Washington Post, and Bloomberg.

We are building the standard for evaluating the ability of LLMs to perform real-world tasks. You will contribute directly to the infrastructure that makes this possible.

What You'll Do

Build distributed systems to run evaluations across multiple models, benchmarks, and machines at scale
Deploy cloud infrastructure using IaC, including deployment pipelines, servers, logging, monitoring, etc.
Contribute to the internal and external libraries we maintain, including our public model library
Develop full-stack features for our platform using React/TypeScript on the frontend and Python/Django on the backend
Perform code and architecture reviews for other members of the team
Help establish engineering best practices across the organization
Collaborate closely with the research team to ensure our infrastructure meets their needs

Requirements

Technical:

2+ YOE: 2+ years of full-time experience in software engineering. If you are a new grad, we encourage you to apply to our MTS - Infra role.
Strong engineering fundamentals: You can build and ship quickly with high quality. You should have a track record of building things of significant scope (at jobs, side projects, open source, etc.).
Python expertise: Significant experience in Python, especially in a professional setting.
System design experience: You should be familiar with common concepts like VMs, containerization, load balancers, databases, etc., and know when to use them appropriately.
Familiarity with LLMs: You should have previously worked with LLM APIs and understand concepts like temperature, tokenization, reasoning, etc.

Non-technical:

Team collaboration: Experience working in development sprints, Git workflows, and pull request reviews.
Communication: Strong ability to provide and receive feedback effectively. This includes both spoken and written communication (e.g. design docs).
Comfort with ambiguity: You will often be the one taking a fuzzy problem and breaking it down into clear and actionable steps.
Iteration speed: A tenacity to develop and iterate quickly. If you are coming from a large organization, the speed at which we ship will likely be uncomfortable initially.
Location: We are an in-person team based in San Francisco. We will support your relocation or transportation as needed.

Nice to Haves

Experience with frontend development, ideally React
Experience with Django, FastAPI, or other Python-based HTTP servers
Experience working with AWS infrastructure, including IaC
Experience at early-stage startups or your own company
Interest in AI/ML systems and evaluation

What We Offer

Highly competitive salary and meaningful ownership. Excellence is well rewarded.
Relocation and transportation support
Health/dental insurance coverage
Lunch and dinner provided, free snacks/coffee/drinks
401K plan
Unlimited PTO

About Us

Founding team: The core methodology behind this platform comes from NLP evaluation research we did at Stanford. We raised a $5M seed from some of the top institutional and angel investors in the valley. Our team has prior work experience at NVIDIA, Meta, Microsoft, Palantir and HRT. Collectively, we have over 300 citations in our published work. Our early team includes Stanford PhDs, ex-Jane Street quants, and the first designer at Snorkel.
Tech stack: We use Python for most things at Vals. Our platform is built on Django, with a React frontend. All of the infra is on AWS using CDK for IaC.

What We're Looking For

Learning velocity: The role encompasses a wide variety of tasks. Rather than expecting you to be an expert on Day 1, we are looking for someone who can learn new skills and technologies extremely quickly.
Ownership: On a small, talent-dense team, we expect everyone to show initiative to build where it's needed, not where it's asked. We strive for autonomy over consensus. This is especially true for this role.
Intensity: The LLM landscape is constantly changing. Foundation model labs are continuously pushing the frontier. The unicorn companies that will emerge from this technology shift are being built now. Those that win will have an incredibly high speed of execution.
Solution-oriented mindset: We're looking for people who see opportunities to craft solutions at each juncture, not those who pass hard problems to others or admit defeat.

Further Reading

Hugging Face blog on evaluation
Anthropic’s blog on challenges in evaluation
New York Times article on issues in benchmarking
Stanford HAI report showing hallucinations in legal tech tools

Referral Bonus

Know someone who would be a good fit? Connect them with rayan@vals.ai. If we hire them and they stay on for 90 days, you’ll get a $10,000 referral bonus and Vals AI merch! Please mention the bonus in your email.

Responsibilities

You will build distributed systems to run evaluations across multiple models and benchmarks at scale, and deploy cloud infrastructure using Infrastructure as Code. Additionally, you will contribute to internal and external libraries and help establish engineering best practices.