Open Source Engineer at LanceDB
San Francisco, California, United States
Full Time


Start Date

Immediate

Expiry Date

23 Jan 2026

Salary

Not specified

Posted On

25 Oct 2025

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Open Source, High-Performance Computing, Big Data, Distributed Systems, Java, Scala, Rust, Apache, AI, Data Processing, Community Engagement, Integration, Data Infrastructure, Predicate Pushdown, Data Encodings, Table Formats

Industry

Information Services

Description
About LanceDB
LanceDB is a developer-friendly, open-source database for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, and from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application, and it powers some of today's most groundbreaking applications with the most challenging requirements.

About the Role
We’re looking for an Open Source Engineer to help expand the reach of Lance and LanceDB within the broader data infrastructure ecosystem. You’ll work at the intersection of high-performance computing, big data, and open-source systems, driving integrations, improving distributed operations, and contributing to projects across the Apache and AI communities.

You’ll be responsible for:
- Driving open-source community efforts to integrate the Lance format with Spark, Hive Metastore, Presto, Trino, Ray, and other data infrastructure systems
- Designing and maintaining efficient distributed operations on Lance datasets
- Building efficient indices that enable predicate pushdown and accelerate queries in Spark, Ray, or Trino
- Working in Rust on table formats, data encodings, and other aspects of the Lance format
- Operating and improving internal data processing infrastructure
- Promoting the Lance format in open-source communities and at Big Data conferences

Requirements
- 5+ years of experience building high-performance databases, big data systems, or large-scale data services
- Deep understanding of the internals of open-source Big Data or AI training systems (e.g., Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, ClickHouse, Trino, Presto, PyTorch, or JAX)
- Strong experience with high-performance computing in Java or Scala
- Experience with Rust (or willingness to learn it)
- Proven ability to move fast, work independently, and collaborate with a high-caliber team

Nice to Have
- Contributor, committer, or PMC member in Apache or other large open-source projects
- Experience with Java, Rust, C++, Apache Arrow, DataFusion, Parquet, Iceberg, or Delta Lake
- Track record of driving large features or integrations in distributed systems
- Strong community presence and passion for open-source collaboration

What We Offer
- A key role shaping an open-source project with real production usage
- Remote-first team with flexible hours
- Competitive compensation, equity, and benefits
- Generous learning budget and support for open-source contributions

About the LanceDB Team
LanceDB was created by experts with decades of experience building tools for data science and machine learning. From co-authors of pandas to Apache PMC members of HDFS, Arrow, and Delta, the LanceDB team has created open-source tools used by millions worldwide.
Responsibilities
The Open Source Engineer will drive community efforts to integrate the Lance format with various data infrastructure systems and design efficient distributed dataset operations. They will also promote the Lance format in open-source communities and at Big Data conferences.