Director/Sr. Director - Discovery Data Team Lead Engineer, Molecule Discovery at Eli Lilly

Indiana, Indiana, USA

Full Time

Start Date

Immediate

Expiry Date

12 Sep, 25

Salary

242000.0

Posted On

13 Jun, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Integration, Molecular Biology, Data Extraction, Bioinformatics, Data Systems, Computer Science, Spark, Python, Strategic Leadership, Data Processing

Industry

Information Technology/IT

Description

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.
The Discovery Data Team (DDT) is accelerating molecule discovery through the integration of high-throughput lab data, next-generation sequencing (NGS), lab automation, and machine learning. We’re championing scalable, cloud-native infrastructure to power data pipelines and APIs that unify experimental and computational datasets across the molecule discovery lifecycle and modalities.
We’re seeking a Discovery Data Team Lead Engineer to design and implement robust, scalable infrastructure for ingesting and processing scientific datasets, especially NGS and experimental workflows, from lab instruments, ELNs, and cloud storage systems. You’ll play a key role in leading and shaping the technical and engineering strategy while collaborating closely with the scientific and Tech@Lilly teams. You will also lead the strategy to build data pipelines, APIs, and workflow orchestration platforms across AWS and modern data technologies. As the first engineer on the DDT, you’ll work closely with bench scientists, computational scientists, bioinformaticians, and Tech@Lilly on several data initiatives, leading the technical strategy, influencing stakeholders, and informing leadership on the technical roadmaps.

BASIC REQUIREMENTS:

Bachelor’s degree or higher in engineering, computer science, or a related scientific field.
10+ years of work experience leading engineering teams and working in cloud infrastructure or DevOps roles, with a strong focus on strategic leadership and data systems.

ADDITIONAL SKILLS/PREFERENCES:

Familiarity with columnar data formats and scalable storage architectures (e.g., data lakes, Redshift, Parquet).
Excellent problem-solving skills and ability to troubleshoot complex issues.
Strong communication and collaboration skills.
Experience with Nextflow or similar workflow languages for NGS or scientific data processing.
Strong hands-on experience with AWS services, especially Lambda, Batch, S3, and container orchestration.
Proficiency with Python and frameworks like FastAPI for developing APIs.
Experience with scientific data systems and ELNs like Benchling and Signals.
Strong understanding of data pipeline orchestration (Airflow), distributed compute (Spark), and data modeling for scientific datasets.
Experience developing solutions using agile methodologies (e.g., Scrum) and tools (e.g., Jira).
Experience working with lab instrumentation data extraction and integration into cloud data stores.
Background in bioinformatics, molecular biology, or a related life sciences field.
Experience in regulated or GxP-compliant environments.
Knowledge of scientific computing environments and HPC systems.
Familiarity with workflow containerization (Docker, Singularity) and CI/CD pipelines.

Responsibilities

Serve as a technical lead and data architect within the Discovery Data Team in Molecule Discovery.
Act as a thought partner to the DDT head on engineering and technical strategy for projects.
Influence cross-functional partners and drive the technical design of new data products and pipelines.
Lead a team of engineers to catalyze and execute data initiatives in molecule discovery.
Build and scale cloud-native infrastructure to support data ingestion, processing, and retrieval for molecule discovery and sequencing workflows.
Develop workflows using Nextflow for NGS data processing and integrate them into larger data pipeline systems.
Extract and integrate data from lab instruments and ELNs (e.g., Benchling, Signals) and route it into structured data lakes or databases.
Develop and maintain APIs using FastAPI to interface between data sources, pipelines, and downstream applications.
Design and implement data pipelines using Airflow, PostgreSQL, Spark, and columnar storage technologies (e.g., Parquet, Redshift).
Deploy, monitor, and optimize infrastructure on AWS, including services like Lambda, Batch, S3, and EC2.
Build secure, scalable APIs for data sharing and querying between storage systems and data consumers.
Work cross-functionally with bioinformaticians, data scientists, and lab informatics teams to enable seamless scientific data workflows.