Biomedical Cloud Engineer at Stanford University

Stanford, California, USA

Full Time

Start Date

Immediate

Expiry Date

09 Nov, 25

Salary

0.0

Posted On

09 Aug, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Skills

Data Modeling, Data Security, Communication Skills, Computational Physics, Python, Google Cloud, Databases, Cloud Security, Development Projects, Testing, Computer Science, Bioinformatics, Genetics, Cloud Computing, Research Projects, BAM, FASTA

Industry

Information Technology/IT

Description

The Stanford Center for Genomics and Personalized Medicine (SCGPM) has an exciting opportunity available for a motivated Biomedical Cloud Engineer to create innovative data architectures that will automate the process of turning big genomic data into biomedical insights. The ideal person for this position is a keen listener who can interpret biological questions, assess the value and relevance of different technologies and methods, and deliver actionable technical solutions.
Background:
The Department of Veterans Affairs (VA) has commissioned the sequencing of hundreds of thousands of whole genomes from participants in the Million Veteran Program (MVP) [https://www.mvp.va.gov/]. This data is currently being delivered to the SCGPM’s cloud computing environment and constitutes one of the largest repositories of whole-genome sequencing data in the world. The scale and richness of this data make it an incredible resource for biomedical research. Our goal is to turn this data lake into a data commons: a dynamic computing environment where researchers bring questions and get answers, all without having to go through the ordeal of manually collecting, cleaning, massaging, scrubbing, sorting, transforming, and filtering data.
As an example of a publication from this group, see this reference describing the early design of our data processing system:
Ross, P.B., Song, J., Tsao, P.S. et al. Trellis for efficient data and task management in the VA Million Veteran Program. Scientific Reports 11, 23229 (2021). https://doi.org/10.1038/s41598-021-02569-5
Position:
In this position, you would be the system developer of Trellis, the cloud-based MVP data management system that we have created. Trellis stores the petabytes of sequence data contributed to the MVP by veterans and orchestrates its processing while recording which programs were used, thereby maintaining a detailed record of data provenance.
To manage the enormous volumes of biomedical research data that the MVP generates, we have built Trellis on Google Cloud Platform. The Trellis architecture takes advantage of many managed and serverless cloud services, such as Cloud Functions, Dataproc, Cloud SQL, and Pub/Sub, to create a workflow that responds to the arrival of new data by initiating pipeline processes automatically and at scale.
A production version of Trellis has already processed the whole-genome sequences of over 150,000 veterans, and we plan to process at least as many more in the coming year. You would take the lead in keeping this production system running and optimized, and you would interface with our SecOps team, which maintains that system in a FedRAMP-secure environment.
Our Team:
Our SCGPM bioinformatics team is a multi-disciplinary group composed of about a dozen scientists, engineers, and software developers with complementary backgrounds, each contributing their own expertise in managing and analyzing complex biomedical data [http://med.stanford.edu/gbsc/scgpm-team.html]. Other projects supported by this team include the NCI Human Tumor Atlas Network, Human BioMolecular Atlas Program, and the Stanford Metabolic Health Center.
This position can be on-site in Palo Alto, fully remote, or hybrid.

Duties include:

Maintaining the smooth execution of our production Trellis system
Working with our Security Operations team to respond to any security incidents
Constructing queries to our graph database to gain insights from pipeline run data
Implementing population-level genomic analyses (GWAS, PCA) to verify data integrity
Designing and integrating novel bioinformatics pipelines into our Trellis system
Troubleshooting data flow in our state-driven Trellis architecture
Building containers for bioinformatics tools and integrating them with our internal data management system to automate workflows
Collaborating with researchers to explore solutions to relevant biological questions and maximize the value of our whole-genome sequencing dataset to the public
Other duties may also be assigned.

DESIRED QUALIFICATIONS:

Four-year degree in Genetics, Computer Science, Bioinformatics, Computational Physics, or a related field
Experience with biomedical data formats (FASTQ, FASTA, BAM, CRAM, Hail MatrixTable, etc.)
Comfort programming in Python
Experience with cloud computing, especially Google Cloud
Experience with databases, especially graph databases
Experience with big data technologies (e.g., BigQuery, Spark, Hail, Terra)
Familiarity with issues in computer data security
Familiarity with FedRAMP cloud security
Familiarity with FAIR principles of data management
Excellent verbal and written communication skills
An ability to independently grasp the objectives of research projects and assemble solutions from a range of technologies, standards, and approaches
A desire to learn new methods and technologies and to adapt to demands of fast-paced research

EDUCATION & EXPERIENCE (REQUIRED):

Bachelor’s degree and five years of relevant experience, or a combination of education and relevant experience.

KNOWLEDGE, SKILLS AND ABILITIES (REQUIRED):

Expertise in designing, developing, testing, and deploying applications.
Proficiency with application design and data modeling.
Ability to define and solve logical problems for highly technical applications.
Strong communication skills with both technical and non-technical clients.
Ability to lead activities on structured team development projects.
Ability to select, adapt, and effectively use a variety of programming methods.
Knowledge of application domain.

PHYSICAL REQUIREMENTS*:

Constantly perform desk-based computer tasks.
Frequently sit, grasp lightly/fine manipulation.
Occasionally stand/walk, write by hand.
Rarely use a telephone, lift/carry/push/pull objects that weigh up to 10 pounds.
* Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of his or her job.
