Start Date
Immediate
Expiry Date
09 Nov, 25
Salary
0.0
Posted On
09 Aug, 25
Experience
5 year(s) or above
Remote Job
Yes
Telecommute
Yes
Sponsor Visa
No
Skills
Data Modeling, Data Security, Communication Skills, Computational Physics, Python, Google Cloud, Databases, Cloud Security, Development Projects, Testing, Computer Science, Bioinformatics, Genetics, Cloud Computing, Research Projects, BAM, FASTA
Industry
Information Technology/IT
The Stanford Center for Genomics and Personalized Medicine (SCGPM) has an exciting opportunity available for a motivated Biomedical Cloud Engineer to create innovative data architectures that will automate the process of turning big genomic data into biomedical insights. The ideal person for this position is a keen listener who can interpret biological questions, assess the value and relevance of different technologies and methods, and deliver actionable technical solutions.
Background:
The Department of Veterans Affairs (VA) has commissioned the sequencing of hundreds of thousands of whole genomes from participants in the Million Veteran Program (MVP) [https://www.mvp.va.gov/]. This data is currently being delivered to the SCGPM’s cloud computing environment and constitutes one of the largest repositories of whole-genome sequencing data in the world. The scale and richness of this data make it an incredible resource for biomedical research. Our goal is to turn this data lake into a data commons: a dynamic computing environment where researchers bring questions and get answers, all without having to go through the ordeal of manually collecting, cleaning, massaging, scrubbing, sorting, transforming, and filtering data.
As an example of a publication from this group, see this reference describing the early design of our data processing system:
Ross, P.B., Song, J., Tsao, P.S. et al. Trellis for efficient data and task management in the VA Million Veteran Program. Scientific Reports 11, 23229 (2021). https://doi.org/10.1038/s41598-021-02569-5
Position:
In this position, you would be the system developer for Trellis, the cloud-based MVP data management system that we have created. Trellis stores the petabytes of sequence data contributed to the MVP by veterans and orchestrates its processing. It also maintains a detailed record of data provenance, including which programs were used at each step.
To manage the enormous volumes of biomedical research data that the MVP generates, we have built Trellis on the Google Cloud Platform. The Trellis architecture takes advantage of many serverless and managed cloud services, such as Cloud Functions, Dataproc, Cloud SQL, and Pub/Sub, to build a workflow that responds to the arrival of new data by initiating pipeline processes automatically and at scale.
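For candidates less familiar with this event-driven pattern, the following is a minimal sketch of the general idea, not the Trellis implementation. It uses only the Python standard library, and every name in it (`ObjectEvent`, `Dispatcher`, the task names, and the suffix-to-task routes) is a hypothetical stand-in chosen for illustration. In a real deployment the event would come from a cloud notification service rather than being constructed by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectEvent:
    """Hypothetical stand-in for a 'new object arrived' notification."""
    bucket: str
    name: str


@dataclass
class Dispatcher:
    """Routes newly arrived files to pipeline tasks by file suffix.

    `routes` maps a file suffix to an illustrative task name. Each launch
    is appended to `launched`, a simple in-memory list standing in for a
    provenance record.
    """
    routes: Dict[str, str]
    launched: List[dict] = field(default_factory=list)

    def handle(self, event: ObjectEvent) -> Optional[dict]:
        # Check each configured suffix; the first match decides the task.
        for suffix, task in self.routes.items():
            if event.name.endswith(suffix):
                record = {
                    "task": task,
                    "input": f"gs://{event.bucket}/{event.name}",
                }
                self.launched.append(record)
                return record
        # No route matched: ignore the file rather than fail.
        return None


# Usage: the routes below are assumptions for illustration only.
dispatcher = Dispatcher(routes={".fastq.gz": "align", ".bam": "call-variants"})
result = dispatcher.handle(ObjectEvent(bucket="example-bucket", name="sample1.bam"))
ignored = dispatcher.handle(ObjectEvent(bucket="example-bucket", name="README.txt"))
```

The point of the design is that no one schedules work by hand: the arrival of a file is itself the trigger, and each launch leaves a record that can later be used to trace how a result was produced.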
A production version of Trellis has already processed the whole-genome sequences of over 150,000 veterans, and we plan to process at least as many more in the coming year. You would take the lead in keeping this production system running and optimized, and you would interface with our SecOps team, which maintains that system in a FedRAMP-secure environment.
Our Team:
Our SCGPM bioinformatics team is a multi-disciplinary group composed of about a dozen scientists, engineers, and software developers with complementary backgrounds, each contributing their own expertise in managing and analyzing complex biomedical data [http://med.stanford.edu/gbsc/scgpm-team.html]. Other projects supported by this team include the NCI Human Tumor Atlas Network, Human BioMolecular Atlas Program, and the Stanford Metabolic Health Center.
This position can be on-site in Palo Alto, fully remote, or hybrid.
Duties include:
DESIRED QUALIFICATIONS:
EDUCATION & EXPERIENCE (REQUIRED):
Bachelor’s degree and five years of relevant experience, or a combination of education and relevant experience.
KNOWLEDGE, SKILLS AND ABILITIES (REQUIRED):
PHYSICAL REQUIREMENTS*: