Stanford University is seeking a Big Data Architect 1 for a one-year fixed term (with the possibility of renewal) to design and develop applications, test and build automation tools, and support the development of Big Data architecture and analytical solutions.
ABOUT US:
The Department of Biomedical Data Science merges the disciplines of biomedical informatics, biostatistics, computer science and advances in AI. The intersection of these disciplines is applied to precision health, leveraging data across the entire medical spectrum, including molecular, tissue, medical imaging, EHR, biosensory and population data.
ABOUT THE POSITION:
We are seeking an experienced ML Data Engineer to drive the programmatic curation, cleaning, and generation of healthcare data. In this role, you will focus exclusively on developing and maintaining automated, ML-accelerated pipelines that ensure high-quality data ready for machine learning applications. Your work will be pivotal in shaping the integrity of our data and supporting downstream predictive models in a complex healthcare environment.
DESIRED QUALIFICATIONS:
- 3+ years of experience in software development and data engineering with a strong focus on data cleaning, transformation, and creation.
- Proficiency in Python and experience with data processing libraries (e.g., Pandas, Polars, NumPy).
- Hands-on experience in building and maintaining automated data pipelines for large-scale data processing.
- Familiarity with machine learning frameworks (e.g., PyTorch, JAX, scikit-learn) as applied to data quality and augmentation tasks.
- Expertise in working with healthcare data, including familiarity with the OMOP Common Data Model (OMOP CDM).
- Strong experience in a Linux environment and comfort with UNIX command-line tools.
- Proven ability to work collaboratively in multidisciplinary teams and communicate technical concepts effectively.
PREFERRED QUALIFICATIONS:
- Experience with cloud platforms (e.g., GCP, AWS, or Azure) and distributed computing frameworks.
- Proficiency with version control systems (e.g., Git) and containerization tools (e.g., Docker).
- Familiarity with healthcare data standards and regulatory requirements.
EDUCATION & EXPERIENCE (REQUIRED):
Bachelor’s degree in a scientific or analytic field and five years of relevant experience, or a combination of education and relevant experience.
KNOWLEDGE, SKILLS AND ABILITIES (REQUIRED):
- Knowledge of key data structures, algorithms, and techniques pertinent to systems that support high-volume, high-velocity, or high-variety datasets (including data mining, machine learning, NLP, and data retrieval).
- Experience with relational, NoSQL, or NewSQL database systems and with modeling both structured and unstructured data.
- Experience with parallel and distributed data processing techniques and platforms (e.g., MPI, MapReduce, batch processing).
- Experience with scripting languages and with debugging them; experience with high-performance/systems languages and techniques.
- Knowledge of benchmark software development and programmable fields/systems; ability to analyze systems and data pipelines and to propose solutions that leverage emerging technologies.
- Ability to use and integrate security controls for web applications, mobile platforms, and backend systems.
- Experience deploying reliable data systems and managing data quality.
- Ability to research, evaluate, architect, and deploy new tools, frameworks, and patterns to build scalable Big Data platforms.
- Ability to document use cases, solutions and recommendations.
- Demonstrated excellence in written and verbal communication skills.
PHYSICAL REQUIREMENTS*:
- Frequently sit, grasp lightly, use fine manipulation, perform desk-based computer tasks, and lift, carry, push, and pull objects that weigh up to ten pounds.
- Occasionally sit, use a telephone or write by hand.
- Rarely kneel, crawl, climb, twist, bend, stoop, squat, reach or work above shoulders, sort and file paperwork or parts, and operate foot and hand controls.
* Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of his or her job.