Data Scientist (Biostatistician 2) at Stanford University

Stanford, California, USA

Full Time

Start Date

Immediate

Expiry Date

17 Sep, 25

Salary

$132,108

Posted On

17 Jun, 25

Experience

3 years or more

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Informatics, SQL, Natural Language Processing, Statistical Computing, Pandas, Database Design, GitHub, Survival Analysis, Data Cleaning, R, Google Cloud Platform, Neural Networks, Insurance Claims, Biostatistics, Preparation, Python, Datasets, Statistics, Medical Records

Industry

Information Technology/IT

Description

The Department of Ophthalmology at Stanford University School of Medicine is seeking a highly motivated, hard-working, and professional Data Scientist to support research in ophthalmology. The incumbent will be part of the Department of Ophthalmology but will work in a collaborative environment, engaging with Stanford faculty and staff across multiple departments, including Biomedical Data Science, Research IT, and the Research Informatics Center. The incumbent will work with a combination of structured and unstructured (text, imaging) data from several sources, including Stanford’s STARR and STARR-OMOP clinical research databases, the Sight Outcomes Research Collaborative (SOURCE) national ophthalmology electronic health records registry, the All of Us national cohort, commercial and Medicare claims data, national survey data, and other sources.

The position requires an incumbent who is comfortable working with some independence, consulting with and advising investigators to refine research questions, define hypotheses and project objectives, design studies, and devise analysis plans, and who can work with project team members (including clinicians, trainees, and other statisticians and informaticists) to implement analysis plans and publish findings. The incumbent must be able to balance multiple simultaneous projects and manage competing priorities. The incumbent will work closely with others to query databases to create analytic files, perform quality control and data cleaning, and manage and analyze data. The incumbent must be an excellent and timely communicator, able to present results orally and in writing to clinical investigators.

Duties include:

Work directly with investigators and independently identify appropriate data analytic approaches; assist in study design and proposal development.
Create analytic files with detailed documentation. Prepare data for analysis by cleaning, identifying cohorts, reshaping data, creating new variables, merging multiple data tables, and creating and maintaining new databases as needed.
Implement data analyses using predictive modeling approaches (machine learning, deep learning) or inferential statistical methods, as appropriate to the project.
Develop reusable, well-documented code for all projects that can be maintained in a repository (e.g., GitHub) for collaborative use.
Quickly learn new skills, such as new programming languages or statistical packages, as needs arise.
Communicate and present results to investigators using graphs and tables.
Disseminate findings orally and in writing through conference presentations and peer-reviewed journal articles.
Other duties may also be assigned: The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned.

DESIRED QUALIFICATIONS:

Strong background in machine learning, biostatistics, and bioinformatics
Intellectually curious; willing and eager to learn new skills
Experience working with large datasets and databases
Experience with analysis of real-world observational health data (e.g., electronic medical records, insurance claims)
Experience manipulating and analyzing complex, high-dimensional data
Ability to perform careful data cleaning and preparation, including: identifying and handling data discrepancies, duplicates, missing values, outliers, etc.; developing patient cohorts based on inclusion and exclusion criteria, such as billing-code diagnoses, age or other demographics, length of follow-up, or other characteristics; creating new variables, including coding relevant outcomes, combining sparse variables, and normalizing/standardizing variables; merging datasets on multiple key values; reshaping data from long to wide format or vice versa, as the analysis requires; loading data into analysis programs; and saving data in different file formats
Experience with (1) machine learning predictive models (e.g., gradient-boosted trees, random forests) and (2) deep learning neural networks, including transfer learning
Experience with free-text data (e.g., natural language processing, large language models) is a plus; otherwise, willingness to learn
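
The data-preparation tasks listed above (removing duplicates and rows with missing keys, selecting a cohort by billing code, age, and length of follow-up, merging tables, and reshaping long data to wide) can be illustrated with a short pandas sketch. It is a hypothetical example, not code from the department: the table and column names (`diagnoses`, `patients`, `measurements`, `patient_id`, `visit_date`, `icd10_code`, `birth_date`, `eye`, `value`), the functions `build_cohort` and `measurements_wide`, the `H40` (glaucoma) ICD-10 prefix, and the age and follow-up thresholds are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical schemas, code prefix, and thresholds; not from any real data source.


def build_cohort(
    diagnoses: pd.DataFrame,
    patients: pd.DataFrame,
    code_prefix: str = "H40",
    min_age: float = 40,
    min_followup_days: int = 365,
) -> pd.DataFrame:
    """Apply example inclusion criteria; return one row per eligible patient.

    diagnoses: patient_id, visit_date, icd10_code (long: one row per code per visit)
    patients:  patient_id, birth_date
    """
    # Cleaning: drop rows missing key fields, then exact duplicate rows.
    dx = diagnoses.dropna(subset=["patient_id", "visit_date"]).drop_duplicates().copy()
    dx["visit_date"] = pd.to_datetime(dx["visit_date"])

    # Length of follow-up: days between each patient's first and last visit.
    span = dx.groupby("patient_id")["visit_date"].agg(first_visit="min", last_visit="max")
    span["followup_days"] = (span["last_visit"] - span["first_visit"]).dt.days

    # Billing-code criterion: any diagnosis code starting with the prefix.
    coded = dx["icd10_code"].astype(str).str.startswith(code_prefix)
    span["has_code"] = span.index.isin(dx.loc[coded, "patient_id"])

    # Demographics via a merge that fails loudly on duplicate patient records.
    cohort = span.reset_index().merge(
        patients, on="patient_id", how="inner", validate="one_to_one"
    )
    age_days = (cohort["first_visit"] - pd.to_datetime(cohort["birth_date"])).dt.days
    cohort["age_at_first_visit"] = age_days / 365.25

    keep = (
        cohort["has_code"]
        & (cohort["age_at_first_visit"] >= min_age)
        & (cohort["followup_days"] >= min_followup_days)
    )
    return cohort.loc[keep].drop(columns="has_code").reset_index(drop=True)


def measurements_wide(measurements: pd.DataFrame) -> pd.DataFrame:
    """Reshape long per-eye measurements to one row per patient-visit.

    measurements: patient_id, visit_date, eye ("OD"/"OS"), value.
    Repeated readings for the same eye at the same visit are averaged.
    """
    wide = measurements.pivot_table(
        index=["patient_id", "visit_date"], columns="eye", values="value", aggfunc="mean"
    )
    wide.columns.name = None  # drop the leftover "eye" axis label
    return wide.add_prefix("value_").reset_index()
```

Under the same assumptions, the wide table could then be joined to another visit-level table on both keys, e.g. `visits.merge(wide, on=["patient_id", "visit_date"], how="left")`, where `visits` is likewise hypothetical. The `validate="one_to_one"` argument in `build_cohort` makes pandas raise a `MergeError` if a patient appears twice in `patients`, rather than silently duplicating rows.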

EDUCATION & EXPERIENCE (REQUIRED):

Master’s degree in biostatistics, statistics, or a related field and at least 3 years of experience.

KNOWLEDGE, SKILLS AND ABILITIES (REQUIRED):

Proficient in R (preferred), SAS, or Stata for statistical analysis and visualization
Proficient in SQL
Proficient in Python and its data-science tools, such as Jupyter Notebook, matplotlib, pandas, scikit-learn, and TensorFlow/Keras and/or PyTorch
Able to use GitHub and to write reusable, well-documented code
Familiarity with using cloud computing platforms for data analysis, such as Google Cloud Platform and/or Amazon Web Services
Outstanding ability to explain, in written and spoken English, how data analyses were performed to both technical and non-technical audiences
Demonstrated excellence in at least one area of expertise, such as statistical methodology (e.g., missing data, survival analysis), informatics, statistical computing, database design (e.g., Oracle databases, SQL), or predictive modeling (machine learning and deep learning)

PHYSICAL REQUIREMENTS*:

Frequently perform desk-based computer tasks and seated work, and use light/fine grasping.
Occasionally stand, walk, write by hand, and lift, carry, push, or pull objects weighing up to 10 pounds.
Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of his or her job.
