Scientific Data Management Associate Director at Vertex Pharmaceuticals

Boston, Massachusetts, USA

Full Time

Start Date

Immediate

Expiry Date

19 Nov, 25

Salary

234000.0

Posted On

20 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

SQL, Ontology, Apache Spark, Databases, Snowflake, Integration, Communication Skills, Emerging Technologies, Python, Unity, Harmonization, ML

Industry

Information Technology/IT

Description

JOB DESCRIPTION

We are seeking a computational scientist and scientific data manager to join the Data & Computational Sciences (DCS) department at the Boston, MA site of Vertex Pharmaceuticals.
In this role, you will work across the DCS department to ensure that our data is stored in an appropriately structured form, with relevant metadata that conforms to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. These efforts will strengthen and further breakthroughs in AI, driving new discoveries and insights for Global Research across domains including chemistry, biology, and imaging. The initial focus of the role will be genomics data, spanning genetic variant, therapeutic gene editing, transcriptomic, and single-cell data. Through this work, you will ensure that results from large-scale computational analyses are findable and available for reuse, allowing scientists to derive novel insights using results from both computation and experiment. You will work closely with colleagues across the DCS department from different domains and areas of expertise, bringing your experience in data management and data governance to devise appropriate solutions for storing and accessing their data, develop appropriate policies around data use, access, and storage, and ensure that data is managed appropriately. In parallel, you will collaborate cross-functionally across the organization on data needs for DCS, partnering closely with the Data, Technology and Engineering (DTE) team on potential technology solutions and incorporating enterprise guidelines, including those for ontology use, data catalogs, data retention, and security.
To succeed in this role, you will need familiarity with the use of large-scale scientific data to drive Research discoveries in a drug discovery and therapeutic development setting. You will also need broad experience in the management of scientific data including the architecture of scientific data storage solutions, with a primary focus on data types that are generated from computational workflows on large-scale data in Global Research across domains including chemistry, genomics and imaging.

QUALIFICATIONS:

A PhD (or equivalent) in the computational sciences with 7 or more years of relevant experience, or a Master's degree in the computational sciences with 9 or more years of relevant experience in research data management
Experience working cross-functionally and collaborating across a team to drive alignment
Experience championing data governance principles
Excellent oral and written communication skills.
A team-oriented growth mindset that welcomes feedback from others and supports other team members; a positive attitude that enthusiastically tackles and overcomes challenges
Experience working with scientific data sources such as ChEMBL, UniProt, Open Targets, HPA, gnomAD, Single Cell Portal, or GTEx, including integration and harmonization of data across databases
Experience with FAIR data practices, including the use of common ontologies such as the Allotrope Foundation Ontologies and the BioAssay Ontology
Knowledge of data governance principles and frameworks, and of tools such as Collibra or Unity Catalog
Very strong programming skills, ideally in Python and SQL, as well as familiarity with distributed data processing frameworks such as Apache Spark
Experience with data platforms such as Databricks or Snowflake
Familiarity with database architectures that are oriented towards large-scale scientific data, such as TileDB or VoltDB
A strong understanding of emerging technologies such as cloud architectures and AI and ML approaches to data management tasks
LI-KM1

LI-Hybrid

Responsibilities

Work with colleagues across DCS to understand their data and how it is used, and develop a data management & governance roadmap that will address data management needs in a prioritized manner
Define and implement data management solutions, in collaboration with DTE, for large-scale results generated from computational workflows spanning chemistry, biology, imaging, and screening
Where relevant, be responsible for prototyping pipelines to create integrated datasets that combine internal and/or external data sources, and work with colleagues in DTE to productionize such datasets for broader use within DCS
Contribute to DCS-specific development of best practices, guidelines, and SOPs, as appropriate, with a focus on data-related aspects
Align with enterprise-level data governance frameworks
Coordinate with data engineering efforts in DTE to integrate samples, tests and results across the Research environment
Support other prioritized data needs as they arise, such as evaluation of technology solutions for insights from scientific literature, and identification of key external data sources to address gaps in internal data
