Senior Data Scientist at GPTZero
Toronto, ON, Canada
Full Time


Start Date

Immediate

Expiry Date

18 Apr, 25

Salary

0.0

Posted On

19 Jan, 25

Experience

3 year(s) or above

Remote Job

No

Telecommute

No

Sponsor Visa

No

Skills

Pandas, Numpy, Machine Learning, Hive, Data Manipulation, Scipy, Visualization

Industry

Information Technology/IT

Description

WHO ARE WE?

GPTZero’s mission is to restore information quality and transparency on the internet.
Our team comes from high-performing engineering cultures, including Uber, Meta, Microsoft, and Affirm, and from leading AI research labs, including Princeton, Caltech, Vector, and MILA. We develop novel models and bring cutting-edge research to production, including AI detection, AI hallucination detection, retrieval-augmented generation, and writing stylometry. Our products serve over 3 million active users and enterprise clients, including Fortune 1000 and unicorn AI companies.
We are backed by some of the best investors in the Valley, including Uncork, Neo, and Altman Capital, and by leaders in journalism, including Mark Thompson (CEO of NYT, BBC, CNN) and Tom Glocer (CEO of Reuters), who defined a generation of quality digital information.

QUALIFICATIONS

  • 3+ years of experience with Pandas, NumPy, and SciPy for data manipulation, visualization, and statistical analysis of complex datasets.
  • Experience with large-scale data pipelines and feature engineering (such as PySpark, BigQuery, AWS EMR, or Hive).
  • Strong storytelling ability to communicate complex findings to technical and non-technical audiences.
  • Self-starter (pitch, plan, and implement as a project owner in a fast-paced team)
  • Highly motivated to make positive societal impact
  • Ability to wear multiple hats and be a leader as our team grows
  • Ability to work in Canada
  • Bonus:
      • Strong open-source portfolio
      • Experience with PyTorch for machine learning
      • Experience with Databricks and Amplitude for product analytics
      • Experience working in an early-stage startup environment

RESPONSIBILITIES

  • Build our understanding of user behaviour and needs through product analytics, using Pandas, NumPy, SciPy, and PySpark.
  • Write clean, efficient Python and SQL to extract insights, identify patterns, and evaluate the performance of our machine learning models
  • Build robust, scalable pipelines to ingest, preprocess, and analyze large-scale datasets using tools like Databricks and Amplitude.
  • Collaborate with both machine learning engineers and product teams to validate datasets, interpret model outputs, and recommend improvements.