Lead Data Engineer at TripleTen
Remote, Sicilia, Portugal - Full Time


Start Date

Immediate

Expiry Date

20 Apr, 25

Salary

Not specified

Posted On

21 Jan, 25

Experience

0 year(s) or above

Remote Job

No

Telecommute

No

Sponsor Visa

No

Skills

Transformation, Cultural Institutions, AWS, GitLab, Data Systems, Latin America, Spark, Apache Spark, Python, Design Principles

Industry

Information Technology/IT

Description

TripleTen is a service that empowers individuals, regardless of prior experience, to master IT professions such as software engineering, data science, business intelligence analytics, and QA engineering in a feasible, accessible way that leads to employment.
Our mission is to ensure that every student can successfully master a new profession and become a valuable member of the IT industry. We produce highly sought-after tech professionals in the most competitive EdTech market in the world: the US market.
We’re looking for a Data Engineer to design and maintain robust data pipelines for building personalized features for our students and powering our AI Tutor Assistant. You’ll work with a Medallion Architecture (adapted from Databricks) that processes and stores varied student activity data, and you’ll ensure our Spark-based workflows run smoothly, securely, and efficiently.
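
As a rough illustration of that flow, here is a minimal PySpark sketch of a bronze → silver → gold medallion pipeline. All paths, table, and column names (student_events, event_id, lesson_id, the s3://example-bucket locations) are hypothetical assumptions for illustration, not TripleTen's actual setup, and plain Parquet stands in for whatever table format the team uses:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

    # Bronze: land raw student-activity events as-is (hypothetical source path).
    bronze = spark.read.json("s3://example-bucket/raw/student_events/")
    bronze.write.mode("append").parquet("s3://example-bucket/bronze/student_events/")

    # Silver: deduplicated, typed, filtered records.
    silver = (
        bronze.dropDuplicates(["event_id"])
        .withColumn("event_ts", F.to_timestamp("event_ts"))
        .filter(F.col("student_id").isNotNull())
    )
    silver.write.mode("overwrite").parquet("s3://example-bucket/silver/student_events/")

    # Gold: consumption-ready aggregates, e.g. daily activity per student.
    gold = silver.groupBy(
        "student_id", F.to_date("event_ts").alias("activity_date")
    ).agg(
        F.count("*").alias("events"),
        F.countDistinct("lesson_id").alias("lessons_touched"),
    )
    gold.write.mode("overwrite").parquet("s3://example-bucket/gold/student_daily_activity/")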

Requirements:

  • Proven experience as a data engineer, preferably with five or more years of relevant experience.
  • You have experience setting up and maintaining scalable data systems with Spark or a similar framework.
  • You’re comfortable working in a cloud environment and have hands-on experience with data lakehouse design principles.
  • You demonstrate strong problem-solving skills and can optimize data ingestion, transformation, and storage workflows.
  • You communicate effectively with both technical and non-technical teams to ensure data is used effectively and responsibly.
  • Experience in MLOps is a plus (e.g., deploying and maintaining MLOps services like MLflow); a minimal sketch follows this list.

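Since MLflow is called out by name above, here is a minimal sketch of logging to an MLflow tracking service. The URI, experiment name, and logged values are placeholder assumptions, not details from the posting:

    import mlflow

    # Point the client at a tracking server; the URI here is a placeholder.
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("tutor-assistant-demo")

    with mlflow.start_run():
        # Illustrative parameter and metric; names and values are made up.
        mlflow.log_param("model_type", "baseline")
        mlflow.log_metric("validation_accuracy", 0.87)
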
What you will do:

  • Architect & Implement Data Pipelines: Build, monitor, and optimize batch/streaming data ingestion and transformations within a lakehouse environment.
  • Spark Development: Develop and maintain Spark jobs (PySpark) for ETL/ELT and data processing at scale.
  • Data Model & Warehouse: Collaborate on data modeling for analytical and machine learning use cases.
  • Infrastructure & Tooling: Work with cloud platforms (AWS) for scalable storage, compute, and workflow orchestration.
  • Data Quality & Governance: Ensure data accuracy, consistency, and compliance across pipelines, implementing robust validations and best practices (see the validation sketch after this list).
  • Cross-team Collaboration: Partner with Developers and Product teams to deliver data in a consumable format that enables our AI Assistant to provide valuable insights.

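For the data quality point above, one lightweight validation pattern is to assert invariants on a DataFrame before publishing it downstream. This is a sketch under assumed column names (student_id, event_id), not the team's actual checks:

    from pyspark.sql import DataFrame
    from pyspark.sql import functions as F

    def validate_student_events(df: DataFrame) -> None:
        """Fail fast if basic invariants are violated (hypothetical rules)."""
        null_ids = df.filter(F.col("student_id").isNull()).count()
        if null_ids > 0:
            raise ValueError(f"{null_ids} rows are missing student_id")

        duplicates = df.count() - df.dropDuplicates(["event_id"]).count()
        if duplicates > 0:
            raise ValueError(f"{duplicates} duplicate event_id rows found")
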
DISCLOSURES

  • At this time we are unable to offer H1B sponsorship opportunities in the USA.
This job description is not designed to contain a comprehensive listing of activities, duties, or responsibilities that are required. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities at any time.

TripleTen is an equal employment opportunity/affirmative action employer and considers qualified applicants for employment without regard to race, color, religion, sex, national origin, age, disability, marital status, sexual orientation, gender identity/expression, protected military/veteran status, or any other legally protected factor.