Sr. ML Data Engineer (ETL) at CEDENT

North Carolina, United States

Full Time

Start Date

Immediate

Expiry Date

30 Apr, 25

Salary

0.0

Posted On

01 Feb, 25

Experience

0 year(s) or above

Remote Job

Telecommute

Sponsor Visa

Skills

Query Optimization, Master Data Management, Java, Data Security, Data Engineering, SQL, Python, Scala, Pipeline Development, Data Processing, Data Preparation, Analytics, Apache Spark, PII, Data Manipulation, Metadata, ML

Industry

Information Technology/IT

Description

QUALIFICATIONS:

7 years in data engineering, with at least 4 years focused on ML feature engineering, ETL pipeline development, and data preparation for ML
Proven experience managing pipelines on Databricks using Apache Spark, with a strong understanding of the medallion architecture
Familiarity with ML lifecycle management (MLflow experience is a strong plus) and advanced skills in Apache Spark and PySpark for big data processing and analytics
Proficient in Python for data manipulation and SQL for query optimization
Experience building pipelines for real-time and batch model serving in production environments, and knowledge of CI/CD practices for ETL/ELT pipeline development
Expertise in metadata and master data management within technical data catalogues
Understanding of data security and compliance, especially with sensitive data such as PII
Mandatory Skills: Apache Spark, Databricks, Java, Python, Scala, SparkSQ

Responsibilities

Feature Engineering and Data Integration: Develop and maintain feature engineering pipelines using Databricks to support ML models effectively
Data Pipeline Development: Integrate diverse data sources (e.g., clickstreams, user behavior, demographic data) to create user behavior features and profiles for complex ML tasks
Medallion Architecture: Design and implement ETL/ELT pipelines aligned with the bronze, silver, and gold layers of the medallion architecture
Model Support: Build data pipelines to support ML model training, calibration, and deployment, leveraging MLflow for experiment tracking and performance monitoring
Query Optimization and Low-Latency Pipelines: Design low-latency, production-ready data pipelines to support real-time and batch model inference
CI/CD Practices: Apply CI/CD principles for seamless pipeline deployment
Data Governance: Ensure pipelines comply with security and regulatory standards, particularly for handling PII, and maintain metadata and master data across the data catalogue
Collaboration: Work closely with ML scientists, ML engineers, and other stakeholders to align data transformation with business objectives