Sr. ML Data Engineer (ETL

at CEDENT

United States, North Carolina, USA

Start Date: Immediate
Expiry Date: 30 Apr, 2025
Salary: Not Specified
Posted On: 01 Feb, 2025
Experience: N/A
Skills: Query Optimization, Master Data Management, Java, Data Security, Data Engineering, SQL, Python, Scala, Pipeline Development, Data Processing, Data Preparation, Analytics, Apache Spark, PII, Data Manipulation, Metadata, ML
Telecommute: No
Sponsor Visa: No
Required Visa Status:
Citizen, GC, US Citizen, Student Visa, H1B, CPT, OPT, H4 Spouse of H1B, GC Green Card
Employment Type:
Full Time, Part Time, Permanent, Independent - 1099, Contract – W2, C2H Independent, C2H W2, Contract – Corp 2 Corp, Contract to Hire – Corp 2 Corp

Description:

QUALIFICATIONS:

  • 7 years in data engineering and at least 4 years focused on ML feature engineering, ETL pipeline development, and data preparation for ML
  • Proven experience managing pipelines on Databricks using Apache Spark, with a strong understanding of the medallion architecture
  • Familiarity with ML lifecycle management (MLflow experience is a strong plus) and advanced skills in Apache Spark and PySpark for big data processing and analytics
  • Proficient in Python for data manipulation and SQL for query optimization
  • Experience building pipelines for real-time and batch model serving in production environments, and knowledge of CI/CD practices for ETL/ELT pipeline development
  • Expertise in metadata and master data management within technical data catalogues
  • Understanding of data security and compliance, especially with sensitive data such as PII
    Mandatory Skills: Apache Spark, Databricks, Java, Python, Scala, SparkSQ

Responsibilities:

  • Feature Engineering and Data Integration: Develop and maintain feature engineering pipelines on Databricks to support ML models effectively
  • Data Pipeline Development: Integrate diverse data sources (e.g., clickstreams, user behavior, demographic data) to create user-behavior features and profiles for complex ML tasks
  • Medallion Architecture: Design and implement ETL/ELT pipelines aligned with the bronze, silver, and gold layers of the medallion architecture
  • Model Support: Build data pipelines to support ML model training, calibration, and deployment, leveraging MLflow for experiment tracking and performance monitoring
  • Query Optimization and Low-Latency Pipelines: Design low-latency, production-ready data pipelines to support real-time and batch model inference
  • CI/CD Practices: Apply CI/CD principles for seamless pipeline deployment
  • Data Governance: Ensure pipelines comply with security and regulatory standards, particularly for handling PII, and maintain metadata and master data across the data catalogue
  • Collaboration: Work closely with ML scientists, ML engineers, and other stakeholders to align data transformation with business objectives


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - DBA / Datawarehousing

Software Engineering

Graduate

Proficient

1

United States, USA