Sr. ML Data Engineer (ETL
at CEDENT
North Carolina, United States
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 30 Apr, 2025 | Not Specified | 01 Feb, 2025 | N/A | Query Optimization, Master Data Management, Java, Data Security, Data Engineering, SQL, Python, Scala, Pipeline Development, Data Processing, Data Preparation, Analytics, Apache Spark, PII, Data Manipulation, Metadata, ML | No | No |
Required Visa Status:
Citizen | GC |
US Citizen | Student Visa |
H1B | CPT |
OPT | H4 Spouse of H1B |
GC Green Card |
Employment Type:
Full Time | Part Time |
Permanent | Independent - 1099 |
Contract – W2 | C2H Independent |
C2H W2 | Contract – Corp 2 Corp |
Contract to Hire – Corp 2 Corp |
Description:
QUALIFICATIONS:
- 7 years in data engineering, with at least 4 years focused on ML feature engineering, ETL pipeline development, and data preparation for ML
- Proven experience managing pipelines on Databricks using Apache Spark, with a strong understanding of the medallion architecture
- Familiarity with ML lifecycle management (MLflow experience is a strong plus) and advanced skills in Apache Spark and PySpark for big data processing and analytics
- Proficient in Python for data manipulation and SQL for query optimization
- Experience building pipelines for real-time and batch model serving in production environments, and knowledge of CI/CD practices for ETL/ELT pipeline development
- Expertise in metadata and master data management within technical data catalogues
- Understanding of data security and compliance, especially for sensitive data such as PII
Mandatory Skills: Apache Spark, Databricks, Java, Python, Scala, Spark SQL
Responsibilities:
- Feature Engineering and Data Integration: Develop and maintain feature engineering pipelines on Databricks to support ML models effectively
- Data Pipeline Development: Integrate diverse data sources (e.g., clickstreams, user behavior, demographic data) to create user behavior features and profiles for complex ML tasks
- Medallion Architecture: Design and implement ETL/ELT pipelines aligned with the bronze, silver, and gold layers of the medallion architecture
- Model Support: Build data pipelines to support ML model training, calibration, and deployment, leveraging MLflow for experiment tracking and performance monitoring
- Query Optimization and Low-Latency Pipelines: Design low-latency, production-ready data pipelines to support real-time and batch model inference
- CI/CD Practices: Apply CI/CD principles for seamless pipeline deployment
- Data Governance: Ensure pipelines comply with security and regulatory standards, particularly for handling PII, and maintain metadata and master data across the data catalogue
- Collaboration: Work closely with ML scientists, ML engineers, and other stakeholders to align data transformation with business objectives
REQUIREMENT SUMMARY
Min: N/A, Max: 5.0 year(s)
Information Technology/IT
IT Software - DBA / Datawarehousing
Software Engineering
Graduate
Proficient
1
United States, USA