Data Engineer II - Enterprise Data & Analytics - Digital and Technology Par at Mount Sinai
New York, NY 10017, USA
Full Time


Start Date: Immediate

Expiry Date: 29 Jun, 25

Salary: $90,000

Posted On: 30 Mar, 25

Experience: 0 year(s) or above

Remote Job: Yes

Telecommute: Yes

Sponsor Visa: No

Skills: Cloud Security, Object-Oriented Design, Data Engineering, Mirth, Node.js, Data Services, Algorithms, Avro, Programming Languages, Power BI, Spark, Unit Testing, Agile Methodologies, Java, Git, Python, JSON, Django, NLP, SQL, Tableau, Data Structures, Pipeline Development

Industry: Information Technology/IT

Description

EDUCATION REQUIREMENTS

Bachelor’s degree in Computer Science or a related discipline; advanced degree preferred.

EXPERIENCE REQUIREMENTS

  • 4+ years of relevant professional experience, preferably in data engineering, data pipeline development, and data science workflows, in a Linux environment.
  • Strong knowledge of SQL and NoSQL databases, including Azure SQL, PostgreSQL/MySQL, and MongoDB or similar.
  • Proficiency with at least two programming languages among Python, Scala, Java, or Go, with the flexibility to quickly learn additional languages.
  • Hands-on experience working with RESTful APIs and services, preferably using Node.js, Django, or similar frameworks.
  • Demonstrated expertise in containerization technologies such as Docker and Kubernetes.
  • Solid understanding and practical experience with orchestration and automation tools like Airflow or Dataiku.
  • Experience in developing, deploying, and maintaining AI workflows, including computer vision/image processing, NLP, and data categorization models.
  • Proficiency in using DevOps tools and practices, including Git, CI/CD tools, automated unit testing, and GitHub Copilot or similar AI-assisted coding tools (e.g., ChatGPT).
  • Experience building scalable data architectures with reference tables and automated AI-driven updating mechanisms.
  • Familiarity with Hadoop, Spark, Kafka, and streaming data platforms and technologies.
  • Knowledge of healthcare data standards such as HL7, and familiarity with integration tools like Mirth is a significant plus.
  • Strong knowledge of Azure cloud data services, including Azure Databricks, Azure Fabric, Azure Data Factory, serverless computing, virtual machines, and cloud security.
  • Strong knowledge of data visualization tools such as Power BI or Tableau.
  • Strong knowledge of data structures, data formats, algorithms, and object-oriented design; practical experience with data serialization and storage formats such as Parquet, Avro, JSON, or ORC.
  • Experience working within Agile methodologies and tools, specifically JIRA, is highly desirable.
RESPONSIBILITIES
  • Design, develop, and maintain scalable and reliable data pipelines using orchestration engines such as Airflow and Dataiku, ensuring seamless automation of data ingestion, transformation, and delivery processes.
  • Deploy and maintain containerized applications and pipelines, employing technologies like Docker and Kubernetes to achieve resilient and maintainable data workflows.
  • Develop, deploy, and operationalize AI workflows, including image processing, data categorization, and natural language processing (NLP) models, ensuring production-level reliability and performance.
  • Implement DevOps best practices, including version control with Git, continuous integration and continuous deployment (CI/CD) pipelines, automated testing frameworks, and unit testing to facilitate rapid, reliable, and high-quality software deployments.
  • Create and manage a scalable and maintainable data architecture, including designing reference tables and deploying AI-driven mechanisms to ensure reference tables remain accurate and current.
  • Develop and maintain comprehensive data dictionaries, enforce data quality metrics, implement anomaly detection solutions, and establish atomic rollback processes to effectively manage and rectify data errors.
  • Build robust data ingestion pipelines capable of handling diverse data sources, including AI-generated data, flat files, and RESTful APIs (both reading from and writing to endpoints).
  • Collaborate closely with agile teams consisting of Application Developers, Database Developers, and Data Scientists, actively participating in sprint planning, stand-ups, and retrospectives.
  • Create centralized documentation, diagrams, and metadata catalogs that clearly describe data solutions, facilitating knowledge sharing and ease of use.
  • Design, implement, and manage data system monitoring, backups, and disaster recovery plans, safeguarding the integrity, availability, and security of data.
  • Engage with stakeholders with a customer-focused approach, delivering solutions that align with scientific, research, and clinical objectives.
  • Ensure adherence to industry best practices, HIPAA compliance, and institutional data governance policies and procedures.
  • Maintain current knowledge of industry trends and emerging technologies, demonstrating flexibility and continuous learning to adapt and enhance skillsets relevant to data engineering.
  • Develop standards and best practices documentation related to data management, architecture, and maintenance, and provide training and presentations to team members as required.
  • Maintain a strong desire to understand the “why” behind the data.