Senior Data Engineer, Network Clustering at NVIDIA
Yokneam Ilit, Haifa District, Israel
Full-Time


Start Date

Immediate

Expiry Date

20 May 2026

Salary

Not specified

Posted On

19 Feb 2026

Experience

10+ years

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Data Ingestion, Data Transformation, ETL/ELT Workflows, Schema Evolution, Data Versioning, Data Validation, Databricks, Apache Spark, PySpark, Scala, Apache Kafka, Python, SQL, Cloud Platforms, Data Orchestration, Time-Series Data

Industry

Computer Hardware Manufacturing

Description
We are looking for an expert Data Engineer to build and evolve the data backbone for our R&D telemetry and performance analytics ecosystem. The role involves processing large quantities of raw data from live systems at the cluster level: hardware, communication units, software, and efficiency indicators. You'll be part of a fast-paced R&D organization where system behavior, schemas, and requirements evolve constantly. Your mission is to develop flexible, reliable, and scalable data-handling pipelines that can adapt to rapid change and deliver clean, trusted data to engineers and researchers.

What you'll be doing:

- Build flexible data ingestion and transformation frameworks that can easily handle evolving schemas and changing data contracts
- Develop and maintain ETL/ELT workflows for refining, enriching, and classifying raw data into analytics-ready form
- Collaborate with R&D, hardware, DevOps, ML engineers, data scientists, and performance analysts to ensure accurate data collection from embedded systems, firmware, and performance tools
- Automate schema detection, versioning, and validation to ensure smooth evolution of data structures over time
- Maintain data quality and reliability standards, including tagging, metadata management, and lineage tracking
- Enable self-service analytics by providing curated datasets, APIs, and Databricks notebooks

What we need to see:

- B.Sc. or M.Sc. in Computer Science, Computer Engineering, or a related field
- 12+ years of experience in data engineering, ideally in telemetry, streaming, or performance analytics domains
- Proven experience with Databricks and Apache Spark (PySpark or Scala)
- Understanding of streaming architectures and their applications (e.g., Apache Kafka for ingestion, schema registry, event processing)
- Proficiency in Python and SQL for data transformation and automation
- Demonstrated knowledge of schema evolution, data versioning, and data validation frameworks (e.g., Delta Lake, Great Expectations, Iceberg, or similar)
- Experience working with cloud platforms (AWS, GCP, or Azure); AWS preferred
- Familiarity with data orchestration tools (Airflow, Prefect, or Dagster)
- Experience handling time-series, telemetry, or real-time data from distributed systems

Ways to stand out from the crowd:

- Exposure to hardware, firmware, or embedded telemetry environments
- Knowledge of real-time analytics frameworks (Spark Structured Streaming, Flink, Kafka Streams)
- Understanding of system performance metrics (latency, throughput, resource utilization)
- Experience with data cataloging or governance tools (DataHub, Collibra, Alation)
- Familiarity with CI/CD for data pipelines and infrastructure-as-code practices

With competitive salaries and a generous benefits package, NVIDIA is widely considered one of the technology world's most desirable employers. Our team comprises some of the most forward-thinking and hardworking individuals in the industry. Due to unprecedented growth, our exclusive engineering teams are rapidly expanding. If you're a creative engineer with a real passion for technology, we want to hear from you. #LI-Hybrid

NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest industries and profoundly impacting society. Learn more about NVIDIA.
Responsibilities
The role involves building and evolving the data backbone for R&D telemetry and performance analytics by processing large quantities of raw data from live cluster systems. The engineer will develop flexible, reliable, and scalable data handling pipelines to deliver clean, trusted data for researchers and engineers.