Data Engineering Pipeline Engineer – Role Description at Innoventes
Bengaluru, Karnataka, India
Full Time


Start Date

Immediate

Expiry Date

14 Jun, 26

Salary

0.0

Posted On

16 Mar, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, Scala, SQL, Apache Spark, Kafka, Flink, Airflow, ETL/ELT, AWS, GCP, Azure, dbt, Data Modeling, CI/CD, Terraform, Pulumi

Industry

Technology; Information and Internet

Description
ABOUT THE ROLE

We are looking for a skilled Data Engineering Pipeline Engineer to design, build, and maintain scalable data infrastructure that powers our analytics, machine learning, and business intelligence platforms. You will work across the full data lifecycle, from ingestion and transformation to storage and delivery, ensuring reliability, performance, and governance at every stage. This is a high-impact role in which you will collaborate closely with data scientists, analysts, and platform engineers to ship robust pipelines that enable data-driven decisions across the organization.

KEY RESPONSIBILITIES

• Design, build, and maintain scalable batch and real-time data pipelines using tools such as Apache Spark, Kafka, Flink, or Airflow
• Develop and optimize ETL/ELT workflows to ingest data from diverse sources, including APIs, databases, event streams, and flat files
• Architect and manage cloud-based data infrastructure on AWS, GCP, or Azure (e.g., S3, BigQuery, Redshift, Databricks, Snowflake)
• Implement data quality monitoring, alerting, and observability frameworks to ensure pipeline reliability and SLA compliance
• Collaborate with data scientists and ML engineers to support model training, feature engineering, and inference pipelines
• Partner with analytics engineers to maintain and evolve data warehouse models (dbt, dimensional modeling)
• Define and enforce data governance standards, including cataloging, lineage tracking, and access control policies
• Optimize pipeline performance through profiling, query tuning, partitioning strategies, and cost management
• Document pipeline architecture, data contracts, and runbooks for operational clarity
• Participate in on-call rotation and incident response for critical data infrastructure

REQUIRED QUALIFICATIONS

• 4+ years of experience in data engineering, ETL development, or a related software engineering discipline
• Proficiency in Python and/or Scala for pipeline development; strong SQL skills across multiple dialects
• Hands-on experience with distributed processing frameworks such as Apache Spark, Beam, or Flink
• Experience with workflow orchestration tools such as Apache Airflow, Prefect, or Dagster
• Deep familiarity with cloud data platforms (AWS, GCP, or Azure) and managed services such as BigQuery, Redshift, or Synapse
• Experience designing and maintaining data warehouses or lakehouses (Snowflake, Databricks, Delta Lake, Iceberg)
• Strong understanding of data modeling concepts: normalization, star/snowflake schemas, slowly changing dimensions
• Experience with streaming and event-driven architectures using Kafka, Kinesis, or Pub/Sub
• Familiarity with CI/CD practices and infrastructure-as-code tools (Terraform, Pulumi) for data platform deployments
• Excellent communication skills and the ability to translate business requirements into technical solutions
Responsibilities
The engineer will design, build, and maintain scalable batch and real-time data pipelines and manage cloud-based data infrastructure across the ingestion, transformation, storage, and delivery stages. Responsibilities also include implementing data quality monitoring, collaborating with data scientists on ML pipelines, and enforcing data governance standards.
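As an illustration of the slowly-changing-dimension modeling listed in the qualifications, the core of a Type 2 merge can be sketched in plain Python. This is a minimal, hypothetical sketch for candidates unfamiliar with the term: the function name, field names, and snapshot shape are invented for the example and are not part of the role's actual stack.

```python
def scd2_merge(dim_rows, incoming, key, tracked, today):
    """Minimal Type-2 slowly-changing-dimension merge (illustrative sketch).

    dim_rows: existing dimension rows, each a dict with 'valid_from' and
              'valid_to' ('valid_to' is None for the current version).
    incoming: latest source snapshot, one dict per business key.
    tracked:  attribute names whose changes should create a new version.
    """
    out = list(dim_rows)
    # Index the current (open-ended) version of each business key.
    current = {r[key]: r for r in out if r["valid_to"] is None}
    for row in incoming:
        cur = current.get(row[key])
        if cur is None:
            # Unseen business key: insert it as the current version.
            out.append({**row, "valid_from": today, "valid_to": None})
        elif any(cur[c] != row[c] for c in tracked):
            # A tracked attribute changed: expire the old version,
            # then append the new current version.
            cur["valid_to"] = today
            out.append({**row, "valid_from": today, "valid_to": None})
    return out
```

In a production warehouse this versioning rule would typically be expressed as a dbt snapshot or a SQL MERGE rather than row-by-row Python; the sketch shows only the expire-and-insert logic itself.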