Data Engineer at Scapia
Bengaluru, Karnataka, India
Full Time


Start Date

Immediate

Expiry Date

27 May 2026

Salary

Not disclosed

Posted On

26 Feb 2026

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Apache Spark, Databricks, ETL, Delta Lake, Apache Iceberg, Trino, Parquet, PySpark, Scala, SQL, Data Lake, Data Ingestion, Data Transformation, Data Quality, PII Handling, Performance Optimization

Industry

Financial Services

Description
We are looking for a skilled Data Engineer to join our Data Platform team and help build scalable, reliable, and secure data infrastructure. The ideal candidate will be responsible for designing and developing distributed data pipelines, working with large-scale datasets, and enabling downstream analytics and product use cases through a modern lakehouse architecture. You will work closely with backend, analytics, and platform teams to build high-performance ETL pipelines and optimize data workflows across batch and near real-time systems.

Key Responsibilities

• Design, develop, and maintain scalable batch data pipelines using Apache Spark
• Build and manage ETL workflows on Databricks or equivalent distributed compute platforms
• Work with data lake storage formats such as Delta Lake / Apache Iceberg
• Implement efficient data ingestion, transformation, and processing pipelines
• Query and process large datasets using distributed SQL engines such as Trino
• Work with columnar file formats such as Parquet for optimized storage and processing
• Ensure data quality, consistency, and reliability across ingestion and transformation layers
• Implement best practices for handling PII and sensitive data in compliance with security and governance standards
• Optimize data processing jobs for performance and cost efficiency
• Collaborate with cross-functional teams to understand data requirements and enable data-driven product features
• Monitor, troubleshoot, and improve pipeline reliability and performance

Profile

• 3-4 years of experience in Data Engineering or related roles
• Strong hands-on experience with Apache Spark (PySpark / Scala / SQL)
• Experience working with Databricks or similar distributed data platforms
• Experience with Trino / Presto for distributed querying
• Solid understanding of ETL/ELT concepts and data pipeline design
• Experience working with data lake architectures
• Hands-on experience with Delta Lake / Apache Iceberg
• Strong experience working with Parquet or other columnar storage formats
• Experience working with large-scale structured/semi-structured datasets
• Familiarity with handling and processing PII / sensitive data securely
• Understanding of data partitioning, schema evolution, and performance tuning

Good to Have

• Experience with workflow orchestration tools (Airflow, Dagster, etc.)
• Familiarity with the AWS/GCP/Azure data ecosystem
• Experience with streaming frameworks such as Kafka
• Knowledge of data governance and access control mechanisms
• Exposure to lakehouse architecture patterns

Education

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
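
For illustration only, here is a minimal sketch of the kind of batch ETL job the posting describes, written in PySpark with Delta Lake. The paths, column names, and PII-masking rule are hypothetical placeholders, not details from the posting.

# Minimal batch ETL sketch: ingest raw Parquet, transform, mask PII, and
# write a partitioned Delta table. All names/paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-batch-etl")
    # Enable Delta Lake (assumes the delta-spark package is on the classpath)
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Ingestion: read raw Parquet files from the data lake (placeholder path).
raw = spark.read.parquet("s3://lake/raw/orders/")

# Transformation: basic cleansing plus PII handling -- hash the email column
# so downstream consumers never see the raw value.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("email_hash", F.sha2(F.col("email"), 256))
       .drop("email")
)

# Load: write a Delta table partitioned by date for efficient downstream queries.
(clean.write
      .format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .save("s3://lake/curated/orders/"))

Partitioning by a date column, as here, is a common way to address the partitioning and performance-tuning expectations listed in the profile, since it lets engines prune files at query time.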
Responsibilities
The role involves designing, developing, and maintaining scalable, reliable, and secure distributed data pipelines using a modern lakehouse architecture. Key tasks include building high-performance ETL workflows and optimizing data processing across batch and near real-time systems.
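
As a sketch of the distributed-querying side of the role, the snippet below uses the trino-python-client to query a curated table. The coordinator host, catalog, schema, and table names are assumptions for illustration, not details from the posting.

# Hypothetical Trino query over the curated table written by the batch job.
import trino

conn = trino.dbapi.connect(
    host="trino.internal.example.com",  # placeholder coordinator host
    port=8080,
    user="data-engineer",
    catalog="delta",    # assumes a Delta Lake / Iceberg connector is configured
    schema="curated",
)
cur = conn.cursor()

# Aggregate recent order volume and revenue by day.
cur.execute("""
    SELECT order_date, count(*) AS orders, sum(amount) AS revenue
    FROM orders
    GROUP BY order_date
    ORDER BY order_date DESC
    LIMIT 7
""")
for row in cur.fetchall():
    print(row)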