Data Engineer at Qode

Vietnam

Full Time

Start Date

Immediate

Expiry Date

18 Aug, 26

Salary

Not specified

Posted On

20 May, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Data Engineering, Python, PySpark, SQL, AWS, Apache Airflow, Databricks, dbt, MongoDB, PostgreSQL, CDC, ETL/ELT

Industry

Software Development

Description

About the Role

We are looking for a Data Engineer to join our Data Platform team, focusing on building scalable, production-grade data pipelines, ingestion systems, and migration workflows across operational and analytical data platforms. In this role, you will work with systems such as MongoDB, PostgreSQL/RDS, AWS, Airflow, Airbyte, Databricks, dbt, and related data platform tools. This role is a good fit if you enjoy working with large-scale data systems, building reliable pipelines, and optimizing performance in a cloud-based environment.

Responsibilities

Design and build production-grade data pipelines using batch, streaming, incremental, and CDC-based patterns.
Build ingestion workflows from operational systems such as MongoDB, PostgreSQL, RDS, APIs, and event streams.
Design and operate data migration workflows, including full load, incremental sync, CDC replay, cutover, rollback, and reconciliation.
Convert semi-structured or NoSQL data into reliable relational and analytical models.
Build and optimize data processing jobs using Python, PySpark, Spark SQL, SQL, and Databricks.
Orchestrate workflows using Apache Airflow and manage connectors using Airbyte or similar tools.
Maintain data quality, observability, alerting, backfills, and production reliability across pipelines.
Work with AWS services such as S3, Lambda, IAM, EC2, RDS, DMS, SQS, Kinesis, or similar services.
Build modular, testable transformation layers using dbt where appropriate.
Document data flows, source-to-target mappings, pipeline behavior, data contracts, and operational runbooks.

Requirements

5+ years of experience in Data Engineering, not only analytics, BI, or reporting.
Proven hands-on ownership of production data pipelines, including design, implementation, deployment, monitoring, debugging, and backfill/recovery.
Strong experience building ETL/ELT pipelines with full load, incremental load, and CDC-based patterns.
Good understanding of CDC correctness, including idempotency, deduplication, ordering, deletes/tombstones, late-arriving events, replay, and reconciliation.
Hands-on experience with OLTP databases, preferably MongoDB and PostgreSQL/RDS.
Practical experience with schema design, including relational modeling, constraints, indexing, normalization/denormalization, and source-to-target mapping.
Experience migrating or syncing data between operational systems and analytical platforms, including validation, cutover, rollback, and reconciliation.
Strong SQL skills, including joins, CTEs, window functions, query optimization, and debugging incorrect results.
Strong Python or PySpark experience for data processing, automation, validation, and pipeline development.
Experience with production pipeline reliability: retries, idempotency, monitoring, alerting, backfill, and incident handling.
Hands-on AWS data pipeline experience using services such as S3, Lambda, IAM, RDS, DMS, SQS, Kinesis, Glue, or equivalent.

Nice to have

Advanced dbt modeling, metrics layers, semantic layers, or self-service analytics.
BI/dashboarding experience.
Great Expectations, Soda, or other data quality frameworks.
Databricks Unity Catalog or other governance/catalog tools.
Terraform or other Infrastructure as Code tools.
Docker, Kubernetes, or EKS.
CI/CD using GitHub Actions, GitLab CI, or similar.
Data warehouse modeling for analytics marts.

👉 Our Benefit Packages

Attractive salary range, and we are open to negotiation if you're a strong fit
Hybrid/remote-friendly culture: work where you grow best!
Flexible hours and async teamwork (we respect your focus time)
Work equipment support
Allowance for certification and skill development
Year-end bonus and performance-based rewards
22 days of paid leave from your 5th year: take a full month off
Career growth with personal coaching sessions
Open, collaborative team culture: no micromanagement, only trust
Tools and AI-powered workflows that make remote work easier

About CoderPush

CoderPush is a remote-first technology company that partners with startups and global businesses to build scalable, high-quality software products. We focus on long-term collaboration, clear communication, and delivering real impact through strong engineering and product thinking. Find out more at: https://coderpush.com/

Responsibilities

Design and build scalable production-grade data pipelines and ingestion workflows using batch, streaming, and CDC patterns. Optimize data processing jobs and maintain pipeline reliability, observability, and data quality across cloud-based environments.