Databricks Lead at NTT DATA
Bengaluru, Karnataka, India - Full Time


Start Date

Immediate

Expiry Date

13 Apr, 26

Salary

0.0

Posted On

13 Jan, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Databricks, PySpark, Delta Lake, Databricks Workflows, CI/CD, Amazon Redshift, Data Quality, Auditing Frameworks, Informatica, AWS Glue, Lambda, JSON, Structured Streaming, Code Reviews, Mentoring, Technical Governance, Pipeline Standardization

Industry

IT Services and IT Consulting

Description
Responsibilities:
- Develop high-performance PySpark pipelines in Databricks, orchestrated using Databricks Workflows.
- Ingest JSON files from S3, flatten and transform them, and push curated outputs to Amazon Redshift.
- Implement data quality and auditing frameworks, including schema validation, record counts, and exception tracking.
- Leverage and extend established CI/CD processes using Databricks Asset Bundles (DAB), Git, and release pipelines.
- Monitor pipeline performance and apply best practices to ensure cost-effective Databricks usage.
- Serve as the offshore lead, participating in design discussions, daily standups, and technical governance with client teams.
- Lead technical planning and execution for ETL migration from Informatica, Glue, and Lambda into Databricks.
- Drive the evolution of modular code patterns, pipeline standardization, and data quality (DQ) reusability across multiple data domains.
- Guide junior engineers and ensure high delivery velocity with sustained code quality.
- Collaborate with client architects to align platform capabilities with business and analytics needs.

Requirements:
- 10+ years in data engineering, with 3+ years of recent hands-on Databricks (AWS) experience.
- Proficiency in PySpark, Delta Lake, Databricks Workflows, and Structured Streaming (if required).
- Strong experience working with semi-structured data (JSON), including flattening and normalization strategies.
- Experience writing performant pipelines that push processed data to Amazon Redshift.
- Deep understanding of Databricks Asset Bundles (DAB) and how they support CI/CD, environment promotion, and modular code packaging.
- Familiarity with CI/CD processes in Databricks, including Git integration, testing, and environment management.
- Strong grasp of data quality, validation, audit logging, and exception-handling strategies.
- Proven experience as an offshore team lead in a global delivery model.
- Ability to mentor, conduct code reviews, and enforce engineering standards.
- Strong verbal and written communication skills to engage directly with client-side architects and product owners.
- Familiarity with Informatica, AWS Glue, or Lambda-based ETL pipelines.
- Understanding of cost-performance trade-offs in cloud-native data processing.
- Exposure to the hospitality industry or high-volume transactional data pipelines.
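For illustration only, a minimal PySpark sketch of the ingest-flatten-curate pattern described above is shown below. The S3 path, column names (guest, line_items, booking_id), and the target table curated.bookings_flat are placeholders chosen for the example, not details from this posting.

```python
# Minimal sketch of the described pattern: read JSON from S3, flatten one level
# of nesting, run a simple record-count audit, and persist a curated Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json_ingest_sketch").getOrCreate()

# 1. Ingest raw JSON files from S3 (placeholder path).
raw_df = spark.read.json("s3://example-bucket/landing/bookings/")

# 2. Flatten nested fields (assumes a struct column `guest` and an array
#    column `line_items`; adjust to the real schema).
flat_df = (
    raw_df
    .withColumn("line_item", F.explode_outer("line_items"))
    .select(
        F.col("booking_id"),
        F.col("guest.name").alias("guest_name"),
        F.col("guest.email").alias("guest_email"),
        F.col("line_item.sku").alias("sku"),
        F.col("line_item.amount").cast("decimal(18,2)").alias("amount"),
    )
)

# 3. Simple data-quality audit: record count and null check on the key column.
total_rows = flat_df.count()
null_keys = flat_df.filter(F.col("booking_id").isNull()).count()
print(f"rows={total_rows}, null_booking_ids={null_keys}")

# 4. Persist the curated output as a Delta table; a downstream step would then
#    load this curated layer into Amazon Redshift.
(
    flat_df.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("curated.bookings_flat")
)
```

In practice the Redshift load would typically use the Databricks Redshift connector or an S3 unload/COPY step; that detail is omitted from this sketch.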
Responsibilities
Develop high-performance PySpark pipelines in Databricks and lead technical planning for ETL migration from various platforms into Databricks. Serve as the offshore lead, participating in design discussions and ensuring high delivery velocity with sustained code quality.