Data Engineer with Databricks at ZENSAR TECHNOLOGIES SINGAPORE PTE LTD

Hyderabad, Telangana, India

Full Time

Start Date

Immediate

Expiry Date

24 Aug, 26

Salary

0.0

Posted On

26 May, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Databricks, Apache Spark, Delta Lake, Python, Java, SQL, AWS, ETL/ELT, MLflow, CI/CD, Bash, Data Pipeline Design, Performance Tuning, Data Integration, Cloud Infrastructure, Data Governance

Industry

IT Services and IT Consulting

Description

Data Engineer With Databricks - GCH - Job Description

Databricks

* Design & Build Data Pipelines: Develop, optimize, and manage large-scale data engineering workflows using Databricks and Apache Spark, ensuring scalable and efficient data processing.
* Data Lake & Delta Lake: Build and manage lakehouses/data lakes using Databricks Delta Lake to ensure high-performance, scalable, and reliable storage for structured and unstructured data.
* Collaboration with Cross-Functional Teams: Work with data scientists, analysts, and business stakeholders to understand data needs, provide insights, and deliver solutions using Databricks.
* Performance Tuning & Optimization: Tune and optimize Spark-based jobs and other Databricks workflows for high performance and low latency across large datasets.
* Data Integration: Integrate and orchestrate data from multiple sources (e.g., databases, APIs, cloud services) into the Databricks environment, ensuring data quality and consistency.
* Cloud Infrastructure Management: Leverage cloud platforms (e.g., AWS, Azure, GCP) for efficient use of Databricks clusters, ensuring cost optimization and scalability.
* Machine Learning Pipeline Development: Collaborate with data scientists to design and implement end-to-end machine learning pipelines, utilizing MLflow for model tracking, versioning, and deployment.
* Model Deployment & Monitoring: Collaborate on model deployment strategies within Databricks, using MLflow for tracking, managing experiments, and deploying models into production environments.
* Automation & CI/CD: Implement continuous integration and continuous delivery (CI/CD) pipelines to automate the deployment of data pipelines and machine learning models.

Data Engineer

* Assist in designing and implementing scalable and robust processes for ingesting and transforming complex datasets.
* Design, develop, construct, maintain and support data pipelines for ETL from a multitude of sources.
* Create blueprints for data management systems to centralize, protect, and maintain data sources.
* Focused on data stewardship and curation, the data engineer enables the data scientist to run their models and analyses to achieve the desired business outcomes.
* Ingest large, complex data sets that meet functional and non-functional requirements.
* Enable the business to solve the problem of working with large volumes of data in diverse formats, and in doing so, enable innovative solutions.
* Design and build bulk and delta data lift patterns for optimal extraction, transformation, and loading of data.
* Support the organisation’s cloud strategy and align to the data architecture and governance, including the implementation of these data governance practices.
* Engineer data in the appropriate formats for downstream customers, risk and product analytics or enterprise applications.
* Assist in identifying, designing and implementing robust process improvement activities to drive efficiency and automation for greater scalability. This includes looking at new solutions and new ways of working and being at the forefront of emerging technologies.
* Work with various stakeholders across the organization to understand data requirements and apply technical knowledge of data management to solve key business problems.
* Provide support in the operational environment with all relevant support teams for data services.
* Provide input into the management of demand across the various data streams and use cases.
* Create and maintain functional requirements and system specifications in support of data architecture and detailed design specifications for current and future designs.
* Support test and deployment of new services and features.
* Provide technical leadership to junior data engineers in the team.

MINIMUM QUALIFICATIONS/EXPERIENCE (REQUIRED FOR THE JOB)

* Matric, with a degree in Computer Science, Business Informatics, Mathematics, Statistics, Physics or Engineering.
* 6+ years of data engineering experience.
* 6+ years of experience with any data warehouse technical architectures, ETL/ELT, and reporting/analytics tools including, but not limited to, any of the following combinations: (1) SSIS, SSRS or something similar, (2) ETL frameworks, (3) Spark, (4) AWS data builds.
* DBA ability and knowledge across at least 2 platforms (for example: TSQL, SAS, PSQL, IBM VSAM, DynamoDB and DB2) will also be beneficial.
* Should be at least at a proficient level in at least one of Python or Java.
* Should be proficient in Bash and SQL.
* Some experience with R, AWS, XML, JSON, cron will be beneficial.
* Experience with designing and implementing Cloud (AWS) solutions, including use of the APIs available.
* Some experience with DevOps architecture, implementation and operation would be advantageous.
* Knowledge of engineering and operational excellence using standard methodologies, and of best practices in software engineering, data management, data storage, data computing and distributed systems to solve business problems with data.
* Some experience in applying SAFe/Scrum/Kanban methodologies.
* Knowledge and understanding of the business process management lifecycle, which covers design, modelling, execution, monitoring, and optimization, as well as business process re-engineering.
* Good problem-solving skills: the ability to exercise judgment in solving technical, operational, and organizational challenges, to identify issues proactively, and to present solutions and options leading to resolution.

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities

Design, build, and optimize large-scale data pipelines and lakehouses using Databricks and Apache Spark. Collaborate with cross-functional teams to implement machine learning pipelines and ensure efficient data integration from multiple sources.