Data Engineer - Python and Spark at Citi
Chennai, Tamil Nadu, India
Full Time


Start Date

Immediate

Expiry Date

20 Mar, 2026

Salary

Not disclosed

Posted On

20 Dec, 2025

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, Spark, Hadoop, Scala, Java, Hive, Kafka, Impala, Unix Scripting, SQL, NoSQL, ETL, Data Modeling, Big Data, Data Pipelines, Analytical Skills

Industry

Financial Services

Description
Key Responsibilities:
- Ensure high-quality software development, with complete documentation and traceability
- Develop and optimize scalable Spark (Java)-based data pipelines for processing and analyzing large-scale financial data
- Design and implement distributed computing solutions for risk modeling, pricing, and regulatory compliance
- Ensure efficient data storage and retrieval using Big Data technologies
- Implement best practices for Spark performance tuning, including partitioning, caching, and memory management (a minimal PySpark sketch follows this list)
- Maintain high code quality through testing, CI/CD pipelines, and version control (Git, Jenkins)
- Work on batch-processing frameworks for market risk analytics
- Promote unit/functional testing and code inspection processes
- Work with business stakeholders and Business Analysts to understand requirements
- Work with data scientists to understand and interpret complex datasets

Qualifications:
- 5-8 years of experience working in data ecosystems
- 4-5 years of hands-on experience with Hadoop, Scala, Java, Spark, Hive, Kafka, Impala, Unix scripting, and other Big Data frameworks
- 3+ years of experience with relational SQL and NoSQL databases: Oracle, MongoDB, HBase
- Strong proficiency in Python and Spark (Java), with knowledge of core Spark concepts (RDDs, DataFrames, Spark Streaming, etc.) as well as Scala and SQL
- Data integration, migration, and large-scale ETL experience (common ETL platforms such as PySpark, DataStage, Ab Initio, etc.): ETL design and build, handling, reconciliation, and normalization
- Data modeling experience (OLAP, OLTP, logical/physical modeling, normalization, performance tuning)
- Experience working with large and multiple datasets and data warehouses
- Experience building and optimizing big data pipelines, architectures, and datasets
- Strong analytical skills and experience working with unstructured datasets
- Ability to effectively use complex analytical, interpretive, and problem-solving techniques
- Experience with Confluent Kafka, Red Hat jBPM, and CI/CD build pipelines and toolchains: Git, Bitbucket, Jira
- Experience with cloud platforms such as OpenShift, AWS, and GCP
- Experience with container technologies (Docker, Pivotal Cloud Foundry) and supporting frameworks (Kubernetes, OpenShift, Mesos)
- Experience integrating search solutions with middleware and distributed messaging (Kafka)
- Highly effective interpersonal and communication skills with technical and non-technical stakeholders
- Experience with the software development life cycle
- Excellent problem-solving skills and a strong mathematical and analytical mindset
- Ability to work in a fast-paced financial environment
- Bachelor's/University degree or equivalent experience in computer science, engineering, or a similar domain
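To make the Spark performance-tuning bullet concrete, here is a minimal PySpark sketch of partitioning, caching, and memory-aware configuration. PySpark is used because the role centers on Python and Spark; every path, table, column, and configuration value below is hypothetical, chosen for illustration rather than taken from the team's actual stack.

# Minimal sketch of the tuning levers named above: partitioning, caching,
# and memory management. All paths, names, and settings are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("risk-pipeline-tuning-sketch")
    # Memory management: size shuffle parallelism and executor memory to the data.
    .config("spark.sql.shuffle.partitions", "400")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)

trades = spark.read.parquet("/data/trades")          # hypothetical input path
ref = spark.read.parquet("/data/instrument_ref")     # hypothetical reference data

# Partitioning: repartition on the join key so the shuffle is evenly spread.
trades = trades.repartition(400, "instrument_id")

# Caching: persist a DataFrame that is reused by more than one downstream action.
trades.cache()

# Broadcast the small reference table so the join avoids shuffling the large side.
enriched = trades.join(F.broadcast(ref), "instrument_id")

daily_risk = (
    enriched
    .groupBy("trade_date", "desk")
    .agg(F.sum("notional").alias("total_notional"))
)
daily_risk.write.mode("overwrite").partitionBy("trade_date").parquet("/out/daily_risk")

The broadcast join and the explicit repartition are the usual first levers against skewed financial datasets; cache() pays off only when the cached DataFrame feeds multiple actions, otherwise it just consumes executor memory.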
Responsibilities
Develop and optimize scalable Spark (Java)-based data pipelines for processing large-scale financial data. Ensure high-quality software development with complete documentation, and maintain code quality through testing and CI/CD pipelines (a test sketch follows below).
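As one concrete way to meet the testing expectation above, here is a hedged pytest sketch for a PySpark transformation run on a local SparkSession; the function, fixture, and column names are invented for illustration and are not part of any actual codebase.

# Hypothetical unit test for a Spark transformation; names are illustrative only.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def total_notional_by_desk(trades_df):
    # Example transformation under test: aggregate notional per trading desk.
    return trades_df.groupBy("desk").agg(F.sum("notional").alias("total_notional"))


@pytest.fixture(scope="module")
def spark():
    # A local two-thread session keeps the test self-contained and CI-friendly.
    session = SparkSession.builder.master("local[2]").appName("pipeline-tests").getOrCreate()
    yield session
    session.stop()


def test_total_notional_by_desk(spark):
    trades = spark.createDataFrame(
        [("rates", 100.0), ("rates", 50.0), ("fx", 25.0)],
        ["desk", "notional"],
    )
    result = {r["desk"]: r["total_notional"] for r in total_notional_by_desk(trades).collect()}
    assert result == {"rates": 150.0, "fx": 25.0}

Tests like this run in a CI pipeline on every commit (e.g., Jenkins, which the posting names), which is how the code-quality requirement is typically enforced in practice.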