Data Engineer Intern at Cornspring
New York, New York, United States
Full Time


Start Date

Immediate

Expiry Date

07 Jun, 26

Salary

$20 - $30 per hour

Posted On

09 Mar, 26

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, Sql, Pandas, PostgreSQL, Data Pipelines, Data Modeling, Data Integrity, System Thinking, Performance Tuning, Debugging, Data Unit Testing, Communication, Unstructured Data Processing, Azure Cloud Services, Generative Ai, Large Language Models

Industry

Financial Services

Description
Term: June - August 2026 (3 months, full-time). Permanent positions may be offered to selected candidates upon successful completion of the internship, subject to performance.

About us

Cornspring is an innovative FinTech start-up on a mission to empower Family Offices and Asset Owners with real-time, AI-driven data intelligence and portfolio insights. We are tackling one of the most complex and valuable challenges in finance. Our clients operate at the highest levels of global markets, managing billions in assets, yet they remain constrained by legacy systems that are slow, fragmented, and outdated. Cornspring is redefining Family Office services by applying state-of-the-art generative AI and Large Language Models to investment, accounting, and operational data. Our platform delivers faster insights, greater transparency, and entirely new ways of interacting with financial information.

The role

We are looking for an enthusiastic and driven Data Engineer Intern to build and maintain ingestion pipelines and validation/reconciliation logic. If you enjoy systems thinking, performance, reliability, and debugging complex problems with strong engineering discipline, we would love to speak with you.

Key responsibilities

- Build efficient data pipelines for financial data flows while ensuring data and analytics can be leveraged for future growth.
- Monitor data movement, transformation logic, and pipeline performance to maintain the integrity and quality of data supplied to ML processes.
- Design and implement data solutions for new data and servicing initiatives, and provide continuous oversight of existing solutions to maintain data accuracy, service quality, and solution performance.

Educational background and professional experience

- Undergraduate (junior/senior preferred) or graduate student in Computer Science, Data Science, or a related field.
- Experience designing, building, and maintaining data pipelines for large-scale data solutions.
Core technical proficiencies

- Proficiency in Python, pandas, and data science libraries, plus solid SQL skills for querying and manipulating large datasets. Experience with PostgreSQL is preferred.
- Ability to design scalable, secure data models and implement best practices for data storage and retrieval.
- Familiarity with AI-assisted coding, machine learning applications, and approaches to maintaining high-quality, compliant data for AI-driven use cases.
- Experience with data unit testing is preferred.
- Good communication skills to collaborate with cross-functional teams - including data scientists, AI engineers, and business stakeholders - to deliver secure, high-performing data solutions.
- Experience with unstructured data processing is a plus.
- Experience with Azure cloud services is a plus.

What we offer

- Compensation: $20 - $30 per hour.
- Hybrid working: three days per week in our Manhattan office.
- An opportunity to work at the forefront of AI-driven finance, tackling genuinely complex and high-impact challenges.
- Hands-on exposure to advanced ML, LLM, and data technologies within real production systems.
- A collaborative, high-ownership start-up environment offering exceptional learning opportunities.
- The chance to help shape the future of Family Office and Asset Management technology.
Responsibilities
The intern will be responsible for building and maintaining efficient data ingestion pipelines and validation/reconciliation logic for financial data flows. This includes monitoring pipeline performance and designing data solutions for new initiatives while ensuring data accuracy for ML processes.
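For candidates wondering what "validation/reconciliation logic" might look like in practice, here is a minimal, purely illustrative sketch in Python with pandas (the posting's core stack). All data, column names, and the `reconcile` helper are hypothetical and not part of the actual role's codebase:

```python
import pandas as pd

# Hypothetical sample positions from two sources: a custodian feed and
# an internal accounting system (all values are made up for illustration).
custodian = pd.DataFrame(
    {"account": ["A1", "A1", "B2"],
     "asset": ["AAPL", "MSFT", "TLT"],
     "qty": [100, 50, 200]}
)
internal = pd.DataFrame(
    {"account": ["A1", "A1", "B2"],
     "asset": ["AAPL", "MSFT", "TLT"],
     "qty": [100, 55, 200]}
)

def reconcile(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return rows where the two sources disagree or one side is missing."""
    merged = left.merge(
        right,
        on=["account", "asset"],
        how="outer",
        suffixes=("_custodian", "_internal"),
        indicator=True,  # marks rows present in only one source
    )
    breaks = merged[
        (merged["_merge"] != "both")
        | (merged["qty_custodian"] != merged["qty_internal"])
    ]
    return breaks.drop(columns="_merge")

breaks = reconcile(custodian, internal)
print(breaks)  # flags the MSFT position, where quantities disagree (50 vs 55)
```

A real pipeline would add tolerance thresholds, typed schemas, and alerting, but the core idea is the same: join the two sources on their business keys and surface every disagreement as a "break" for review.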