Senior Data Engineer at Sigma Software

Warsaw, Masovian Voivodeship, Poland

Full Time

Start Date

Immediate

Expiry Date

15 Sep, 26

Salary

Not specified

Posted On

17 Jun, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Python, SQL, Apache Spark, PySpark, Databricks, Azure, Kafka, ETL/ELT, Data Modelling, Apache Airflow, Terraform, RAG Pipeline Design, LLM Integration, dbt, MLflow, Data Governance

Industry

Software Development

Description

Company Description

Are you passionate about building cutting-edge, AI-ready data platforms from the ground up? We are looking for a Senior Data Engineer to join our Data Engineering Team and lead high-impact, greenfield initiatives. You will build modern cloud-native data platforms, migrate on-premises legacy systems to the cloud, and lay the architectural foundation for AI-ready data infrastructure.

In this role, you will collaborate closely with the Machine Learning, Data Science, and Product teams as a key technical contributor and thought leader. You will also drive R&D efforts around agentic AI architectures, event-driven systems, and LLM-ready data pipelines, turning architectural concepts into production-grade solutions.

Job Description

- Design and build scalable, cloud-native data platforms, taking them from greenfield to production
- Implement near-real-time ingestion pipelines using event-driven patterns
- Define and enforce platform standards, including Data Lake / Lakehouse principles, medallion architecture, and data contracts
- Refactor and optimise existing Spark and PySpark scripts for performance and maintainability
- Introduce best practices for code quality, testing, and CI/CD across data pipelines
- Drive adoption of AI tooling and agentic workflows within the data engineering team
- Ensure data quality, observability, and reliability across all pipelines and platforms
- Develop self-service tooling and microservices that make the platform easier for other teams to use

Qualifications

- 5+ years of professional experience in Data Engineering
- Strong Python and SQL skills for pipeline development and optimisation
- Proficiency in Apache Spark / PySpark, including query optimisation and performance tuning
- Hands-on experience with Databricks (preferred) or Snowflake
- Experience with at least one major cloud provider: Azure (preferred), AWS, or GCP
- Experience with stream processing technologies (Kafka, Spark Structured Streaming)
- Solid understanding of ETL/ELT patterns, data modelling (dimensional, Data Vault), and data warehousing
- Experience with orchestration tools (Apache Airflow, Azure Data Factory, or equivalent)
- Knowledge of Infrastructure as Code (Terraform or equivalent)
- Understanding of production-grade system requirements: reliability, scalability, observability, and performance
- Upper-Intermediate English level

Will Be a Plus

- Familiarity with RAG pipeline design and LLM integration patterns
- Knowledge of data governance frameworks and tools (Unity Catalog, Apache Atlas, or similar)
- Experience with dbt for data transformation and modelling
- Familiarity with MLflow, Feature Stores, or ML platform integration

Additional Information

Personal Profile

- Self-driven and proactive in identifying improvements
- Comfortable working in a fast-paced, innovative environment
- Strong problem-solving mindset with attention to detail
- Open to experimenting with emerging technologies and approaches

Responsibilities

Design and build scalable, cloud-native data platforms and near-real-time ingestion pipelines from the ground up. Lead the migration of legacy systems to the cloud and develop AI-ready data infrastructure and agentic workflows.