Senior Software Engineer - Plazma Team

at Treasure Data

Vancouver, BC, Canada

Start Date: Immediate
Expiry Date: 19 Dec, 2024
Salary: USD 120,000 Annual
Posted On: 23 Sep, 2024
Experience: 5 year(s) or above
Skills: Java, Data Transformation, Relocation, IT, Unstructured Data, Operations, Computer Science, Open Source, Data Security, Cassandra, MySQL, Analytical Skills, Design, Data Integration, Data Privacy, FOSS, Python, NoSQL, Data Governance, Graphs, Software, Data Visualization
Telecommute: No
Sponsor Visa: No
Required Visa Status: US Citizen, Green Card
Employment Type: Full Time, Permanent

Description:

TREASURE DATA:

At Treasure Data, we’re on a mission to radically simplify how companies use data to create connected customer experiences. Our sophisticated cloud-based customer data platform drives operational efficiency across the enterprise to deliver powerful business outcomes in a way that’s safe, flexible, and secure.
We are thrilled that Treasure Data has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Customer Data Platforms! It’s an honor to be acknowledged for our efforts in advancing the CDP industry with cutting-edge AI and real-time capabilities.
Furthermore, Treasure Data employees are enthusiastic, data-driven, and customer-obsessed. We are a team of drivers—self-starters who take initiative, anticipate needs, and proactively jump in to solve problems. Our actions reflect our values of honesty, reliability, openness, and humility.

REQUIRED QUALIFICATIONS:

  • A BS or higher in Computer Science or equivalent experience
  • Deep understanding of the capabilities of Presto/Trino or Hadoop/Hive
  • Solid understanding of cloud architecture and services in public clouds like AWS, GCP, or Microsoft Azure
  • Experience developing use cases, functional specs, design specs, ERDs, etc.
  • Experience working with distributed, scalable Big Data stores or NoSQL systems, including HDFS, S3, Cassandra, Bigtable, etc.
  • At least 5 years of experience in:
      • Distributed computing with Java, and at least one of Scala, Ruby, or Python
      • Working with and tuning the JVM
      • Distributed massively parallel processing (MPP) engines
      • Operating production-scale Presto/Hadoop deployments
      • MySQL, PostgreSQL, or other open-source distributed databases/key-value stores
  • Strong analytical skills
  • A solid understanding of computer science (algorithms, data structures, etc.)
  • Solid experience in project and team management and in handling Big Data problems
  • Able to work independently as well as in a team
  • Strong capability in researching, evaluating and implementing new and improved data solutions for multi-tenant environments
  • Demonstrated experience designing, deploying, and maintaining architectures across the following technical areas:
      • Data storage: petabyte-scale architectures
      • Knowledge discovery: advanced knowledge of entity and relationship extraction from unstructured data
      • Data governance: experience developing and integrating software that enables flexible and scalable data transformation with data quality controls
      • Data visualization: knowledge of cost-effective tools that make it easy for end users to understand data and produce reports and graphs
      • Data security and privacy: knowledge of data privacy and security practices and their implementation in systems

It would be nice if you had:

  • Familiarity with microservices-based software architecture
  • Expertise in data integration patterns
  • A strong track record of driving rapid prototyping and design for Big Data
  • Experience extending Free and Open-Source Software (FOSS) or COTS products
  • Strong IT and security skills and knowledge
  • Experience with the design and development of multiple object-oriented systems
  • A good understanding of ‘infrastructure as code’ and operations

PHYSICAL REQUIREMENTS:

Must be located in the Greater Vancouver, BC, Canada area. Relocation is not supported.

TRAVEL REQUIREMENTS:

Travel requirements typically amount to approximately 5% of the year, including one week in Japan annually, another in Mountain View, CA, and possibly an additional week elsewhere.

Responsibilities:

YOUR ROLE:

The Query Engine team at Treasure Data owns one of the essential elements of our CDP solution and is part of the Core Services group, which supports customer data ingestion and availability at a rate of 70B records per day. You will play an influential role on the team by shaping the future of our Hadoop/Hive & Presto/Trino query engines. This includes maintaining technical excellence to address challenges that currently lack industry-wide solutions, as well as understanding industry movements and translating them into tangible milestones. Our team consists of Big Data experts across Japan, Korea, and Canada who are passionate about OSS contribution, and we take pride in the quality of service we offer.

RESPONSIBILITIES & DUTIES:

  • Work as a technical lead for designing and developing Presto/Trino & Hadoop/Hive products
  • Be responsible for providing technical thought leadership and solution expertise around Presto/Trino & Hadoop/Hive technologies. This includes technology assessment, use case development, as well as solution outline and design for modern data architectures
  • Be responsible for architecture vision and design of next generation data infrastructure based on Presto/Trino & Hadoop/Hive technology
  • Research advanced Big Data techniques, including data ingestion, data processing, data integration, data access, data visualization, data discovery, statistical methods, and database design and implementation
  • Define and achieve the strategic roadmap for the Big Data architecture and design the overall data platform
  • Establish standards and guidelines for the design & development, tuning, deployment, and maintenance of advanced data access frameworks and distributed systems
  • Define new data infrastructure platforms for processing data; research and develop new data management solutions, approaches, and techniques for data systems
  • Document architectural standards and technology frameworks
  • Work with management and the product team to set up the roadmap for Presto/Trino & Hadoop/Hive related products based on operational needs and customer requested features
  • Lead the Presto/Trino & Hadoop/Hive team to design, develop, and test related products
  • Mentor and train new team members
  • Version and release management of Presto and Hive products:
      • Evaluate, test, and set a base version
      • Backport any needed patches from trunk, which contains the latest cutting-edge version of the project but may therefore also be the most unstable
      • Deploy new customer-facing features for Presto/Trino & Hadoop/Hive
      • Coordinate with support and product teams on product releases
  • Make contributions to the Presto/Trino & Hadoop/Hive open source community:
      • Contribute bug fixes and new features upstream
      • Present topics at Presto/Trino & Hadoop/Hive related technical conferences
  • Work with the Site Reliability team to automate Presto/Trino & Hadoop/Hive cluster operations to reduce operational overhead:
      • Design, develop, and evaluate metrics to ensure system health and plan infrastructure capacity of clusters (see the health-probe sketch after this list)
      • Design and develop scripts to automatically start and stop clusters and switch traffic between active clusters to balance customers’ workloads
      • Design and develop failure recovery tools that automatically detect faults and recover faulty clusters
  • Provide in-depth support services to Presto/Trino & Hadoop/Hive customers:
      • Take on-call responsibility for supporting Presto/Trino & Hadoop/Hive customers
      • Handle escalations on product defects and performance issues; lead and perform in-depth troubleshooting of Presto/Trino & Hadoop/Hive related systems
      • Design and develop custom user-defined functions (UDFs) for customers (see the UDF sketch after this list)
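
As a rough illustration of the health-metric work mentioned above, the following is a minimal sketch of a latency probe against a Trino coordinator over JDBC. The endpoint and user name are placeholders, and printing stands in for real metric export; a production probe would publish the latency to a metrics system and drive alerting or automated traffic switching from it. It assumes the trino-jdbc driver is on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public final class ClusterHealthProbe {
        public static void main(String[] args) throws Exception {
            // Hypothetical coordinator endpoint; replace with a real cluster URL.
            String url = "jdbc:trino://coordinator.example.com:8080";
            long start = System.nanoTime();
            try (Connection conn = DriverManager.getConnection(url, "health-probe", null);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                rs.next();
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                // A real probe would export this round-trip time as a metric and
                // alert (or switch traffic away) when it breaches a threshold.
                System.out.printf("cluster healthy, round-trip %d ms%n", elapsedMs);
            }
        }
    }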
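
Likewise, for the custom UDF work, here is a minimal sketch of a scalar function written against the Trino SPI. The function name mask_email and its masking rule are purely illustrative; a real plugin would expose the class through Plugin#getFunctions() and be deployed to the cluster as a plugin JAR.

    import io.airlift.slice.Slice;
    import io.airlift.slice.Slices;
    import io.trino.spi.function.Description;
    import io.trino.spi.function.ScalarFunction;
    import io.trino.spi.function.SqlType;
    import io.trino.spi.type.StandardTypes;

    public final class MaskingFunctions {
        private MaskingFunctions() {}

        // Illustrative function; callable from SQL as: SELECT mask_email(email) FROM users
        @ScalarFunction("mask_email")
        @Description("Masks the local part of an email address, keeping its first character")
        @SqlType(StandardTypes.VARCHAR)
        public static Slice maskEmail(@SqlType(StandardTypes.VARCHAR) Slice input) {
            String email = input.toStringUtf8();
            int at = email.indexOf('@');
            if (at <= 1) {
                // Nothing safe to keep; mask the whole value.
                return Slices.utf8Slice("***");
            }
            return Slices.utf8Slice(email.charAt(0) + "***" + email.substring(at));
        }
    }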


REQUIREMENT SUMMARY

Experience: 5.0 to 10.0 year(s)
Industry: Information Technology/IT
Category: IT Software - DBA / Datawarehousing
Function: Software Engineering
Education: BSc in Computer Science
Proficiency: Proficient
Vacancies: 1
Location: Vancouver, BC, Canada