Senior Site Reliability Engineer at Bakkt

United States

Full Time

Start Date

Immediate

Expiry Date

06 Apr, 26

Salary

0.0

Posted On

06 Jan, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Site Reliability Engineering, Monitoring Systems, Incident Management, Client Support, Automation, Process Improvement, Java, Springboot, GIT, SQL, DataDog, Cloud Platforms, Python, JIRA, Linux, Windows

Industry

Financial Services

Description

About Bakkt: Bakkt is architecting the future of finance, fueling a revolutionary "finternet" built on the transformative power of cryptocurrency. We're unlocking novel avenues for accessing, trading, and engaging with digital assets, forging a more interconnected and efficient financial ecosystem for individuals and institutions alike. Join us at the vanguard of this groundbreaking evolution!

As a Site Reliability Engineer, you will be responsible for closely monitoring our production environments, swiftly addressing issues, and applying creative solutions to ensure the seamless operation of our platform. You will use your natural curiosity and strong problem-solving skills to investigate and resolve technical issues across our applications, services, databases, and infrastructure.

Responsibilities

* Implement and manage robust monitoring systems to continuously track the functional and non-functional health and performance of our production systems.
* Proactively identify anomalies and potential issues before they impact our clients.
* Client Support:
  * Partner with software engineering, project management, and customer success teams to respond to client requests and support inquiries.
  * Work closely with our clients to provide support during integration and ensure a positive experience.
* Incident Management:
  * Lead escalation remediations by working across multiple teams, such as software engineering, devops, and project management, for web applications and services running in a 24/7, always-on cloud platform environment.
  * Participate in an on-call rotation to address and resolve critical incidents outside of regular business hours.
* Operations:
  * Execute and develop operational procedures necessary for service requests and incident response.
  * Maintain critical platform support knowledge, such as customer contact lists, vendor escalation procedures, scheduled job inventories, and operational playbooks.
  * Support planning and execution of production changes and software releases.
* Automation:
  * Develop scripts and tools to automate repetitive tasks, streamline workflows, and improve the efficiency of the production support process.
  * Assist in the automation of customer operational tasks and ensure alignment with business requirements for customer-facing processes such as customer order reconciliation.
  * Ensure timely execution of scheduled and repeatable processes such as periodic system validations, daily triage, system monitoring, and event log management.
* Continuous Improvement:
  * Actively participate in process improvement initiatives, suggesting enhancements to observability, logging strategies, incident response procedures, and support workflows.

Requirements

* A bachelor’s degree in Computer Science, Information Technology, or equivalent.
* 5+ years of application support and production support experience supporting cloud-based platforms using an SRE support model.
* Proven track record in a production support/SRE role, demonstrating your ability to monitor and troubleshoot complex systems in highly available production environments.
* Experience with common development tools and practices, including Java-based, Springboot environments and source control tools such as GIT, in a team environment.
* Demonstrated ability to understand application logs and to support various monitoring and visualization tools (e.g., Alertsite, LogStash, DataDog).
* Excellent communication skills, both written and verbal, for effective interaction with technical and non-technical stakeholders.
* Self-starter who can work independently and effectively across functional team environments.
* Proven ability to learn new IT technologies and disciplines.

Preferred

* Ability to read and interpret Java, Angular, SQL, and other software coding languages.
* Experience with GCP, Google Kubernetes Engine, and Google Compute Engine.
* Experience with n-tier web and services application architectures and with Java-based, Springboot, and Tomcat environments.
* Working knowledge of SQL Server.
* Experience with JIRA or other Service Desk tools.
* Experience with multiple OS platforms (Linux, Windows).
* Experience with Mongo and a scripting language such as Python.

Bakkt is devoted to having diversity in its workforce and is proud to be an equal opportunity employer. Bakkt does not make any employment decisions based on race, color, religion, sex, national origin, veteran status, disability, age, sexual orientation, gender identity, or any other characteristic protected by law.

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities

The Senior Site Reliability Engineer will monitor production environments, address issues, and ensure seamless platform operation. Responsibilities include incident management, client support, and developing automation tools.