Site Reliability Engineer

at VERSENT

Melbourne VIC 3000, Victoria, Australia

Start Date: Immediate
Expiry Date: 24 Oct, 2024
Salary: Not Specified
Posted On: 24 Jul, 2024
Experience: N/A
Skills: Metrics, Code Review, EC2, Programming Languages, Secure Coding, Python, Logging, TypeScript, Backup, GitHub, Testing, Availability
Telecommute: No
Sponsor Visa: No

Description:

Stax is a home-grown, unique native AWS cloud management platform that provides the consistency, confidence and velocity customers need – straight out of the box. Our pre-built platform works hard beneath the surface to make migrating, maintaining and operating on AWS seamless. Customers have the capability to focus on their core business and innovate safely and securely. It’s why we decided to build a native AWS cloud management platform and be the best at it.
We value clarity, consistency and capability. Above all, we embrace the changes that come through an intelligent application of knowledge. We’ve cultivated an environment based on creativity. When people who care about their craft are given the freedom to explore possibilities without limits, amazing things happen. There are no cool cliques, just a hard-working team generating ideas and devising solutions in a creative and collaborative workspace.
Versent is looking for a passionate Site Reliability Engineer to join our (remote) Product Engineering team in the AEST timezone.
You will be working on an AWS cloud and cost management platform supporting over 30 customers worldwide.
You will be working with an established Versent engineering team and will have the opportunity to collaborate with our customers' own engineering teams, as well as with SRE peers.
In this role you will be proactively monitoring the health of the platform to ensure availability, performance, and reliability. Day-to-day you will be completing deployments into production, responding to customer issues, resolving technical debt, and developing new features to improve the operational resiliency and efficiency of the product.
This role has an on-call rotation to respond and resolve operational incidents and implement corrective engineering.
You'll get hands-on experience contributing to detailed design work and story elaboration, participating in Agile ceremonies, and producing engineering artefacts that meet the team's agreed ways-of-working standards. You'll contribute independently, with support from Lead and Senior Engineers.
The SREs are responsible for maintaining the availability of the platform for its customers and administrators.

Key day-to-day responsibilities:

  • Maintain 24/7 support coverage for all production environments
  • Ensure all service desk tickets are acknowledged and completed within the given SLAs
  • Develop and maintain logging, monitoring and alerting on all APIs, factories, and network infrastructure
  • Develop and maintain backup and restore processes to ensure business continuity of the product within specified non-functional requirements
  • Provide and develop engineering improvements for system stability across the platform
  • Provide automated, self-service systems for wider internal teams to consume
  • Develop automation tooling to assist with manual processes
  • Actively maintain platform documentation and runbooks
  • Hold and record post-incident reviews, identify root causes, and implement corrective engineering
  • Produce suitably engineered code solutions to platform and operations problems
  • Apply domain knowledge to solve engineering problems, delivering production-quality code artefacts
  • Act on technical advice from leaders, delivered through code reviews, technical guidance and training
  • Complete stories within a specified timeline as per the agreed acceptance criteria
  • Contribute to work elaboration and design

Primary Skills required

  • AWS Ecosystem: Lambda, EC2, S3, VPC networking, RDS, IAM, Step Functions, CloudFormation
  • Software Development Practices: CI/CD (Buildkite, CodeBuild, etc.), VCS (Git, GitHub), code review, testing, secure coding, containerisation
  • Service Reliability: Backup, Logging, Alerting, Metrics & RTO/RPO processes
  • Programming Languages: Python, Golang, TypeScript
  • Other: Excellent written and verbal communication, team collaboration, self-direction and initiative, on-call availability

Our values reflect the way we work. We're a casual, inclusive bunch, with team members from a variety of backgrounds collaborating to overcome challenges. Everyone is given space to learn and develop their skills and knowledge. We support each other in all ventures, whether that's attaining a new AWS certification or trying a hand at baking sourdough or brewing beer. We create remarkable experiences for our customers, and we treat others the way we would like to be treated.

REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Application Programming / Maintenance

Software Engineering

Graduate

Proficient

1

Melbourne VIC 3000, Australia