Site Reliability Engineer
at VERSENT

Melbourne VIC 3000, Victoria, Australia

Start Date	Expiry Date	Salary	Posted On	Experience	Skills	Telecommute	Sponsor Visa
Immediate	24 Oct, 2024	Not Specified	24 Jul, 2024	N/A	Metrics, Code Review, EC2, Programming Languages, Secure Coding, Python, Logging, TypeScript, Backup, GitHub, Testing, Availability	No	No

Description:

Stax is a home-grown, unique native AWS cloud management platform that provides the consistency, confidence and velocity customers need – straight out of the box. Our pre-built platform works hard beneath the surface to make migrating, maintaining and operating on AWS seamless. Customers have the capability to focus on their core business and innovate safely and securely. It’s why we decided to build a native AWS cloud management platform and be the best at it.
We value clarity, consistency and capability. Above all, we embrace the changes that come through an intelligent application of knowledge. We’ve cultivated an environment based on creativity. When people who care about their craft are given the freedom to explore possibilities without limits, amazing things happen. There are no cool cliques, just a hard-working team generating ideas and devising solutions in a creative and collaborative workspace.
Versent is looking for a passionate Site Reliability Engineer to join our (remote) Product Engineering team in the AEST timezone.
You will be working on an AWS cloud and cost management platform supporting over 30 customers worldwide.
You will be working with an established Versent engineering team and have the opportunity to collaborate with our customers’ own engineering teams, as well as SRE peers.
In this role you will be proactively monitoring the health of the platform to ensure availability, performance, and reliability. Day-to-day you will be completing deployments into production, responding to customer issues, resolving technical debt, and developing new features to improve the operational resiliency and efficiency of the product.
This role includes an on-call rotation to respond to and resolve operational incidents and implement corrective engineering.
You’ll get hands-on experience contributing to detailed design work, elaborating stories, participating in Agile ceremonies and producing engineering artefacts to the team’s agreed ways-of-working standards. You’ll be able to contribute independently, with the support of Lead and Senior Engineers.
The SREs are responsible for maintaining the availability of the platform for its customers and administrators.

Key day-to-day responsibilities

Maintain 24/7 support coverage for all production environments
Ensure all service desk tickets are acknowledged and completed within the given SLAs
Develop and maintain logging, monitoring and alerting on all APIs, factories, and network infrastructure
Develop and maintain backup and restore processes to ensure business continuity of the product within specified non-functional requirements
Provide and develop engineering improvements for system stability across the platform
Provide automated systems to allow self-service for wider internal teams to consume
Develop automation tooling to assist with manual processes
Actively maintain platform documentation and runbooks
Hold and record post-incident reviews, identify root causes, and implement corrective engineering
Produce suitably engineered code solutions to platform and operations problems
Apply domain knowledge to solve problems and contribute production-quality code artefacts to engineering outputs
Act on technical advice from leaders given through code reviews, technical guidance and training
Complete stories within a specified timeline as per the agreed acceptance criteria
Contribute to work elaboration and design

Primary Skills required

AWS Ecosystem: Lambda, EC2, S3, VPC networking, RDS, IAM, Step Functions, CloudFormation
Software Development Practices: CI/CD (Buildkite, CodeBuild, etc.), VCS (Git, GitHub), code review, testing, secure coding, containerisation
Service Reliability: Backup, Logging, Alerting, Metrics & RTO/RPO processes
Programming Languages: Python, Golang, TypeScript
Other: Excellent written and verbal communication, team collaboration, self-direction and initiative, on-call availability

Our values reflect the way we work. We’re a casual, inclusive bunch, with team members from a variety of backgrounds collaborating to overcome challenges. Everyone is given space to learn and develop their skills and knowledge. We support each other in all ventures, whether that means attaining a new AWS certification, baking sourdough or brewing beer. We create remarkable experiences for our customers and we treat others the way we would like to be treated.

REQUIREMENT SUMMARY

Experience: Min: N/A, Max: 5.0 year(s)

Industry: Information Technology/IT

Functional area of job: IT Software - Application Programming / Maintenance

Domain: Software Engineering

Qualifications: Graduate

English Proficiency: Proficient

Number of posts: 1

Address of job: Melbourne VIC 3000, Australia