SITE RELIABILITY ENGINEER at Trinity HR Solutions Pte Ltd
Singapore, Southeast, Singapore
Full Time


Start Date

Immediate

Expiry Date

08 Sep, 25

Salary

9900.0

Posted On

09 Jun, 25

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Kubernetes, Computer Engineering, Continuous Integration, Firewalls, Jenkins, Shell Scripting, Computer Science, Kibana, AWS, Organization Skills

Industry

Information Technology/IT

Description

We are looking for highly motivated individuals interested in joining our Site Reliability organization, which supports our next generation of Investment and Insurance products, as a Site Reliability Engineer. Successful candidates will work closely with our internal users and development teams to keep production systems stable in a highly competitive environment.

KEY ACCOUNTABILITIES

  • Manage, monitor and operate the system to ensure all business functions are running smoothly.
  • Work across teams to continually review systems, provide feedback, and implement best practices that improve system efficiency and drive future innovation.
  • Manage on-going changes while retaining high levels of service availability to our customer base.
  • Pragmatically identify the root cause of production incidents and lead the implementation of actions to prevent recurrence.
  • Drive incident management process and support a blameless post-mortem culture.
  • Automate system operations to reduce toil and attain a high level of efficiency.

REQUIREMENTS

  • Bachelor’s Degree/Diploma in Computer Science, Computer Engineering, or Computer Application. Equivalent experience may be considered.
  • 3+ years of experience supporting critical applications built on API-driven technologies.
  • 2+ years of hands-on experience in Python development (preferably with RESTful APIs).
  • 2+ years of working with a modern stack (AWS, PCF, containers, or Kubernetes).
  • 2+ years of Continuous Integration and Continuous Delivery experience through Jenkins or equivalent.
  • Experience with modern observability tools such as Grafana, Kibana, or Prometheus preferred.
  • Experience working in an Agile (SAFe or Kanban) environment preferred.
  • Knowledge of and/or experience with SQL and Linux shell scripting.
  • Basic understanding of firewalls, load balancers, and networking concepts.
  • Strong communication skills at all levels and a team spirit are essential.
  • Proactive, with good analytical and organizational skills.
  • Ability to work independently and to multitask.

Responsibilities

JOB PURPOSE

  • Work with our business units, development teams, and other units to help maintain the high quality and service level objectives of our systems.
  • Optimize the supportability of systems by applying basic SRE principles such as blameless post-mortems, error budgets, and automation.
  • Provide production support for the application domain when applicable.

RESPONSIBILITIES

  • Participate in platform operations management and capacity management.
  • Coordinate and implement platform/infrastructure upgrades and releases with technical and business teams.
  • React to critical issues immediately: troubleshoot, investigate, and apply appropriate solutions to normalise system operations.
  • Provide off-hour/weekend support to ensure production system stability.
  • Troubleshoot problems across a wide range of technical areas (development, CI/CD, infrastructure, etc.).
  • Maintain awareness of relevant technical and product trends with self-learning and job shadowing.
  • Create and maintain the operational documents to reflect system changes and upgrades.
  • Communicate effectively, professionally, and comfortably, both verbally and in writing, across all levels.