Senior Site Reliability Engineer - Cloud Applications at Opswerks

Mandaluyong, Philippines

Full Time

Start Date

Immediate

Expiry Date

28 Apr, 25

Salary

0.0

Posted On

28 Jan, 25

Experience

3 year(s) or above

Remote Job

Telecommute

Sponsor Visa

Skills

Ansible, Microservices, Communication Skills, Splunk, Operations, Version Management, RHEL, Puppet, ServiceNow, Messaging, YAML, JSON, Infrastructure, CentOS, Kubernetes, Distributed Systems, Hosted Services, Salt, Docker, Git, Content Delivery, Troubleshooting

Industry

Information Technology/IT

Description

YOUR QUALIFICATIONS

Bachelor’s degree in any Information Technology or Engineering course
Demonstrated ability to support critical production services and improve operations through automation and process enhancements.
Subject matter expert in Platform as a Service support, distributed systems, and microservices, particularly in hosted services such as content delivery, messaging, API gateways, and proxies.
Strong communication skills, both written and verbal.
At least 5 years’ experience working with the following:
Linux Administration: RHEL, CentOS, or other Unix-like systems.
Server and Infrastructure Troubleshooting: Hardware and OS Configuration
Logging and monitoring: Splunk, Grafana, Prometheus
Container Orchestration: Docker, Kubernetes
Incident management: PagerDuty, ServiceNow
Data serialization formats and structured systems: APIs, JSON, YAML.
At least 3 years’ experience working with the following:
Distributed Application Support: Experience supporting multiple applications running in a microservices architecture.
Version Management and CI/CD: Git, Spinnaker
Infrastructure Config Management: Puppet, Ansible, Salt

Responsibilities

Serve as subject matter expert for distributed application systems that reside in hybrid cloud platforms.
Champion and drive operational improvements using insights from metrics and customer feedback.
Lead incident response and post-incident reviews.
Communicate complex topics with development teams to investigate and document issues, and lead the internal team in developing solutions to mitigate them.
Manage and maintain enterprise applications and cloud-based systems using tools and frameworks designed for secure and scalable in-house deployments.
Monitor and optimize the health and performance of applications and platforms.
Debug problems reported by partners and end-users using in-depth log analysis and stack traces.
Create comprehensive documentation for operational procedures and environment setup.
Eliminate operational toil through automation or process improvements.
Be a member of a 24x7 shift rotation.