Senior Site Reliability Engineer - Cloud Applications at Opswerks
Mandaluyong, Philippines
Full Time


Start Date

Immediate

Expiry Date

28 Apr, 25

Salary

0.0

Posted On

28 Jan, 25

Experience

3 year(s) or above

Remote Job

No

Telecommute

No

Sponsor Visa

No

Skills

Ansible, Microservices, Communication Skills, Splunk, Operations, Version Management, RHEL, Puppet, ServiceNow, Messaging, YAML, JSON, Infrastructure, CentOS, Kubernetes, Distributed Systems, Hosted Services, Salt, Docker, Git, Content Delivery, Troubleshooting

Industry

Information Technology/IT

Description

YOUR QUALIFICATIONS

  • Bachelor’s degree in any Information Technology or Engineering course
  • Demonstrated ability to support critical production services and improve operations through automation and process enhancements.
  • Subject matter expertise in Platform as a Service support, distributed systems, and microservices, particularly in hosted services such as content delivery, messaging, API gateways, and proxies.
  • Strong communication skills, both written and verbal.
  • At least 5 years’ experience working with the following:
      • Linux Administration: RHEL, CentOS, or other Unix-like systems
      • Server and Infrastructure Troubleshooting: Hardware and OS Configuration
      • Logging and Monitoring: Splunk, Grafana, Prometheus
      • Container Orchestration: Docker, Kubernetes
      • Incident Management: PagerDuty, ServiceNow
      • Data Serialization Formats and Structured Systems: APIs, JSON, YAML
  • At least 3 years’ experience working with the following:
      • Distributed Application Support: Experience supporting multiple applications implemented as microservices
      • Version Management and CI/CD: Git, Spinnaker
      • Infrastructure Config Management: Puppet, Ansible, Salt

RESPONSIBILITIES

  • Serve as subject matter expert for distributed application systems that reside in hybrid cloud platforms.
  • Champion and drive operational improvements using insights from metrics and customer feedback.
  • Lead incident response and post-incident reviews.
  • Communicate complex topics to development teams to investigate and document issues, and lead the internal team in developing solutions to mitigate them.
  • Manage and maintain enterprise applications and cloud-based systems using tools and frameworks designed for secure and scalable in-house deployments.
  • Monitor and optimize the health and performance of applications and platforms.
  • Debug problems reported by partners and end-users using in-depth log analysis and stack traces.
  • Create comprehensive documentation for operational procedures and environment setup.
  • Eliminate operational toil through automation or process improvements.
  • Be a member of a 24x7 shifting rotation.