Senior Site Reliability Engineer - Infrastructure at Opswerks
Mandaluyong, Philippines
Full Time


Start Date

Immediate

Expiry Date

28 Apr 2025

Salary

0.0

Posted On

28 Jan 2025

Experience

5 year(s) or above

Remote Job

No

Telecommute

No

Sponsor Visa

No

Skills

Load Balancing, Version Management, Jenkins, RHEL, Reverse Proxy, Infrastructure, Automation, CentOS, Ubuntu, Ansible, Git, Scheduling, Security, Puppet, Firewall, SSL Certificates, NetScaler, Operations

Industry

Information Technology/IT

Description

YOUR QUALIFICATIONS

  • Bachelor’s degree in any Information Technology or Engineering course
  • Demonstrated ability to support critical production services and improve operations through automation and process enhancements.
  • At least 5 years of experience working with technologies in the following domains:
      • Linux Systems Administration: RHEL, CentOS, Ubuntu, or other *nix systems
      • Container Orchestration and Scheduling: Docker, Kubernetes, or similar
      • Infrastructure as Code: Puppet, Ansible, Chef, Terraform, or similar
      • Logging and Monitoring: Prometheus, Grafana, Splunk, MRTG
      • Version Management and CI/CD: Git, Jenkins, or similar
      • Networking: Core Networking Concepts, Load Balancing (NetScaler, F5, or similar), Reverse Proxy (Nginx), Software-Defined Networking (SDN), DHCP, DNS
      • Security: SSL certificates, Firewall, ACLs
  • Strong experience managing data center lifecycle practices (build, operate, decommission).
  • Experience leading infrastructure-related projects to successful completion.
  • Solid understanding of distributed computing principles, platform operations, and best practices in Data Engineering and DevOps workflows.

RESPONSIBILITIES

  • Serve as the subject matter expert for infrastructure operations at scale by sharing knowledge among peers, documenting best practices, and performing root cause analysis of recurring or high-impact incidents.
  • Lead incident response, triage, and customer communications during system outages and production issues.
  • Lead periodic sync-up meetings with our customers to discuss updates, resolve open questions, and solicit feedback.
  • Gather and analyze operations metrics regularly to inform decisions that drive operational and process improvements.
  • Develop new tools and automated solutions, or contribute to existing ones, to improve operational efficiency.
  • Create comprehensive documentation for operational procedures and environment setup.
  • Eliminate operational toil through automation or process improvements.
  • Be a member of a 24x7 shift rotation.