Site Reliability Engineer (SRE) - Ads / Monetization Platform at Two95 International Inc
Johor Bahru, Johor, Malaysia
Full Time


Start Date

Immediate

Expiry Date

17 Jul, 26

Salary

0.0

Posted On

18 Apr, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Site Reliability Engineering, Python, Go, C++, Linux, Unix, Networking, SQL, Automation, Observability, Cloud Platforms, Terraform, Ansible, Docker, Kubernetes, Prometheus

Industry

Technology; Information and Media

Description

Role Summary

As a Site Reliability Engineer (SRE), you will build and operate highly available, globally distributed advertising/monetization services. You will improve reliability, scalability, and operability through automation, observability, incident management, and sound engineering practices.

Key Responsibilities

Own reliability across the service lifecycle: design reviews, capacity planning, launch, deployment, operations, and continuous improvement.
Build and operate highly available services across multiple regions/data centers; improve resilience, latency, and scalability.
Develop automation and tooling to reduce toil (deployment, remediation, runbooks, self-healing) using scripting and software engineering best practices.
Define and implement SLOs/SLIs/SLAs; create dashboards and alerting to track service health (availability, latency, errors, saturation).
Lead sustainable incident response: triage, mitigation, root-cause analysis (RCA), and blameless postmortems with actionable follow-ups.
Collaborate with software engineering, security, and compliance stakeholders to meet data governance and regulatory requirements.

Must-have Qualifications

3+ years of experience in SRE, DevOps, systems engineering, or production operations for large-scale services.
Strong coding skills in one language: Python, Go, or C++ (Java acceptable).
Solid Linux/Unix fundamentals: processes, memory/CPU, filesystems, permissions, and troubleshooting.
Networking fundamentals in cloud environments: TCP/IP, DNS, HTTP/HTTPS, load balancing, basic security concepts.
SQL proficiency and experience with data workflows/ETL is a plus for ads/analytics-related systems.
Strong communication, ownership mindset, and ability to work effectively across global teams.

Preferred Qualifications

Experience supporting advertising, recommendation, or high-traffic consumer internet platforms.
Hands-on experience with cloud platforms (AWS/GCP/Azure) and infrastructure-as-code (Terraform/Ansible).
Experience with containers and orchestration (Docker, Kubernetes).
Observability experience with tools such as Prometheus, Grafana, ELK/Splunk, OpenTelemetry.
Experience operating large data systems (streaming, distributed storage/compute) and performance tuning.

Responsibilities
You will build and operate highly available, globally distributed advertising and monetization services while improving reliability and scalability. Responsibilities include defining SLOs, leading incident response, and developing automation to reduce operational toil.