Site Reliability Engineer (SRE) - Ads / Monetization Platform at Two95 International Inc
Kuala Lumpur, Kuala Lumpur, Malaysia
Full Time


Start Date

Immediate

Expiry Date

17 Jul, 26

Salary

0.0

Posted On

18 Apr, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Site Reliability Engineering, Python, Go, C++, Linux, Unix, Networking, TCP/IP, DNS, Load Balancing, SQL, Automation, Observability, Incident Management, Cloud Platforms, Kubernetes

Industry

Technology; Information and Media

Description
Role Summary
As a Site Reliability Engineer (SRE), you will build and operate highly available, globally distributed advertising/monetization services. You will improve reliability, scalability, and operability through automation, observability, incident management, and sound engineering practices.

Key Responsibilities
Own reliability across the service lifecycle: design reviews, capacity planning, launch, deployment, operations, and continuous improvement.
Build and operate highly available services across multiple regions/data centers; improve resilience, latency, and scalability.
Develop automation and tooling to reduce toil (deployment, remediation, runbooks, self-healing) using scripting and software engineering best practices.
Define and implement SLOs/SLIs/SLAs; create dashboards and alerting to track service health (availability, latency, errors, saturation).
Lead sustainable incident response: triage, mitigation, root-cause analysis (RCA), and blameless postmortems with actionable follow-ups.
Collaborate with software engineering, security, and compliance stakeholders to meet data governance and regulatory requirements.

Must-have Qualifications
3+ years of experience in SRE, DevOps, systems engineering, or production operations for large-scale services.
Strong coding skills in one language: Python or Go or C++ (Java acceptable).
Solid Linux/Unix fundamentals: processes, memory/CPU, filesystems, permissions, and troubleshooting.
Networking fundamentals in cloud environments: TCP/IP, DNS, HTTP/HTTPS, load balancing, basic security concepts.
SQL proficiency and experience with data workflows/ETL is a plus for ads/analytics-related systems.
Strong communication, ownership mindset, and ability to work effectively across global teams.

Preferred Qualifications
Experience supporting advertising, recommendation, or high-traffic consumer internet platforms.
Hands-on experience with cloud platforms (AWS/GCP/Azure) and infrastructure-as-code (Terraform/Ansible).
Experience with containers and orchestration (Docker, Kubernetes).
Observability experience with tools such as Prometheus, Grafana, ELK/Splunk, OpenTelemetry.
Experience operating large data systems (streaming, distributed storage/compute) and performance tuning.

Responsibilities
You will build and operate highly available, globally distributed advertising and monetization services while improving reliability and scalability. Responsibilities include defining SLOs/SLIs, leading incident response, and developing automation to reduce operational toil.