Site Reliability Engineer (Junior/Middle) at EPOS
Ho Chi Minh City, Vietnam
Full Time


Start Date

Immediate

Expiry Date

15 Mar, 26

Salary

0.0

Posted On

15 Dec, 25

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Kubernetes, Cloud Platforms, Containers, Orchestration Technologies, Infrastructure as Code, Automation Tooling, Monitoring Systems, Logging Systems, Alerting Systems, Microservices Observability, Scripting Languages, CI/CD Pipelines, English Communication

Industry

Financial Services

Description
About Us

Established in 2009, Floating Cube Studios - EPOS Vietnam serves as the technical hub of EPOS Singapore, a leading provider of cutting-edge Point-of-Sale (POS) and SaaS solutions. Backed by Ant International, a global leader in digital payments and financial technology, we play a pivotal role in developing innovative, scalable, and user-centric digital solutions that power EPOS products. Our technologies enable thousands of SMEs in Singapore to digitize and grow their operations through cost-effective and reliable platforms, and we have ongoing plans to expand across Asia.

At Floating Cube Studios - EPOS Vietnam, we cultivate a collaborative and dynamic culture driven by innovation and a passion for transforming businesses through technology. Join us to shape the future of digital solutions across Asia and beyond!

We are a multinational, product-driven company specializing in proprietary POS solutions, which we develop in-house and deliver directly to our customers worldwide.

Key Contributions

Design and operate scalable, resilient, and distributed systems across on-premises and/or cloud environments.
Manage resource provisioning, utilization, capacity planning, and cost optimization.
Build and maintain observability systems (metrics, logs, traces) to ensure high availability and fast incident detection.
Analyze performance issues and drive improvements in reliability, latency, and system efficiency.
Partner with development teams to enhance stability, performance, and deployment quality.
Develop automation tools, pipelines, and APIs to streamline operational workflows.
Containerize and automate applications/services to improve consistency and deployment speed.
Implement security, compliance, and configuration-management standards across environments.
Build and maintain CI/CD pipelines to ensure reliable and repeatable software releases.
Perform incident response, troubleshooting, and root-cause analysis, and maintain clear operational documentation.

Bachelor’s Degree in Computer Science / Information Technology / Software Development or related fields.

Must have:

At least 2 years of hands-on experience in a relevant SRE role.
Solid experience running Kubernetes and cloud platforms (AWS/GCP/Azure) at scale.
Strong proficiency with containers and orchestration technologies (Docker, Kubernetes).
Skilled in Infrastructure as Code and automation tooling (Terraform, Ansible).
Strong hands-on experience building and maintaining monitoring, logging, and alerting systems (Prometheus, Grafana, ELK, or similar).
Hands-on expertise with microservices observability, log analysis, and monitoring tools (ELK Stack, Prometheus, Grafana, ClickHouse).
Practical skills in at least one scripting language (Python, Bash, or Go) for automation and tooling.
Familiarity with CI/CD pipelines and the ability to maintain/integrate tools such as GitLab CI, Jenkins, or GitHub Actions.
Good to fluent English communication skills are mandatory.
Benefits

Recognition & Rewards: Performance Bonus (subject to the company’s business results and the employee’s performance evaluation); Biannual Performance Review and Salary Adjustment
Comprehensive Insurance Coverage: Full government public insurance contributions based on gross salary; Premium health insurance; Annual health check
Clear career development and growth structure; Training sessions and Learning workshops
14 days of annual leave and one additional day of leave for every year of service
Laptop/MacBook and top-notch facilities are provided based on each role
Agile/Scrum-based internal workflows for efficient and collaborative development
Company trips, parties and regular team-building activities; Weekly happy hour, coffee, snacks, and board games
Overseas travel opportunities based on the individual performance and policies for each evaluation period

Working Environment & Culture

International Workplace: English-speaking environment
Positive and Open-Minded Culture: Engineers are encouraged to propose innovative solutions that enhance productivity and code quality
1-on-1 Mentorship: Monthly coffee sessions with managers offer personalized feedback, goal setting, and career development opportunities
Flexible Working Hours: Promote work-life balance and individual productivity

Responsibilities
The Site Reliability Engineer will design and operate scalable systems, manage resource provisioning, and build observability systems to ensure high availability. They will also partner with development teams to enhance system performance and develop automation tools.