OpenShift Platform Engineer (GenAI) at Unison Group
Singapore, Singapore
Full Time


Start Date

Immediate

Expiry Date

07 Jul, 26

Salary

0.0

Posted On

08 Apr, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

OpenShift, Kubernetes, Docker, Infrastructure Engineering, Capacity Planning, Disaster Recovery, Oracle Database, AWS, Terraform, CloudFormation, Security, Vulnerability Management, CI/CD, Observability, Hybrid Cloud, GenAI

Industry

Business Consulting and Services

Description
Overview
We are seeking an experienced Senior GenAI Platform Engineer / OpenShift SME to lead and manage enterprise-scale infrastructure supporting GenAI applications. This role focuses on OpenShift platform engineering, hybrid cloud environments, disaster recovery (DR), and security for highly scalable and resilient AI platforms.

Requirements
10+ years of experience in infrastructure engineering / platform engineering.
Strong expertise in managing OpenShift (OCP) in enterprise production environments.
Hands-on experience in infrastructure sizing, capacity planning, and performance tuning for AI workloads.
Experience supporting Oracle Database from an infrastructure/application standpoint.
Strong knowledge of certificate management, secrets handling, and key management.
Experience with CI/CD pipelines and infrastructure automation.
Solid background in security, vulnerability management, and compliance.
Proven experience in designing and implementing Disaster Recovery (DR) solutions.
Experience with AWS cloud services and hybrid cloud environments.
Strong experience with Docker and Kubernetes.
Excellent coordination and stakeholder management skills across cross-functional teams.

Key Responsibilities
Lead and manage end-to-end infrastructure for enterprise GenAI applications hosted on OpenShift (OCP).
Own capacity planning, sizing, and performance optimization of OpenShift clusters and related infrastructure components.
Manage and optimize infrastructure including Oracle DB, Redis, Elastic DB, PostgreSQL, Dell ECS storage, and Linux environments (RedHat/Ubuntu).
Design and implement Disaster Recovery (DR) strategies ensuring high availability, resilience, and business continuity.
Lead E2E DR setup including replication, failover, testing, and documentation in collaboration with infra and network teams.
Manage certificate lifecycle (TLS/SSL), secrets, and key management across platforms.
Implement vulnerability management, patching, and remediation across Kubernetes, containers, and infrastructure.
Support and coordinate penetration testing and address security findings.
Work with AWS services (EC2, VPC, CloudWatch, Lambda, Bedrock) in hybrid cloud environments.
Build and maintain infrastructure automation using Terraform and CloudFormation.
Manage observability using monitoring, logging, alerting tools, and Control-M schedulers.
Collaborate with DevOps, Security, and Development teams for platform reliability and performance.
(Bonus) Work with or support open-weight LLM models for AI/ML use cases.

Responsibilities
Lead and manage end-to-end infrastructure for enterprise GenAI applications hosted on OpenShift. Design and implement disaster recovery strategies while optimizing performance for databases and cloud-based AI workloads.