Site Reliability Engineer – Telco Network Applications & 5G
Position Summary:
We are seeking a Site Reliability Engineer (SRE) with strong Telco domain expertise to support the onboarding, integration, and ongoing reliability of 5G network applications on our cloud-native infrastructure, preferably built on Red Hat OpenShift. This role combines deep knowledge of 5G architecture with cloud-native principles to ensure both Day 1 readiness and Day 2 operational excellence of critical Telco applications.
As a technical leader, you will work closely with development, platform engineering, and operations teams to build scalable, resilient, and observable 5G application environments across hybrid or private cloud platforms.
Key Responsibilities:
Day 1 – Application Onboarding & Platform Readiness:
- Partner with Telco vendors and platform teams to lead the onboarding of 5G network functions (e.g., AMF, SMF, UPF, gNB, NRF) onto OpenShift/Kubernetes-based platforms.
- Define and implement deployment pipelines, automation scripts, and readiness validation for CNF/VNF applications.
- Ensure applications meet availability, performance, and configuration standards in a CaaS environment.
- Support Helm chart customizations, manifest tuning, and initial network service definitions for cloud-native workloads.
Day 2 – Reliability, Monitoring, and Incident Response:
- Design and implement observability and alerting strategies using Prometheus, Grafana, ELK, Fluentd, OpenTelemetry, or similar tooling.
- Define and own service level indicators and objectives (SLIs/SLOs), dashboards, and health checks for production 5G applications.
- Lead postmortem reviews and continuous improvement initiatives based on incident analysis and platform telemetry.
- Create runbooks, auto-remediation scripts, and tools to accelerate incident resolution and reduce MTTR.
Technical Leadership & Collaboration:
- Act as the SRE lead for 5G application performance and uptime, influencing cloud-native and Telco cloud strategy.
- Mentor and guide junior SREs, helping them develop deep expertise in Telco observability and incident response.
- Collaborate with application developers, network engineers, and operations teams to resolve issues and optimize deployment workflows.
- Drive automation, scalability, and fault-tolerance improvements across the full lifecycle of Telco applications.
Required Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Telecommunications, or a related technical field.
- 7+ years in network operations, cloud infrastructure, or SRE roles with 2+ years working on Telco/5G network functions.
- Hands-on experience with Red Hat OpenShift, Kubernetes, Helm, Operators, and containerized CNF onboarding.
- Proficiency in building monitoring, logging, and alerting systems for Telco workloads.
- Strong scripting and automation skills (e.g., Python, Bash, Ansible) and experience with CI/CD pipelines.
- Solid understanding of 5G architecture and network functions (e.g., AMF, SMF, UPF, NSSF, PCF), including their operational requirements.
Preferred Qualifications:
- Certifications: Red Hat OpenShift, CKA, CKAD, or relevant cloud/SRE certifications.
- Experience working with vendors like Nokia, Ericsson, Mavenir, or similar Telco ecosystem partners.
- Familiarity with network service orchestration, SDN, NFV, and cloud-native networking models (Multus, SR-IOV).
- Background in managing highly available, distributed systems with Telco-grade SLAs.
What We Offer:
- Key leadership role in the evolution of 5G networks into fully cloud-native, observable platforms.
- Exposure to real-world 5G and Telco transformation programs at global scale.
- Collaborative and agile team environment with a focus on technical excellence.
- Competitive compensation, benefits, and opportunities for certification and training.
Job Type: Contract
Pay: $45.00 - $50.00 per hour
Work Location: In person