Principal AI Site Reliability Engineer at Oracle Risk Management Services

United States

Full Time

Start Date

Immediate

Expiry Date

05 Sep, 26

Salary

0.0

Posted On

08 Jun, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Site Reliability Engineering, Cloud Infrastructure, Kubernetes, Terraform, Prometheus, Grafana, Python, Java, Go, Docker, CI/CD, Distributed Systems, Data Warehousing, ETL Frameworks, Incident Response, Root Cause Analysis

Industry

IT Services and IT Consulting

Description

As a Principal Site Reliability Engineer, you will play a pivotal role in building and operating the Oracle Health Patient Portal. In this role, you will design, build, and operate highly reliable, scalable infrastructure that supports Commercial and Federal customers. You will also contribute to the next evolution of cloud operations by advancing automation, observability, and AI-assisted reliability practices. You will work within a globally distributed team to deliver robust solutions that handle massive end-user load with precision and performance, while continuously improving system reliability and operational excellence.

U.S. citizenship is required for this position, as the successful candidate will be required to obtain (and maintain) a U.S. government security clearance after hire.

Required Skills

Infrastructure & Reliability
• Experience building and operating high-availability, fault-tolerant systems
• Strong understanding of distributed systems, performance monitoring, and resiliency patterns
• Experience with incident response, root-cause analysis, and production troubleshooting

Cloud Ecosystems
• Experience with one or more cloud environments: OCI, AWS/Azure

DevOps/SRE Practices
• Advanced competency in CI/CD pipelines (Jenkins, Kubernetes)
• Infrastructure as Code (Terraform)
• Observability tools (Prometheus, Grafana)
• Strong focus on automation-first operations

Data Technologies
• Proficiency in Data Warehousing platforms (e.g., Vertica, Snowflake)
• Experience with ETL frameworks and large-scale data processing
• Understanding of columnar storage systems

Programming & Tools
• Proficiency in Python, Java, or Go
• Experience with Docker, Kubernetes, and shell scripting

Problem-Solving
• Strong troubleshooting skills with the ability to perform root-cause analysis
• Experience resolving complex production issues in distributed systems

Operational Excellence
• Apply DevOps/SRE practices to automate deployments and operations
• Enhance observability using Prometheus/Grafana and AI-driven insights

Incident Response
• Participate in on-call rotations
• Implement preventative and automated remediation solutions

Collaboration
• Work closely with engineers to execute technical roadmaps
• Contribute to code reviews and infrastructure improvements

What You Bring
• 7+ years of software engineering, cloud infrastructure, SRE, or DevOps experience
• Proven ownership of production system reliability in cloud environments

Core Expertise
• Cloud infrastructure design and automation
• Distributed systems and performance optimization
• Data warehousing and ETL frameworks

Technical Skills
• Terraform, Docker, Kubernetes
• Observability stacks (Prometheus, Grafana)
• Python, Java, or Go

Additional Strengths
• Strong problem-solving mindset with a focus on automation and scalability
• Experience improving system reliability through intelligent automation

Preferred Qualifications
• Experience in healthcare or regulated environments (HIPAA, compliance frameworks)
• Experience working in environments requiring security clearance
• Experience building self-healing or autonomous infrastructure systems

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling 1-888-404-2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

How To Apply:

If you would like to apply to this job directly from the source, please click here

Responsibilities

Design, build, and operate highly reliable and scalable infrastructure for the Oracle Health Patient Portal. Advance cloud operations through automation, observability, and AI-assisted reliability practices for commercial and federal customers.