Compute Site Reliability Engineer (SRE) - Kubernetes at Apple

Seattle, Washington, United States

Full Time

Start Date

Immediate

Expiry Date

08 Jan, 26

Salary

0.0

Posted On

10 Oct, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Kubernetes, Linux Systems Administration, Scripting Languages, Monitoring Tools, CI/CD Pipelines, DevOps Practices, Containerization, Problem-Solving, Organizational Skills, Communication Skills, Automation, Infrastructure as Code, Java, Go, Cloud Services, Disaster Recovery

Industry

Computers and Electronics Manufacturing

Description

Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish.

Join the Apple Services Engineering team as a site reliability engineer to help support and scale cloud services for thousands of development and operations engineers. This is a hands-on role maintaining and enhancing SRE practices for a private cloud service, accelerating our ability to deliver thousands of applications reliably and consistently.

DESCRIPTION

As a Compute Site Reliability Engineer, you will be responsible for maintaining, monitoring, and improving the reliability, scalability, and performance of our Kubernetes-based infrastructure. You'll work closely with senior SREs, developers, and other engineers to ensure high availability and optimize our containerized applications. This is a fantastic opportunity for someone eager to grow their expertise in Kubernetes and cloud-native technologies.

As an SRE at Apple, you will:

* Operate, monitor, and triage all aspects of our production and non-production environments.
* Design, build, and implement innovative solutions for past, present, and future issues.
* Prepare alert-handling procedures and runbooks, and collaborate with other SRE teams.
* Participate in on-call rotations to troubleshoot and resolve production issues, minimizing downtime.
* Automate deployment and orchestration of services into the cloud environment, as well as other routine processes.
* Actively participate in capacity planning, scale testing, and disaster recovery exercises.

MINIMUM QUALIFICATIONS

* Bachelor's degree in Computer Science, an engineering-related field, or equivalent related experience.
* Basic understanding of Kubernetes architecture, including Pods, Deployments, Services, and ConfigMaps.
* Familiarity with Linux systems administration and command-line tools.
* Experience with scripting languages like Bash, Python, or Go.
* Knowledge of monitoring tools such as Prometheus, Grafana, or similar.
* Exposure to CI/CD pipelines and DevOps practices.
* Awareness of containerization.
* Strong problem-solving skills and a willingness to learn new technologies.
* Outstanding organizational and communication skills.

PREFERRED QUALIFICATIONS

* Strong verbal and written communication skills.
* Automation advocate: you truly believe in removing operational load via software.
* Familiarity with Infrastructure as Code (IaC) tools like Puppet.
* A strong sense of ownership. At the same time, you're a great teammate who communicates clearly and transparently.
* Self-motivated, inquisitive, and always looking to learn more.
* Experience managing, scaling, and troubleshooting Java and Go applications.
* CNCF Kubernetes Administration certification.

Responsibilities

As a Compute Site Reliability Engineer, you will maintain, monitor, and improve the reliability and performance of Kubernetes-based infrastructure. You will also automate deployment and participate in capacity planning and disaster recovery exercises.