Site Reliability Engineer, ASE Block Storage

at Apple

Cupertino, California, USA

Start Date: Immediate
Expiry Date: 26 Nov, 2024
Salary: USD 264,200 Annual
Posted On: 31 Aug, 2024
Experience: N/A
Skills: Capacity Planning, Storage Solutions, Code, Distributed Systems, Provisioning, Data Migration, Backup, Linux, Kubernetes, Disaster Recovery, Storage Systems, Microservices
Telecommute: No
Sponsor Visa: No

Description:

SUMMARY

Posted: Jun 24, 2024
Weekly Hours: 40
Role Number: 200556709
Apple Cloud infrastructure is vast, and the storage SRE teams of Apple Cloud are building and running the next-generation distributed storage systems that support Apple’s most critical services. Operating at our scale, across multiple geographically dispersed data centers, and serving users with exceptionally large data sets presents unique challenges. As a storage SRE at Apple, you’ll solve these problems using your deep understanding of storage, data analysis, programming, and teamwork, along with your expertise in Linux system internals. Storage SREs at Apple work across the full infrastructure stack, from tuning the block storage layer to managing content delivery network traffic.

DESCRIPTION

We are looking for seasoned software and systems engineers to join the Block Storage SRE team at Apple. The role involves a tremendous amount of individual responsibility and influence over the direction of the platform, shaping its use by many critical Apple Cloud services for years to come. You are someone with ideas and a real passion for software delivered as a service to improve reuse, efficiency, and simplicity. This engineer’s work will affect hundreds of millions of users and be essential to the success of some of the most visible current and future Apple features. At Apple Cloud, we run a mix of open source, vendor-licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software development and deployment, logging, and monitoring. You’ll learn these tools and have opportunities to improve them. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.

  • 5+ years of experience in a Site Reliability Engineer or Infrastructure Software Development role.
  • Acute drive to automate manual operations and to improve them with well-defined and tested APIs.
  • Awareness of best practices for deploying storage systems, including the implications of physical and virtual deployment models for change management, failure domains, hardware lifecycle management, etc.
  • Experience with deploying, supporting, and monitoring new and existing services, platforms, and application stacks.
  • Experienced in SRE principles, such as monitoring, alerting, error budgets, fault analysis, and other common concepts in reliability engineering. Skilled at identifying opportunities to reduce manual work through enhancements in code and processes.
  • Kubernetes Operator development experience.
  • Familiarity with relational & non-relational databases (such as Cassandra, Postgres, & RocksDB).
  • BS or MS in Computer Science or equivalent industry experience.

PREFERRED QUALIFICATIONS

  • Experience in building, operating, and scaling distributed storage systems in a private, public, or hybrid cloud environment.
  • The ability to design, author, review, and release code in one or more high-level languages (e.g., Go (preferred), Rust, Python, or Java).
  • Good understanding of block, object, and file storage solutions in Linux (such as LVM, XFS, ext4, S3, Ceph, Gluster, NFS).
  • Familiarity with microservices architecture and container orchestration with Kubernetes.
  • Understanding of Linux internals, standard networking protocols, and distributed systems.
  • Experience with provisioning, data migration, backup & recovery, at-scale testing, disaster recovery, and capacity planning.


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

BSc

Computer Science

Proficient

1

Cupertino, CA, USA