Site Reliability Engineer - Splunk Cloud Services at Splunk

Remote, Oregon, USA

Full Time

Start Date

Immediate

Expiry Date

12 Jun, 25

Salary

0.0

Posted On

12 Mar, 25

Experience

3 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Algorithms, Kubernetes, Data Structures, Architecture, Azure, Security, Traffic Management, Network Architecture, Infrastructure, IT, Disaster Recovery, Service Development, Root Cause, Splunk, Distributed Systems

Industry

Information Technology/IT

Description

Splunk, a Cisco company, is building a safer and more resilient digital world with an end-to-end full stack platform made for a hybrid, multi-cloud world. Leading enterprises use our unified security and observability platform to keep their digital systems secure and reliable. Our customers love our technology, but it’s our caring employees that make Splunk stand out as an amazing career destination. No matter where in the world or what level of the organization, we approach our work with kindness. So bring your work experience, problem-solving skills and talent, of course, but also bring your joy, your passion and all the things that make you, you. Come help organizations be their best, while you reach new heights with a team that has your back.

QUALIFICATIONS:

You have experience or an interest in working with regulated computing environments such as FISMA and/or FedRAMP and are enthusiastic about doing it better.
You have worked with Kubernetes, EKS, GKE or AKS and the associated ecosystems. Kubernetes certifications from the Cloud Native Computing Foundation, or an interest in obtaining them, are a plus: Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or Certified Kubernetes Security Specialist (CKS).
You enjoy building and running distributed systems at scale in production. You understand the challenges and trade-offs to be made when building and deploying systems to production.
You have a good understanding of Linux systems (network stack, file system, OS services) and networking (L2 vs. L3, network architecture, VLANs, etc.).
Experience with at least one programming language, preferably Go or Python. Experience using this language to automate Linux system tasks, including working with configuration files and system services, is required. Knowledge of common data structures and algorithms, and of their performance characteristics, is also required.
Knowledge of standard methodologies related to security, performance, and disaster recovery.
Skilled in identifying performance bottlenecks and anomalous system behavior, and in resolving the root cause of service issues.
You have assembled Open Source components into cohesive services.
You are interested in working hard to make the users of Splunk’s products happier every day.
You must be a US Citizen working on US soil to be considered.
Requires a minimum of 5 years of related experience with a Bachelor’s degree, or 3 years with a Master’s degree.

PREFERRED SKILLS:

Experience monitoring cloud environments with Splunk.
Experience with development and deployment in a hosted cloud environment, preferably AWS, Azure or GCP. Cloud certifications, or an interest in obtaining them, are a plus, such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or Google Associate Cloud Engineer (ACE).
Experience with large-scale distributed cloud service development, infrastructure, traffic management and architecture.
Experience with distributed architectures/systems with optimized and scalable software that operates on a large number of nodes.

Responsibilities

ROLE SUMMARY:

Splunk’s Cloud Services group is looking for a Site Reliability Engineer to help lead, design and build the next generation of our large-scale cloud offering. You will work on core services and applications that form the primitives for our current and future cloud service offerings. Site Reliability Engineers in this role will engage with multiple service owners across the platform to teach and implement modern interpretations of SRE, observability, Chaos Engineering and DevOps. This role is highly visible and impactful to the organization and will help shape Splunk’s Engineering culture for years to come. Your job, in a nutshell, is to make every team around you better… including your own!

WHAT YOU’LL GET TO DO

Own Splunk Cloud in FedRAMP environments.
Work across the organization to deliver quality products that delight Splunk’s passionate users.
Work with teams of tight-knit engineers who are building a state-of-the-art, cloud-based environment for massive-scale data processing.