Senior Site Reliability Engineer II Lead at AKAMAI TECHNOLOGIES INC

Bengaluru, Karnataka, India

Full Time

Start Date

Immediate

Expiry Date

21 Apr, 26

Salary

0.0

Posted On

21 Jan, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Technical Leadership, Mentorship, Collaboration, Automation, Python, Golang, Infrastructure Automation, Linux Administration, Docker, Kubernetes, Observability Tools, SLOs, System Reliability, Incident Resolution, Cloud Interfaces, APIs

Industry

Technology; Information and Internet

Description

Do you like collaborating across teams to solve complex problems? Do you have a passion for cutting-edge technologies and tackling system problems?

Join our highly skilled Site Reliability team!

Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We create solutions that manage our Compute platform, focusing on cloud interfaces: Compute Portals and APIs. We do this while maintaining Akamai's mission to make life better for billions of people, billions of times a day.

Partner with the best

In this role, you'll ensure the operation and uptime of our Compute services and infrastructure. You'll supervise and maintain our critical infrastructure. You'll collaborate with cross-functional teams to create tooling and software that monitors and improves the reliability of our systems. You'll work with various technologies as we release brand new applications and modernize our existing tooling.

As a Senior II Site Reliability Engineer Lead, you will be responsible for:

Providing technical leadership, mentorship, and support to SRE and project teams, fostering collaboration and motivation.
Defining requirements during the product lifecycle to influence design, standards, and operational readiness.
Partnering with engineering, operations, and support teams to ensure availability, reliability, scalability, and usability of platforms.
Troubleshooting and resolving complex system issues through proactive investigation, automation, and systems programming.
Developing and enhancing automation tools to streamline daily operations, reduce manual effort (toil), and improve performance.
Managing and improving Compute identity & access management platforms to accelerate issue detection and remediation.
Participating in on-call rotations, leading incident resolution, and contributing to robust, stable code delivery alongside other teams.
Do what you love

To be successful in this role you will:

Have a Bachelor's degree in Computer Science or equivalent, with relevant hands-on experience in infrastructure and software architecture at scale.
Have a proven track record of ownership and delivering results, including leading engineering teams (5+ members) and aligning with strategic business goals.
Be proficient in reducing manual toil through automation using Python and/or Golang, with working knowledge of scripting languages.
Be experienced in infrastructure automation tools like SaltStack, Terraform, and Ansible, and CI/CD tools such as Jenkins or CloudBees.
Have expertise in Linux administration, Docker-based environments, and Kubernetes, and be skilled in optimizing performance using tools like Redis.
Be familiar with observability tools such as Prometheus, Grafana, Loki, Sentry, and NewRelic, and web proxies such as Nginx/Envoy/HAProxy.
Have an understanding of SLOs and system reliability principles.

Work in a way that works for you

FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai.

FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.

Learn what makes Akamai a great place to work

Connect with us on social and see what life at Akamai is like! We power and protect life online by solving the toughest challenges, together. At Akamai, we're curious, innovative, collaborative, and tenacious.
We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.

Working for you

At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit plan options are designed to meet your individual needs and budget, both today and in the future. We provide benefits surrounding all aspects of your life:

Your health
Your finances
Your family
Your time at work
Your time pursuing other endeavours

About us

Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Join us

Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities

You will ensure the operation and uptime of Compute services and infrastructure while supervising and maintaining critical infrastructure. Additionally, you will collaborate with cross-functional teams to create tooling and software that monitors and improves system reliability.