Site Reliability Engineer at AKAMAI TECHNOLOGIES INC

Bengaluru, Karnataka, India

Full Time

Start Date

Immediate

Expiry Date

02 Jun, 26

Salary

0.0

Posted On

04 Mar, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Linux/Unix Systems, Networking Fundamentals, Distributed Systems, Microservices Architectures, Security, Compliance, Docker, Kubernetes, Java, Python, GoLang, PostgreSQL, MySQL, AWS, Azure, Google Cloud

Industry

Technology; Information and Internet

Description

Join our Global Services Engineering Team

The Global Services Engineering Team develops and supports applications that empower Global Services teams to focus on what they do best, deliver value to our customers, make data-driven decisions, provide insights that give business leaders better visibility into the state of the business, and provide analytical intelligence to proactively handle customer situations and additional business opportunities.

Partner with the best

As a Site Reliability Engineer, you are responsible for improving system availability, reliability, scalability, and performance. You will collaborate closely with cross-functional teams to define and track key performance indicators, enhance monitoring and alerting capabilities, and proactively investigate and resolve complex performance and reliability issues to ensure optimal system operation.

As a Site Reliability Engineer at Akamai, you will be responsible for:

* Designing, building, and maintaining highly reliable, scalable, and performant systems
* Defining, measuring, and monitoring key service reliability metrics and SLAs
* Developing and improving monitoring, alerting, and incident response processes
* Proactively identifying performance bottlenecks and reliability risks, and driving long-term solutions
* Investigating, troubleshooting, and resolving complex production issues across distributed systems
* Automating operational tasks to reduce manual effort and improve system efficiency
* Participating in on-call rotations and leading incident management and post-incident reviews
* Collaborating with engineering teams to influence system architecture and reliability best practices
* Continuously improving deployment, release, and rollback processes to minimize risk and downtime
* Enhancing and maintaining CI/CD pipelines and other tooling as required

Do what you love

To be successful in this role you will:

* Have at least 3 years of experience in an SRE role
* Have a strong understanding of Linux/Unix systems and networking fundamentals
* Have experience with distributed systems and microservices architectures
* Have a strong understanding of security and compliance considerations in production environments
* Have strong knowledge of and experience with orchestration and containerization technologies such as Docker and Kubernetes
* Have good working knowledge of Java / Python / GoLang and follow common development practices and methodologies
* Have good working knowledge of databases, especially PostgreSQL and MySQL
* Have good hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud
* Have hands-on experience with monitoring and observability applications such as Prometheus, Grafana, and ELK

Work in a way that works for you

We recognize that everyone is different and that the way in which people want to work and deliver at their best is different for everyone too. In this role, we can offer the following flexible working patterns: We are happy to discuss flexible working options in this role; please discuss your requirements with the recruiter when you apply.

Working with us

At Akamai, we’re curious, innovative, collaborative and tenacious. We celebrate diversity of thought, and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you’ll thrive here.

Working for you

At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit options are designed to meet your individual needs for today and in the future. We provide benefits surrounding all aspects of your life:

* Your health
* Your finances
* Your family
* Your time at work
* Your time pursuing other endeavors

Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.

About us

Innovating on a global scale, we deliver our customers a fast, smart and secure intelligent edge platform. Working against a backdrop of digital collaboration, our highly skilled teams build progressive solutions that have the scope to transform entertainment, business, and life in ways that we have yet to imagine.

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities

The Site Reliability Engineer is responsible for improving system availability, reliability, scalability, and performance by collaborating with cross-functional teams to define metrics, enhance monitoring, and resolve complex issues. Key duties include designing and maintaining highly reliable systems, developing alerting processes, investigating production issues, and automating operational tasks.