Senior Site Reliability Engineer I - SAP at Booking Holdings

Karnataka, India

Full Time

Start Date

Immediate

Expiry Date

09 May, 26

Salary

0.0

Posted On

09 Feb, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

SAP Application Lifecycle, Source Control Management, Infrastructure Provisioning, Configuration Management, Unix/Linux Systems Internals, Networking, AWS Services, Automation, Incident Management, Observability, Capacity Planning, Technical Guidance, Critical Thinking, Continuous Improvement, Effective Communication, End To End System Ownership

Industry

Technology; Information and Internet

Description

Senior Site Reliability Engineer

At Booking.com, our mission is to make it easier for everyone to experience the world. And while that world might feel a little farther away right now, we’re busy preparing for when the world is ready to travel once more. With strategic long-term investments into what we believe the future of travel can be, we are opening career opportunities that will have a strong impact on our mission.

An SRE has the additional responsibility of fostering an active and thriving SRE community, leading by example as an advocate of engineering, reliability and security best practices.

B.Responsible.

Systems Design (SAP)
Create and evolve SAP solutions that ensure availability, scalability, latency, and efficiency across Booking.com’s SAP landscape, including core applications (e.g., S/4HANA on HANA), integration tiers (e.g., SAP BTP, CPI/PO), and application interfaces (e.g., OData, RFC, IDoc), with robust monitoring, capacity planning, and performance tuning built in.
Operate with a product mindset in the SAP domain, balancing current customer outcomes with the future roadmap; design for generalizable patterns across SAP modules and integration layers (e.g., reusable CPI packages, ABAP frameworks, Fiori components, shared observability/tooling) so solutions can be leveraged by other teams.

Technical Incident Management
Take ownership of the procedures for dealing with emergency situations. The SRE should write the playbook for handling a degrading system/service or even a full outage.
Conduct post-mortem meetings (RFOs) to ensure learnings from incidents are applied and shared.
Take part in our incident management program by participating in the on-call rotation.
Be available to provide expertise and feedback for our service health program.

Automation and Toil Reduction
Build automation and application orchestration to prevent recurrent problems and to reduce human labor.
Strategise and implement IT disaster recovery (DR) for critical applications (SAP preferred).

Observability (Monitoring and Alerting Improvements)
Implement monitoring and alerting. This might not always mean writing the software itself; it could also mean creating the best practices for how to monitor and alert for a system/service.
Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.

Architectural Guidance
Maintain holistic knowledge and understanding of a system/service instead of only knowing some fraction of the problem space.
Create, document and implement Booking Reliability Engineering best practices.
Collaborate with other teams and tech POs to support them in building reliable and scalable systems/services for their users and stakeholders.
Influence the business and tech colleagues to adopt engineering, reliability and security best practices.

Community Involvement
Take an active part in educating and skilling up members of our engineering community.

B.Skilled.

Bachelor's or Master's degree.
10+ years of experience in a similar role.

Technical knowledge and skills
SAP Application Lifecycle: Oversee the full lifecycle of SAP applications: requirements, design, build, test, deployment, and operations.
Expertise in source control management (such as Git and Bitbucket) and infrastructure provisioning with Terraform.
Solid hands-on experience with configuration management tools (Ansible and Puppet).
Deep understanding of Unix/Linux systems internals and networking; this includes topics like kernel, shell and client-server protocols.
Proficiency in Unix/Linux system administration (Red Hat/CentOS).
Networking: significant knowledge and understanding of network theory, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
Extensive experience in the design, configuration and implementation of a system/service in a large-scale production environment (systems engineering and architectural skills).
Expertise in various AWS services and their use cases (EC2, Network, Lambda, IAM and more).
Eagerness to keep up with the latest developments in technology.
Connection with the worldwide SRE community.
Exhibit the following behaviours: be curious; be data driven; have a systematic problem-solving approach; constantly aim to improve systems/services.

Architectural Guidance
Advise product teams towards a technical solution that meets the functional, nonfunctional and architectural requirements by challenging the rationale for an application design and providing context in the wider architectural landscape.
Set a clear direction for a technical capability by evaluating and aligning the target architecture improvements, reframing architectural designs and decisions for varied stakeholders.

Critical Thinking
Find solutions to difficult or complex issues by applying different skills and techniques like analytical thinking, lateral thinking, and logical reasoning.
Improve existing ideas, plans and solutions by reviewing them in a critical yet constructive manner, initiating concrete improvements and articulating their rationale.

Continuous Quality and Process Improvement
Identify opportunities for process, system and/or structural improvements by applying an understanding of process flows and the methods that can be used to boost effectiveness and efficiency.

End to End System Ownership
Own a service end to end by actively monitoring application health and performance, setting and monitoring relevant metrics, acting accordingly when they are violated, and guiding more junior members of the team in this topic.
Reduce business continuity risks and bus factor by applying state-of-the-art practices and tools, writing the appropriate documentation such as runbooks and OpDocs, and guiding more junior members of the team in this topic.
Reduce risk and obtain customer feedback by using continuous delivery and experimentation frameworks, and guide more junior members of the team in this topic.
Independently manage an application or service by working through deployment and operations in production, and guide more junior members of the team in this topic.

Effective Communication
Deliver clear, well-structured, and meaningful information to a target audience by using suitable communication mediums and language tailored to the audience.
Achieve mutually agreeable solutions by staying adaptable, communicating ideas in clear, coherent language and practising active listening.

Pre-Employment Screening
If your application is successful, your personal data may be used for a pre-employment screening check by a third party as permitted by applicable law. Depending on the vacancy and applicable law, a pre-employment screening may include employment history, education and other information (such as media information) that may be necessary for determining your qualifications and suitability for the position.

Responsibilities

The Senior Site Reliability Engineer is responsible for designing and evolving SAP solutions to ensure system availability and efficiency, as well as managing technical incidents and automation efforts. They will also engage in community involvement and educate team members on best practices.