Software Development Engineer II, Chaos Engineering at Expedia Group
London EC1V 4EX, United Kingdom
Full Time


Start Date

Immediate

Expiry Date

08 Jun, 25

Salary

0.0

Posted On

09 Mar, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Consideration, Computer Science

Industry

Information Technology/IT

Description

Expedia Group brands power global travel for everyone, everywhere. We design cutting-edge tech to make travel smoother and more memorable, and we create groundbreaking solutions for our partners. Our diverse, vibrant, and welcoming community is essential in driving our success.

JOB SUMMARY:

As a Software Development Engineer II, you will contribute to the development and adoption of reliability, scalability, and performance practices across the world’s largest online travel platform, within the Reliability Engineering organization.

POSITION OVERVIEW:

We’re looking for a skilled developer to join our Reliability Engineering organization as a Chaos Engineer. In this role, you’ll be responsible for identifying system weaknesses and helping build resilient, fault-tolerant architectures across Expedia.
You’ll collaborate closely with cross-functional teams to enhance fault injection capabilities, design and execute chaos experiments, analyze results, and drive solutions to improve service stability. We’re seeking a self-starter who is highly organized, communicative, and passionate about quality, reliability, and ensuring a seamless experience for travelers.

EXPERIENCE AND QUALIFICATION:

Bachelor’s or master’s degree in computer science or a related technical field, or equivalent professional experience.
Expedia is committed to creating an inclusive work environment with a diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, gender, sexual orientation, national origin, disability, or age.

Responsibilities
  • Design, implement, and execute chaos experiments to test and improve the resilience of distributed systems.
  • Develop and maintain chaos engineering frameworks, tools, and automation, ensuring scalability and ease of use across teams.
  • Collaborate with cross-functional teams to identify areas for fault injection and embed chaos engineering practices into development workflows.
  • Analyse and interpret experiment results, providing actionable insights to enhance system stability and reliability.
  • Proactively monitor application health and address issues through effective troubleshooting, logging, and resolution of technical blockers.
  • Contribute to technical design, functional analysis, and implementation of mid-to-large-scale reliability projects, following best coding practices.
  • Promote best development methodologies through code reviews, software design discussions, and knowledge-sharing sessions.
  • Stay up to date with industry trends in chaos engineering, cloud architectures, and distributed systems to drive innovation and continuous improvement.