Site Reliability Engineer (SRE) at Toyota Connected Europe

London EC1M, England, United Kingdom

Full Time

Start Date

Immediate

Expiry Date

02 Jul, 25

Salary

0.0

Posted On

02 Apr, 25

Experience

3 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Good communication skills

Industry

Information Technology/IT

Description

ANOTHER THING WE’D LIKE TO MENTION…

There’s been a lot of research showing that people from marginalised groups may not apply for jobs unless they meet all of the requirements. We also know that many great people come from wide and wonderful backgrounds with different experiences. We value people who bring unique perspectives and add new knowledge to our team!
You might not feel like you “tick all the boxes”, but we sincerely hope you’ll apply anyway because you could be exactly what we’re looking for! We want to build someone up.

Responsibilities

ABOUT THE ROLE:

Cloud Engineering plays a vital role in the ongoing success of Toyota Connected Europe by providing the tools and processes necessary for Toyota Connected Europe to grow and scale globally in a consistent and robust manner.
Cloud Engineering strives to enable ever greater levels of agility, effectiveness, and innovation within the larger Toyota Connected Europe organisation, partnering with the various product development teams to ensure broad technological and project goal alignment.
As a Site Reliability Engineer in this team, you will create, maintain, support and improve complex cloud operations for the largest automotive company in the world.
This person will apply their knowledge in a highly energised, fast-paced, and innovative environment, empowering Toyota Connected Europe teams to create the next generation of connected vehicle solutions.
This is a progressive and collaborative environment; therefore, not only the skillset but also the passion for the above must be a fit. We mean it: we want to develop and grow someone into a superstar. We just need the potential!

WHAT YOU WILL DO:

Ensure the availability, performance, reliability, and scalability of applications and services.
Work collaboratively with Software Engineering to define infrastructure and deployment requirements.
Proactively identify and resolve production issues and develop tools, scripts, and frameworks to build and maintain operational efficiency.
Conduct routine application performance reviews and incident post-mortems to prevent future outages and improve system reliability.
Participate in on-call rotations, demonstrating problem-solving and decision-making abilities to ensure quick resolution of problems.
Develop and maintain monitoring tools and alerting systems.
Improve CI/CD pipelines, automate routine tasks, and enhance system security.
Document all procedures and policies related to the managed systems.

WHAT ARE WE LOOKING FOR?

Bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent work experience.
Circa 3 years of experience in a similar role, with demonstrable skills in managing high-traffic websites, applications, or critical services.
Strong knowledge of cloud computing platforms like AWS, GCP, or Azure.
Proficiency in modern Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Strong experience with containerisation technologies like Docker, and orchestration systems like Kubernetes.
Solid understanding of continuous integration and continuous deployment (CI/CD) pipelines.
Familiarity with a scripting language like Python, Bash, or Go.
Familiarity with monitoring tools such as Datadog, Prometheus, Grafana, ELK stack, or similar.
Strong problem-solving skills, excellent communication skills, and the ability to work independently or in a team.