Site Reliability Engineer

at Dunelm

Leicester LE7, England, United Kingdom

Start Date: Immediate
Expiry Date: 16 Aug, 2024
Salary: Not Specified
Posted On: 17 May, 2024
Experience: N/A
Skills: Zsh, Google Cloud Platform, AWS, Amazon Web Services, Python, Teamwork, TypeScript, GitLab, RUM, Programming Languages, Software, Telemetry, Overcome Obstacles, Integration, Technology, Learning, Docker
Telecommute: No
Sponsor Visa: No

Description:

OVERVIEW

This role can be based out of our London or Leicester offices but will be hybrid (i.e. work from home and office).

ABOUT US

Home. There’s no place like it. And there’s no feeling like helping people create the joy of feeling truly at home.
At Dunelm, that’s what we do. We’re the UK’s number one choice for homewares because we make home life lovelier for our customers. And the caring and supportive culture we’ve created makes this a place you’ll feel right at home too.
You might not think it, but remaining the first choice for savvy homeware-shoppers involves some pretty advanced tech. We’ve recently made our whole company serverless, making us the largest user of AWS Lambdas in Europe (2nd in the world), and we’re bringing more and more tech into our stores. But that’s just the beginning.

SITE RELIABILITY ENGINEERING

We’re searching for an Engineer to join our Site Reliability Engineering (SRE) team. The team is agile, data-driven, and motivated, comprising software and systems engineers. We’re dedicated to creating meaningful observability and monitoring solutions, while automating manual tasks, ensuring product quality and forging operational excellence. Our blame-free DevOps culture and collaboration are at the core of our approach.

The interview process for this role has two stages:

  • First, we will have a roughly 45-minute video call where we will introduce Dunelm and the SRE function and provide more details about the role and expectations. We will also get to know you, your experience and what you can bring to this role, plus what your ambitions are for developing within the role.
  • Then, there will be a 90-minute technical discussion video call with other members of the SRE team. The questions will be open-ended and scenario-based, designed to allow you to showcase your SRE knowledge, skills and adaptability.

We want everyone to be as comfortable as possible, so if you need any adjustments within the interview process, please let us know as soon as possible.
Dunelm is committed to becoming a fully inclusive business that is representative of our customers and locations. We recognise the value in diversity and welcome applications from all candidates regardless of sex, age, race, religion, ability, gender or sexual identity, socio-economic background or education. We are committed to making Dunelm a place where everyone can enjoy a successful career and have systems in place to support all our staff. We are seeking applications from candidates who share our values and celebrate diversity in all its forms.

ESSENTIAL SKILLS

  • Amazon Web Services: We run most of our back-end software in AWS Lambdas, with some containerised software (ECS / Fargate) and some server-based software (EC2). You will need experience and good knowledge of all of these, general AWS networking principles (VPC, security groups etc.), plus other AWS services including, but not limited to: S3, EventBridge, SQS / SNS, RDS and DynamoDB.
  • Development Experience: You will be a solid developer, experienced in building high-quality, testable applications and tools. You will know how to create effective tests (unit and integration) and be familiar and comfortable with different ways of tackling a problem – for example pair and mob programming. Our stack is mainly TypeScript and Python, so experience with both would be distinctly advantageous.
  • Observability Knowledge: You can explain in detail the fundamental aspects of observability, including telemetry and real user monitoring (RUM), and you know how to use them to observe running software effectively. You also understand sampling and how to apply it most effectively.
  • Problem-Solving Prowess: You possess exceptional problem-solving skills, capable of addressing intricate challenges that may arise within our technology landscape. You are also a ‘detective’, using your skills and knowledge to collect evidence that eliminates what a problem is not, leading you to the most likely cause(s).
  • Pipeline Expertise: You will understand how to create, deploy and troubleshoot CI / CD pipelines (we use GitLab) to run tests / checks, create builds and ultimately deploy software in various environments.
  • Technology Proficiency: You have solid knowledge of various technologies, tools, frameworks and patterns related to the previous five points, including, but not limited to: IaC (e.g. Terraform, Pulumi, CDK), Lambda runtime programming languages (e.g. TypeScript, Python), containerised applications (Docker), event-driven architecture and POSIX-based shells (e.g. Bash, zsh).
  • Tech Enthusiasm: Your passion for technology drives you to explore and embrace the latest innovations continuously. This dedication to growth and learning is essential in staying ahead in our ever-evolving tech landscape.

DESIRABLE SKILLS

  • OpenTelemetry: Previous experience or demonstrated knowledge of working with OpenTelemetry solidifies your observability expertise, enhancing our monitoring and diagnostic capabilities.
  • Google Cloud Platform (GCP): Although Dunelm’s distributed systems are overwhelmingly deployed on AWS, we do have strategic deployments in GCP, so any working knowledge of this platform would demonstrate your breadth of cloud knowledge.

BEHAVIOURS AND VALUES

At Dunelm, our shared values of ‘act like owners’, ‘keep listening and learning’, ‘long-term thinking’, and ‘stronger together’ serve as the foundation for our success. These values guide us in continuously improving our practices and ensure we dedicate our time to what truly matters. As a Site Reliability Engineer, you will exemplify these key behaviours:

  • Teamwork:
      • Assume positive intent in your interactions with others.
      • Build trust with your team colleagues.
      • Offer support to your peers, demonstrating care, and don’t hesitate to ask for help when needed.
  • Communication:
      • Engage in effective communication with colleagues and customers.
      • Foster a shared understanding by keeping everyone informed.
      • Share valuable information and insights to help others excel in their roles.
  • Curiosity:
      • Cultivate a curious mindset, always seeking to discover something new.
      • Ask “why” and strive to understand, continuously improving your knowledge.
      • Acknowledge that you won’t always have all the answers and be willing to ask for help.
  • Adaptability:
      • Embrace change by trying new approaches and viewing mistakes as learning opportunities.
      • Continuously learn from experience, adapting your methods to achieve better results.
      • Be versatile and take on diverse tasks and responsibilities.
  • Innovation:
      • Think like an underdog, always hungry for improvement.
      • Overcome obstacles and maintain momentum in driving change.
      • Generate ideas and improvements – innovation can come from anyone.

Responsibilities:

WHAT YOU’LL BE DOING

As a Site Reliability Engineer at Dunelm, you will become a key member of the SRE team. You are motivated and enthusiastic, and able to use your operational and engineering knowledge to help develop effective tools, observability solutions, pipelines and more, so that the wider engineering and platform teams at Dunelm can create, update and release with confidence.

RESPONSIBILITIES:

  • Observability Development: Designing, building, deploying, running and – ultimately – owning observability tooling, such as dashboards, monitoring and alerting.
  • Embedded Consultancy: Working with other teams throughout the Dunelm technical space to help increase their SRE maturity level – mainly by helping to define Service Level Objectives (SLOs) and Service Level Indicators (SLIs), plus working on the integrations that produce the telemetry these require.
  • DevOps Best-Practice Advocacy: Promoting a DevOps culture, through ‘shift-left’ testing, non-functional (security, performance etc.) testing, continuous integration and deployment and working with other teams to share the responsibilities of the software that is built.
  • Incident Response: Being available to help investigate incidents in real time – sometimes out of normal working hours as part of our on-call rota. Helping to ascertain impact and find observability gaps during these investigations. Taking part in ‘blameless post-mortems’, focusing on collaboration and knowledge-sharing.
  • Workflow: Helping to create and refine work tickets, breaking down larger pieces of work into actionable pieces. Ensuring relevant knowledge is shared with the rest of the team while working on these tickets and clearly articulating any blocking circumstances.
  • Code Quality and Risk Mitigation: Reviewing the team’s output to ensure all code is highly maintainable, supportable, and minimises operational risk.
  • Mentorship and Coaching: Mentoring and guiding other team members, including less experienced engineers, providing feedback and coaching to help them reach their full potential.
  • Research and Learning: Researching new technologies and architectural patterns by conducting technical proofs of concept (PoCs), and proposing improvements to existing platforms as well as developing new solutions. You will also be given the opportunity to do team-based and independent learning on a regular basis, to improve your own and the team’s knowledge.

REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Application Programming / Maintenance

Software Engineering

Graduate

Proficient

1

Leicester LE7, United Kingdom