Senior Software Engineer SRE - IT Operations at Odido

2521 Den Haag, Netherlands

Full Time

Start Date

Immediate

Expiry Date

15 Nov, 25

Salary

0.0

Posted On

16 Aug, 25

Experience

10 years or above

Remote Job

Yes

Telecommute

Yes

Skills

Kubernetes, Scripting, Storage, Python, Azure, English, Distributed Systems, Database Systems, Windows Administration, Elasticsearch, Docker, AWS, Automation, Kibana, Troubleshooting

Industry

Information Technology/IT

Description

MUST-HAVE SKILLS AND QUALIFICATIONS

Experience with Node.js, Python, and REST services.
Experience with public cloud platforms (AWS, Azure) and related technologies (Docker, Kubernetes, CloudFormation).
Strong understanding of storage, database systems, caching, queueing, and networking.
Experience in leading technical recoveries and troubleshooting distributed systems.
Ability to debug, optimize code, and automate routine operational tasks.
Solid foundation in Linux or Windows administration and troubleshooting.
Strong knowledge of monitoring/observability tools (Prometheus, Grafana, Kibana, Elasticsearch).
Understanding of Service Level Agreements (SLAs) and Service Level Objectives (SLOs).
Proficiency in at least one programming language for automation and scripting.
Excellent command of English, both written and spoken.

NICE-TO-HAVE SKILLS

Knowledge of AI-driven operational solutions for predictive monitoring.
Background in security practices and compliance for cloud environments.

PRESS ON THE BUTTON

Are you as excited about Odido as we are? Then we are probably a good match. We are looking forward to meeting you! You can apply via the application button. It takes only a minute.

Responsibilities

THIS IS WHAT YOU WILL DO

At Odido, we’re on a mission to become the most customer-driven telco in the Netherlands—and our IT landscape plays a crucial role in making that happen. As Senior Software Engineer SRE, you are at the heart of service reliability, building and maintaining the systems that keep millions of customers connected—no matter how intense the traffic or how high the stakes.
You’ll be the technical lead ensuring end-to-end resiliency across our applications and infrastructure. From proactive monitoring to self-healing automation, you prevent outages before they impact our customers. Imagine this: it’s 3 AM, an anomaly is detected—your smart monitoring and automated scripts spring into action, rerouting traffic and neutralizing the risk before anyone even notices. The next day, your team dives deep into post-incident reviews, continuously improving how we work, learn, and scale.
You’ll work closely with SRE leads, platform engineers, and software teams to drive automation, reliability engineering, and observability at every level. Your expertise in fault-tolerant design, distributed systems, and DevOps practices will directly impact our ability to deliver seamless customer experiences—whether it’s over mobile, broadband, or fiber.
And if you have telecom experience—perfect. It’s not just a bonus here, it’s essential. We’re not just building any digital platform; we’re building for the unique demands of a high-volume, always-on telecom environment. Without prior experience in telecom ecosystems, we unfortunately cannot proceed with your profile for this role.
You’ll bring deep technical knowledge, a collaborative mindset, and a passion for solving complex problems in fast-moving environments. Together, we’ll raise the bar on performance, security, and customer satisfaction—every day.
You’ll be part of a high-impact engineering team, collaborating with developers, platform engineers, and operations specialists to continuously improve Odido’s service reliability, scalability, and efficiency. Your work will drive the automation, instrumentation, and observability that power Odido’s digital services.

KEY RESPONSIBILITIES:

Continuous Improvement: Oversee and enhance incident-response processes, ensuring lessons learned translate into structural improvements.
Automation & Application as Code: Develop reusable patterns for automation, configuration management, and deployment across teams and products.
Service Ownership: Take full responsibility for several critical services, ensuring high availability and reliability.
Incident Management: Lead or participate in outage response calls, quickly resolving incidents and minimizing downtime.
Monitoring & Observability: Design and implement proactive monitoring strategies using tools like Prometheus, Grafana, and Kibana to improve system performance.
Troubleshooting & Debugging: Analyze and fix system issues across a complex distributed environment and application stack.
Engineering Best Practices: Advocate for DevOps and SRE principles, mentoring junior engineers on automation and operational excellence.