Senior DevOps Engineer - Highload, Cloud & Data-Intensive Systems (EU / Rem at Alex Staff Agency

Germany

Full Time

Start Date

Immediate

Expiry Date

19 May, 26

Salary

8000.0

Posted On

18 Feb, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Linux/Ubuntu, Kubernetes, Rancher, Docker, Containerd, Ansible, Terraform, Prometheus, Thanos, Grafana, Sentry, Jenkins, Bash, Python, ClickHouse, MongoDB

Industry

Staffing and Recruiting

Description

About the project

The team develops and maintains distributed services around analytics, APIs, and transaction monitoring. The systems process very large volumes of data: terabytes of storage, trillions of records, continuously growing load.

Infrastructure:
- ~100 servers (bare metal + VPS)
- active use of IaC
- Kubernetes clusters in production
- focus on stability, observability, and automation

The project is long-term: not a hype startup, but a mature product with real users.

What the work looks like

This is a hands-on role with a clear time allocation:
- 60%: operations and incidents (including helping teams)
- 20%: infrastructure automation
- 20%: prototyping, improvements, technical initiatives

There is on-call responsibility, but normally after-hours incidents happen 2–3 times a year, not every week.

Responsibilities
- Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
- Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
- Monitoring, alerting, backups, and regular recovery checks
- Development of service and infrastructure automation
- Development of CI/CD and release procedures
- Incident diagnosis and resolution, support for product teams
- Traffic analytics, bot and attack protection tools
- Responsibility for 24/7 platform stability

What's important
- 4+ years of experience operating Linux/Ubuntu infrastructure and production services
- Strong understanding of networking and troubleshooting
- Kubernetes (cluster operations), Rancher, Docker / containerd
- Hands-on experience with Ansible and Terraform
- Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
- CI/CD: Jenkins
- Automation: Bash, Python
- Experience working with LVM

Nice to have
- Experience working with blockchain nodes
- Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
- Providers: Hetzner / OVHcloud
- Cloudflare (edge, DDoS), experience with AWS
- Handling abuse tickets with hosting providers

Technology stack
- VPN: WireGuard, OpenVPN
- Databases: ClickHouse, MongoDB, Redis, PostgreSQL
- Applications: Node.js (pm2), php-fpm, Lua, Tarantool
- Supporting services: Go (operatorSDK), Ruby, Node.js, PHP

5,000 – 8,000 € net
Format: office / hybrid / remote
Location: Spain (Barcelona and suburbs) or remote (CET ±2)
Full-time
Opportunity to genuinely influence architecture and processes
Mature engineering team and reasonable expectations

Responsibilities

The role involves operating production services and infrastructure, including server management, updates, and performance troubleshooting, alongside supporting and developing Infrastructure as Code using Terraform and Ansible.