Site Reliability Engineer (f/m/d) at arsys ES
Barcelona, Catalonia, Spain
Full Time


Start Date

Immediate

Expiry Date

28 Dec, 25

Salary

0.0

Posted On

29 Sep, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Site Reliability Engineering, Linux, Kubernetes, Terraform, GitLab CI/CD, ArgoCD, Go, Python, Bash, Monitoring, Logging, Alerting, Prometheus, Grafana, ELK Stack, Incident Management, Automation

Industry

IT Services and IT Consulting

Description
We are looking for a Site Reliability Engineer (SRE) in the IONOS Applications team.

Tasks
Contribute to the evolution of product infrastructure, integrating new services and applications into our cloud and Kubernetes environment.
Ensure the stable and secure operation of our platforms.
Perform in-depth analysis and optimization of distributed and highly scalable environments.
Drive automation using tools such as Terraform, GitLab CI/CD, and ArgoCD, managing infrastructure declaratively and reproducibly.
Analyze and resolve complex issues in distributed systems, contributing to the continuous improvement of the platform.
Develop and maintain monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK Stack) to proactively detect bottlenecks and sources of error.
Participate in on-call rotations, one week every 4 to 5 weeks.
Collaborate with product development teams to organize joint projects.
Manage incidents end-to-end: initial analysis, ticket creation, resolution, and follow-up through Problem Management.
Have access to up to one day per week for learning and continuous training.

Expectations
Several years of experience as an SRE or in similar roles (Linux System Administrator, DevOps Engineer, Platform Engineer, Full Stack Developer).
Advanced expertise in Linux, container technologies, and especially Kubernetes.
Experience with Infrastructure as Code (preferably Terraform), CI/CD pipelines (GitLab CI/CD, GitHub Actions), and Helm charts.
Proficiency in at least one programming or scripting language (Go, Python, Bash) for automation and monitoring tasks.
Experience in operating and troubleshooting high-availability production environments.
Knowledge of monitoring, alerting, and log analysis for distributed applications (Prometheus, Grafana, FluentD, ELK, VictoriaMetrics, Icinga).
A proactive, solution-oriented, and independent working style, with the ability to systematically analyze and sustainably resolve technical problems.
Good command of English (spoken and written).

At Arsys we value diversity and welcome all applications regardless of, for example, sex, nationality, ethnic or social origin, religion, disability, age, sexual orientation and identity, physical characteristics, marital status, or any other irrelevant factor, subject to applicable law.
Responsibilities
The Site Reliability Engineer will contribute to the evolution of product infrastructure and ensure the stable and secure operation of platforms. They will also analyze and resolve complex issues in distributed systems and develop monitoring solutions.