Site Reliability Engineer (f/m/d)

at Virtual Minds GmbH

Freiburg, Baden-Württemberg, Germany

Start Date: Immediate
Expiry Date: 25 Sep, 2024
Salary: Not Specified
Posted On: 26 Jun, 2024
Experience: N/A
Skills: Scripting, Operators, Continuous Integration, Go, Code, Security, Collaboration, Jenkins, Languages, Python, AWS, Platform Management, Logging, Kubernetes, Communication Skills
Telecommute: No
Sponsor Visa: No

Description:

INTRODUCTION SENTENCE

Virtual Minds is a wholly owned subsidiary of ProSiebenSat.1 Media SE and has stood for premium adtech made in Europe for over 20 years. Whether SSP, DSP, DMP, or ad serving: as a first mover, we and our 200 employees always have the right solution for successful operations in the digital advertising market.
As a Site Reliability Engineer (SRE), you will play a crucial role in ensuring the reliability, availability, and performance of our Kubernetes platform and Software-as-a-Service (SaaS) applications deployed on Kubernetes. You will collaborate closely with our development, operations, and infrastructure teams to build, maintain, and optimize the systems that power our products. This position offers an excellent opportunity to work with cutting-edge technologies and contribute to the growth of a dynamic and innovative organization.

YOUR ESSENTIAL EXPERIENCE AND EDUCATION

  • Experience: Significant relevant experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role, with a strong focus on Kubernetes platform management and SaaS deployment.
  • Kubernetes Expertise: Proficiency in managing Kubernetes clusters and related tooling (e.g., Helm, kubectl, operators). Experience with container orchestration, service mesh, and Kubernetes networking.
  • AWS: Significant experience with AWS, especially services such as EKS, MSK, RDS, S3, CloudTrail, and CloudWatch, and with deploying and managing AWS infrastructure as code (Terraform and ArgoCD).
  • Programming and Scripting: Solid programming skills in languages such as Python or Go. Proficiency in scripting to automate tasks and develop tooling.
  • Monitoring and Logging: Experience with monitoring solutions (e.g., Prometheus, Grafana) and centralized logging platforms (e.g., ELK stack).
  • CI/CD: Knowledge of continuous integration and continuous deployment pipelines, preferably with tools like Jenkins, GitLab CI/CD, or Tekton.
  • Networking and Security: Understanding of networking concepts and security best practices in the context of Kubernetes and SaaS deployments.
  • Problem-Solving Skills: Strong analytical and problem-solving abilities to diagnose and resolve complex technical issues.
  • Collaboration and Communication: Excellent teamwork and communication skills to collaborate effectively with various teams and stakeholders.
  • Continuous Learning: A passion for staying up-to-date with the latest technologies, industry trends, and best practices in SRE and Kubernetes.

Responsibilities:

  • Kubernetes Platform Management: Design, deploy, and manage our Kubernetes platform to support scalable and reliable application deployments. Monitor and maintain the platform’s health, performance, and security.
  • SaaS Application Deployment: Oversee the deployment of our Software-as-a-Service applications on the Kubernetes platform. Implement best practices for application scalability, high availability, and disaster recovery.
  • Reliability and Availability: Implement robust monitoring, alerting, and logging systems to proactively identify and resolve potential issues. Ensure high system availability and quick incident response times.
  • Performance Optimization: Continuously optimize the Kubernetes infrastructure and SaaS applications to achieve maximum performance and efficiency. Conduct performance testing and tuning to meet or exceed service level objectives.
  • Incident Management: Participate in an on-call rotation to respond to incidents promptly and effectively. Conduct thorough post-incident reviews to identify root causes and implement preventive measures.
  • Automation and Tooling: Develop and maintain automation tools and scripts to streamline processes and improve the efficiency of operational tasks.
  • Security and Compliance: Implement security best practices for Kubernetes and SaaS applications. Collaborate with the security team to ensure compliance with industry standards and regulations.
  • Collaboration: Work closely with cross-functional teams, including development, infrastructure, and product management, to provide expertise and support throughout the software development lifecycle.
  • Continuous Improvement: Identify areas for improvement in the infrastructure, processes, and deployment methodologies. Propose and implement enhancements to increase system reliability and performance.


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Network Administration / Security

Software Engineering

Graduate

Proficient

1

Freiburg, Germany