Software Engineer, Reliability Engineering

at Grammarly Inc

Germany

Start Date: Immediate
Expiry Date: 04 Jan, 2025
Salary: Not Specified
Posted On: 06 Oct, 2024
Experience: 7 year(s) or above
Skills: Azure, Reliability, AWS, Go, Software Development, Infrastructure Solutions, Customer Value
Telecommute: No
Sponsor Visa: No

Description:

Grammarly is excited to offer a remote-first hybrid working model. Grammarly team members in this role must be based in Germany, and, depending on business needs, they must meet in person for collaboration weeks, traveling if necessary to the hub(s) where their team is based.
This flexible approach gives team members the best of both worlds: plenty of focus time along with in-person collaboration that fosters trust and unlocks creativity.

THE OPPORTUNITY

To achieve our ambitious goals, we’re looking for a Software Engineer to join our Reliability Engineering team as part of the wider Engineering Platform team. This role will build world-class, secure, and reliable cloud-native infrastructure solutions for Grammarly engineers that will scale with our user base.
Grammarly’s engineers and researchers have the freedom to innovate and uncover breakthroughs—and, in turn, influence our product roadmap. The complexity of our technical challenges is growing rapidly as we scale our interfaces, algorithms, and infrastructure. You can hear more from our team on our technical blog.
As a Reliability Engineer, you will be a key player in building and enhancing the reliability and observability of our services across the engineering organization. You will be part of a centralized team focused on improving incident management, introducing auto-scaling, resilience, and self-healing mechanisms, and conducting chaos testing. Your work will be instrumental in establishing a center of excellence for reliability, defining and evangelizing best practices, and developing tools to scale these practices across all engineering teams.

In this role, you will:

  • Use modern infrastructure management tools and services like AWS to build a massively scalable platform for Grammarly’s services.
  • Be an ambassador for Operational Excellence, building and continually improving incident management tooling and processes.
  • Implement proactive reliability improvements to reduce manual intervention and increase reliability. This includes automated deployment improvements, canary analysis, self-healing mechanisms, and autoscaling.
  • Manage cloud-native infrastructure solutions, such as cross-service infrastructure, Kubernetes clusters and deployments, auto-scaling tool sets, and service discovery.
  • Build solutions and frameworks to spin up, test, deploy, and observe Grammarly’s service reliability.
  • Participate in on-call incident response and escalation procedures.

QUALIFICATIONS

  • Has a minimum of 7 years of experience managing live production SaaS environments with high load.
  • Has experience working in a centralized reliability or SRE team.
  • Has hands-on experience with cloud-native infrastructure solutions such as container orchestration and service discovery in Kubernetes-based environments.
  • Has a background in software development or engineering roles with a focus on reliability.
  • Is knowledgeable about reliability practices and how to scale them across the engineering organization.
  • Is knowledgeable about AWS, or has deep expertise in Azure or GCP and is willing to learn AWS quickly.
  • Can deliver maintainable and high-quality code in Go or other languages.
  • Can communicate well and collaborate effectively, empathetically, and proactively on a tightly integrated team.
  • Embodies our EAGER values—is ethical, adaptable, gritty, empathetic, and remarkable.
  • Is inspired by our MOVE principles, which are the blueprint for how things get done at Grammarly: move fast and learn faster, obsess about creating customer value, value impact over activity, and embrace healthy disagreement rooted in trust.
  • Is able to meet in person for their team’s scheduled collaboration weeks, traveling if necessary to the hub where their team is based.


REQUIREMENT SUMMARY

Min: 7.0, Max: 12.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

Deutschland, Germany