Senior Site Reliability Engineer at Sinch
Atlanta, Georgia, USA
Full Time


Start Date

Immediate

Expiry Date

04 Dec, 25

Salary

179,000

Posted On

04 Sep, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

AWS, Elasticsearch, Google Cloud Platform, Python, Communication Skills, Distributed Databases, Reliability, Production Systems, Operations, Infrastructure, Team Culture

Industry

Information Technology/IT

Description

Sinch is pioneering the way the world communicates. More than 150,000 businesses, including Google, Uber, PayPal, Visa, Tinder, and many others, rely on Sinch’s Customer Communications Cloud to power engaging customer experiences through mobile messaging, voice, and email.
Whether you need to verify users or craft omnichannel campaigns, Sinch makes it easy. Our AI-infused Super Network, APIs, and applications ensure you can connect with your customers reliably and securely, at every step of their journey.
At Sinch we “Dream Big”, “Win Together”, “Keep it simple”, and “Make it Happen”. These values are our foundation!
At Sinch Mailgun, we’re building the infrastructure that powers communication at internet scale. As one of the largest email providers in the world, our platform delivers billions of emails every day for developers, startups, and global enterprises alike.
We’re looking for a Senior Site Reliability Engineer to join our SRE team, the group responsible for keeping our systems fast, reliable, and secure. In this role, you will assist in shaping, scaling, and optimizing the critically important infrastructure that underpins each Mailgun service. You’ll work closely with product engineering teams to drive improvements, automate workflows, and ensure our systems meet the highest reliability standards.
This is more than just keeping the lights on. You’ll be engineering the future of a platform trusted by developers and companies around the globe, solving complex distributed systems challenges, and driving real-world innovation in how email infrastructure is built and operated.

REQUIREMENTS:

  • Strong background in infrastructure, operations, or software engineering with a focus on reliability.
  • Extensive experience working with cloud platforms such as Google Cloud Platform (GCP) or Amazon Web Services (AWS).
  • Proficiency with infrastructure-as-code and configuration management tools such as Terraform and Ansible.
  • Hands-on experience with modern monitoring and observability tools such as Prometheus and Grafana.
  • Proven experience with distributed databases (e.g. Cassandra, Elasticsearch) and maintaining their health at scale.
  • Familiarity with distributed event stores and stream-processing platforms.
  • Strong coding skills in at least one modern programming language (Python, Go, etc.).
  • Expertise in running and maintaining production systems on Linux and public cloud infrastructure.
  • Demonstrated expertise in architecting solutions to complex technical challenges and the ability to lead initiatives from conception through execution.
  • Strong interpersonal and communication skills, with a history of building effective relationships with cross-functional teams.
  • Ability to mentor and guide junior engineers, fostering a collaborative and inclusive team culture.

RESPONSIBILITIES:

  • Collaborate with other teams to define and implement system requirements.
  • Design, build, and maintain cloud-based microservices infrastructure.
  • Automate routine operational tasks and remediation processes to improve efficiency and reliability.
  • Proactively identify and resolve issues, using monitoring tools to ensure system health and collaborating with support and other engineering teams.
  • Ensure that datastores operate efficiently and meet performance and availability goals.
  • Contribute to the team’s growth by mentoring junior engineers and sharing best practices.
  • Plan and execute strategies for scaling systems and infrastructure as needs grow.