Senior Site Reliability Engineer

at ECHO360 INC

Ontario, Ontario, Canada

Start Date: Immediate
Expiry Date: 07 Nov, 2024
Salary: USD 120000 Annual
Posted On: 08 Aug, 2024
Experience: 5 year(s) or above
Skills: Good communication skills
Telecommute: No
Sponsor Visa: No
Required Visa Status:
Citizen, GC, US Citizen, Student Visa, H1B, CPT, OPT, H4 Spouse of H1B, GC Green Card
Employment Type:
Full Time, Part Time, Permanent, Independent - 1099, Contract – W2, C2H Independent, C2H W2, Contract – Corp 2 Corp, Contract to Hire – Corp 2 Corp

Description:

We are seeking a highly motivated Senior Site Reliability Engineer with SRE or DevOps experience who can help us develop and automate the various services we operate as part of the Echosystem. In this role, you will have the opportunity to drive availability and reliability across multiple engineering teams and work closely with them to ensure that the operational aspects of managing services are automated and observable.
This position is FULLY REMOTE; we will consider candidates who are located in many, but not all, states within the United States. For US-based positions, candidates must be eligible to work in the United States for any employer.
Requirements

Responsibilities:

THE PRIMARY RESPONSIBILITIES FOR THIS ROLE INCLUDE:

  • Monitor and troubleshoot production incidents proactively, identifying and resolving issues quickly and efficiently.
  • Implement automated monitoring and alerting systems for early detection of potential problems.
  • Collaborate with development teams to perform deployments and rollbacks with minimal disruption.
  • Optimize the performance and scalability of our AWS infrastructure, including DynamoDB, MySQL, S3, CloudSearch, OpenSearch, Kafka, Presto, SES, and EC2.
  • Write and maintain infrastructure code using Terraform and scripts to automate tasks and improve operational efficiency.
  • Proactively identify and address potential security vulnerabilities.
  • Participate in incident response and post-mortem analysis activities to identify root causes and prevent future occurrences.
  • Help onboard and mentor junior team members, sharing your knowledge and expertise.
  • Stay up to date on the latest cloud technologies and best practices for SRE.
  • Participate in a low-volume on-call rotation with other Site Reliability Engineers.
  • Explore new technologies and innovative solutions to improve service quality and speed to market.
  • Participate in technical discussions and deep dives with the other engineering and product teams.

THE IDEAL CANDIDATE FOR THIS ROLE WILL HAVE:

  • 5+ years of experience as a Site Reliability Engineer or in a similar role.
  • Strong understanding of AWS cloud services, including DynamoDB, MySQL, S3, CloudSearch, OpenSearch, Kafka, Presto, SES, and EC2.
  • Experience with infrastructure automation tools like Ansible, Terraform, or CloudFormation.
  • Experience with monitoring and alerting tools like DataDog, Prometheus, Grafana, Kibana, and PagerDuty.
  • Experience with CI/CD pipelines and deployment strategies.
  • Strong problem-solving and analytical skills.
  • Excellent communication and collaboration skills.
  • Ability to work independently and take ownership of complex tasks.
  • Passion for technology and a desire to learn and grow.
  • Experience with Rancher, Cattleprod, Jenkins, TeamCity, PostgreSQL, and MongoDB.
  • Experience with security best practices and tools.
  • Experience working in a fast-paced, agile environment.

Additional Job Details

The base salary range for this position is $120,000 - $140,000 annually.
Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills and experience. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work.


REQUIREMENT SUMMARY

Min: 5.0, Max: 10.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

Ontario, Canada