Site Reliability Engineer - SRE at SPS Commerce
Brampton, ON L6T 4B5, Canada
Full Time


Start Date

Immediate

Expiry Date

08 Jul, 25

Salary

71300.0

Posted On

08 Apr, 25

Experience

1 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

EC2, Kubernetes, Infrastructure, Amazon Web Services, Task Execution, Python, Linux, IT, Logging, Docker

Industry

Information Technology/IT

Description

SPS Commerce is a leading provider of cloud-based supply chain management solutions, serving a global network of retail trading partners. We foster a collaborative and inclusive work environment where innovation and continuous improvement are highly valued. Join SPS Commerce and be part of a dynamic team that’s transforming the global retail supply chain!

POSITION SUMMARY:

You would join the SPS Commerce Site Reliability Engineering (SRE) team, which is responsible for delivering highly available platform services and deployment automation that empower our product engineering teams with services that are secure, reliable, and cost-effective, and that enable product speed.
You would work within a fast-paced and collaborative environment, and you would partner with the development team to deliver market-leading products and services. This role uses automation and other technologies to intelligently cope with challenging failures while collaborating with various engineering organizations to resolve failure risks at the source.
The SRE team at SPS approaches Operations as a software problem and aims to apply software engineering approaches to those problems.

REQUIRED QUALIFICATIONS:

  • 2+ years of IT experience with a Bachelor’s degree; or 1 year with a Master’s degree; or a PhD without experience; or equivalent work experience
  • Experience with Golang
  • Experience in platform and service mesh technologies such as Docker and Kubernetes
  • Experience administering Linux
  • Experience with Amazon Web Services (EC2, IAM, Route 53, CloudFormation)
  • Experience with immutable and scalable infrastructure (infrastructure as code concepts)
  • Demonstrated understanding of networking systems and of various identity and authorization systems

PREFERRED EXPERIENCE:

  • Experience working with Agile development methodology and task execution
  • Experience building or operating CI/CD pipelines or other deployment automation solutions
  • Experience with advanced monitoring solutions such as metrics platforms, logging, distributed tracing
  • Experience developing in Python
  • Experience with Terraform

RESPONSIBILITIES:

  • Engineer and maintain highly available, secure, and cost-effective container orchestration platforms such as Kubernetes and ECS
  • Engineer Continuous Integration & Continuous Delivery (CI/CD) solutions that simplify and improve software deployments to enable high velocity for our Product Engineering partners
  • Develop robust monitoring and observability services and patterns to consistently improve the team’s ability to identify, react to, respond to, and recover from complex failures
  • Collaborate with Technology Engineering, Development, and Product Management to help develop, scale, and improve production systems and services
  • Partner with service teams to provide appropriate documentation, cross-training, architecture planning, capacity management, and recommendations for future state
  • Engineer technical solutions to prevent or reduce the frequency of failures
  • Write clean and correct code, write test plans, and identify code quality improvements when reviewing code