Senior Site Reliability Engineer

at Lightspeed

Toronto, ON, Canada

Start Date: Immediate
Expiry Date: 24 Oct, 2024
Salary: Not Specified
Posted On: 25 Jul, 2024
Experience: N/A
Skills: Good communication skills
Telecommute: No
Sponsor Visa: No

Description:

Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place!
The Lightspeed Services organization is building out a common set of services that span multiple products. This includes Financial Services (Payments and Capital), which is the Fintech backbone of Lightspeed, as well as Accounts Services and the Golf product.
Our SRE team is responsible for designing, building and operating the infrastructure that backs critical Lightspeed services. The platform covers the full cycle of software delivery, from CI/CD pipelines to high-availability, scalable production environments. Cloud Efficiency is a new team under the Services group that is tasked with optimizing how the company uses cloud infrastructure.

What you’ll be responsible for:

  • Understand how different business units and engineering teams use cloud services
  • Build resource attribution, measurement, monitoring and quota management frameworks
  • Build a deep understanding of how different services utilize cloud technologies and work with the development teams to come up with architecture improvements to achieve more optimal utilization and performance
  • Initiate and contribute to the continuous improvement of our software delivery processes and practices in a multi-location, multidisciplinary team to empower and accelerate product development
  • Design and architect operational solutions with the specific goal of increasing the efficiency, performance, and standardization of operational tasks
  • Obsess over reliability and help teams deliver reliable software
  • Adhere to and advocate for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies
  • Provide timely assistance and remediation solutions during critical situations and production incidents to help resolve service problems (You will be on call for periods of time)

What you’ll be bringing to the team:

  • Strong knowledge of Amazon Web Services and Google Cloud Platform
  • Proven track record of driving optimization of cloud services, including, but not limited to, data pipelines, storage, databases, caching layers, cores, memory, etc.
  • Ability to reason about tradeoffs between different architectures, such as single-tenant vs multi-tenant deployments, managed services vs dedicated instances
  • Understanding of different types of SLAs/SLOs and of resource contracts, such as reserved instances, savings plans, etc.
  • Analytical mindset: live by the metrics, deeply understand data and use it to drive technical decisions
  • Strong experience with Docker, Kubernetes & Linux Systems
  • Experience with CI/CD pipelines, using CircleCI, Jenkins, GitHub, ArgoCD, Helm, etc.
  • Experience with Infrastructure as code practices: we use Terraform
  • Ability to read & understand programming languages: Python, Ruby, Java, …
  • Good understanding of Agile development and continuous delivery best practices, software engineering tools, processes, methods and testing
  • Ability to partner effectively with other teams
  • Ability to plan, organize, prioritize and stay focused
  • Solid experience provisioning and managing infrastructure with high availability constraints

What you need to be successful:

  • You are a problem solver who thinks critically and does not shy away from tackling complexity
  • You have a strong will to learn, grow and get out of your comfort zone
  • You have great energy and passion for technology
  • You are able to express yourself flawlessly in English
  • You have strong interpersonal skills
  • You are a team player and a bar raiser

And what about the rest:

  • Lots of autonomy, flexible work culture and possibility of remote work
  • Development of high-traffic products used at a global scale
  • Exposure to modern and proven technology
  • Opportunity to learn and expand your skill set
  • Tons of growth opportunities into technical or people management roles
  • Amazing benefits & perks, including equity for all Lightspeeders
  • Opportunity to join a fast-paced, high-growth company
  • Become a valued part of the diverse and inclusive Lightspeed family


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Application Programming / Maintenance

Software Engineering

Graduate

Proficient

1

Toronto, ON, Canada