Site Reliability Engineer - SRE at AIT Global Inc
Montréal, QC, Canada
Full Time


Start Date

Immediate

Expiry Date

05 Dec, 25

Salary

$45.00-$60.00 per hour

Posted On

06 Sep, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

PowerShell, Distributed Systems, ITIL, Splunk, Security, Firewalls, Docker, Jenkins, Analytics, Scripting, Automation, Orchestration, Python, Kubernetes, DevOps

Industry

Information Technology/IT

Description

JOB DESCRIPTION:

We are seeking a Production Support Engineer with expertise in Azure Cloud for the IT Reliability & Production Engineering (RPE) domain. The role focuses on ensuring the stability, performance, and reliability of production systems in a high-availability environment.

REQUIRED SKILLS:

  • Strong experience in Azure Cloud (Azure Monitor, App Insights, Log Analytics).
  • Scripting & Automation: PowerShell, Python, Terraform, or Ansible.
  • Monitoring & Observability: Prometheus, Grafana, Splunk, or Datadog.
  • Incident Management: ITIL, SRE principles, RCA methodologies.
  • CI/CD & DevOps: Azure DevOps, GitHub Actions, Jenkins.
  • Containers & Orchestration: Kubernetes (AKS), Docker.
  • Networking & Security: Load balancers, firewalls, IAM, VPNs.

PREFERRED QUALIFICATIONS:

  • Experience with large-scale distributed systems.
  • Familiarity with SQL/NoSQL databases.
  • Knowledge of cloud-native architecture.

Job Type: Fixed term contract
Contract length: 12 months
Pay: $45.00-$60.00 per hour
Work Location: In person

Responsibilities
  • Monitor, troubleshoot, and resolve production incidents and service disruptions.
  • Ensure system reliability, availability, and performance using SRE principles.
  • Automate manual processes and optimize cloud infrastructure on Azure.
  • Analyze logs, metrics, and alerts to prevent incidents.
  • Collaborate with development, DevOps, and infrastructure teams for issue resolution.
  • Implement CI/CD pipelines, observability, and proactive monitoring strategies.
  • Maintain Azure resources (VMs, AKS, Storage, Networking, etc.).
  • Participate in on-call rotations for critical production support.