Site Reliability Engineer at Sectigo

Ottawa, ON, Canada

Full Time

Start Date

Immediate

Expiry Date

30 Sep, 25

Posted On

01 Jul, 25

Experience

3 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Configuration Management, Software, Proxmox, Python, Docker, Data Processing, Jenkins, Databases, Bash, VMware, Computer Science, Technology, Finance, Puppet, Government, File Systems, Kubernetes, Root, Maintenance, Ansible, Git, Infrastructure, Information Systems

Industry

Information Technology/IT

Description

COMPANY DESCRIPTION

At Sectigo, we align around our mission and pride ourselves on helping thousands of customers sleep better at night.
Sectigo is a leading provider of digital identity and cybersecurity solutions, offering a comprehensive suite of products to protect online transactions and communications. Our mission is to secure the digital landscape for enterprises worldwide.

JOB DESCRIPTION

We are looking for a Site Reliability Engineer to join our growing global team at Sectigo.
The Site Reliability Engineer will design and implement solutions to reduce toil and ensure the reliability of our critical services at Sectigo.
This is a full-time individual contributor role on a hybrid model, with at least 3 days a week in our Ottawa office, reporting to our Cloud Operations team.
The compensation range for this position is between CAN 100,000 and CAN 115,000, based on years of experience and internal equity.

Here are the core functions, responsibilities, and expectations for this role:

Ensure the reliability of our critical products and services by meeting or exceeding SRE objectives.
Instantiate and maintain production infrastructure using Infrastructure as Code and Configuration Management tools.
Build and maintain proper monitoring of our services by utilizing centralized logging and time series databases.
Automate deployments, administration, and monitoring of our services by following CI/CD practices.
Work with engineering and information security teams to enhance, document, and establish processes, and to improve the operability and security of our services.
Other duties as assigned and related to the nature of this role and company initiatives.
Participation in team on-call rotation is required.

EDUCATION:

Bachelor’s degree in information systems, computer science, technology, or a related field is preferred.

EXPERIENCE:

Minimum of 3 years of software and/or operational experience in building and maintaining internet-facing production environments is required.
Strong experience with Linux/Unix systems administration.
Knowledge of source control tools (Git preferred).
Experience with Configuration Management and Infrastructure as Code tools (Ansible, Puppet, Terraform preferred).
Good understanding of container technology (Docker, Kubernetes preferred).
Experience with monitoring tools (Prometheus, Grafana, Nagios, or similar) and alerting systems.
Experience with non-cloud infrastructure.
Experience running a large-scale 24/7 production environment.
Experience with distributed data processing, databases, and large-scale file systems is a plus.

IDEAL CANDIDATE PROFILES, TALENTS, AND DESIRED QUALIFICATIONS:

Strong scripting abilities in Bash and Python.
Experience with incident management, troubleshooting, and root cause analysis.
Experience in handling postmortems, building incident response plans, and improving incident resolution procedures.
Experience running and maintaining real-world build systems (Jenkins, DroneCI, or similar tools).
Demonstrable experience with the entire software life cycle, from systems architecture and systems design through implementation, maintenance, and operation.
Programming experience using HTTP Service APIs.
Virtualization experience (VMware, Proxmox, Oracle Linux Virtualization Manager).
Network administration experience is a plus.
Exposure to Security and Testing frameworks is a plus.
Exposure to compliance-regulated industries such as Finance, Healthcare, or Government is a plus.
