DevOps/Machine Learning Engineer at Thrive Career Wellness Inc
Remote, British Columbia, Canada
Full Time


Start Date

Immediate

Expiry Date

23 Jun, 25

Salary

120000.0

Posted On

23 Mar, 25

Experience

3 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Containerization, Infrastructure, Python, Communication Skills, Orchestration, Scripting Languages, Bash, Devops, Powershell, Code, Kubernetes, Docker

Industry

Information Technology/IT

Description

JOB OVERVIEW:

We are seeking a highly skilled DevOps Engineer to manage and optimize our AWS cloud infrastructure while supporting ML Ops initiatives. This role will focus on ensuring our cloud systems are secure, scalable, and efficient, while also enabling seamless deployment and operation of machine learning workflows.

QUALIFICATIONS:

  • Experience: 3+ years in DevOps or a related role, with exposure to ML Ops workflows.

TECHNICAL SKILLS:

  • Expertise in AWS services (e.g., EC2, S3, Lambda, EKS, SageMaker).
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
  • Hands-on experience with CI/CD tools such as GitHub Actions or GitLab CI/CD.
  • Strong skills in containerization (Docker) and orchestration (Kubernetes).
  • Proficiency in scripting languages such as Python, Bash, or PowerShell.
  • ML Ops Knowledge: Familiarity with SageMaker, Kubeflow, MLflow, or equivalent tools for machine learning operations.
  • Monitoring Tools: Experience with observability tools like CloudWatch, Prometheus, Grafana, or similar.
  • Problem-Solving: Strong troubleshooting skills for cloud and system-related issues.
  • Communication: Clear and effective communication skills to collaborate across technical and non-technical teams.

RESPONSIBILITIES:

  • Cloud Infrastructure Management: Design, implement, and maintain robust, scalable, and cost-efficient cloud solutions on AWS.
  • Automation & CI/CD: Build and maintain CI/CD pipelines to automate infrastructure provisioning, application deployments, and system monitoring.
  • Monitoring & Optimization: Develop monitoring solutions to ensure performance, reliability, and cost-effectiveness of cloud infrastructure.
  • Security: Implement cloud security best practices, including IAM, network configurations, and encryption strategies.
  • ML Ops Support: Collaborate with the AI team and engineers to operationalize machine learning models, ensuring smooth integration into production systems.
  • Containerization & Orchestration: Use Docker to containerize applications and Kubernetes to orchestrate and manage clusters effectively.
  • Collaboration: Partner with software developers, data engineers, and other stakeholders to streamline workflows and ensure infrastructure aligns with business needs.
  • Documentation: Maintain comprehensive documentation of infrastructure, processes, and best practices for internal use and onboarding.