DevOps Engineer at TRG Mobilearth Inc
Remote, British Columbia, Canada
Full Time


Start Date

Immediate

Expiry Date

08 Jul, 25

Salary

80000.0

Posted On

08 Apr, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Database Optimization, Platforms, Tomcat, DevOps, Thermal Management, Azure, Kubernetes, Slack, Operations, Communication Skills, Virtual Collaboration, Operating Systems, Recovery Planning, Power Optimization, Windows, Windows Administration, MariaDB

Industry

Information Technology/IT

Description

Seeking a skilled DevOps Engineer to join our dynamic team in a fully remote, work-from-home capacity. This role focuses on managing and optimizing our infrastructure, ensuring seamless deployment of applications, and maintaining high availability and security standards across QA, development, and production environments. The ideal candidate will have expertise in both Windows and Linux-based systems, database management, automation, disaster recovery, and hardware configuration for AI workloads, with a strong commitment to supporting critical systems after hours when needed.

Key Responsibilities:

  • Infrastructure Management:
      • Install, configure, and update Java, Tomcat, and MariaDB on both Windows and Linux-based systems.
      • Set up and maintain Nginx proxy servers on Linux to ensure efficient load balancing and routing.
      • Implement and manage replication for MariaDB to ensure data consistency and availability across platforms.
  • Application Deployment & Support:
      • Install and update application software across QA, developer, and production servers on both Windows and Linux environments.
      • Provide after-hours support for QA/developer machines and production servers at customer sites to ensure system reliability and uptime.
  • Automation:
      • Create and maintain scripts (e.g., Bash, PowerShell, Python) and batch files to automate repetitive tasks, streamline deployments, and improve operational efficiency across Windows and Linux systems.
  • Disaster Recovery & Security:
      • Design and implement disaster recovery systems for both Windows and Linux environments to minimize downtime and data loss.
      • Perform SOC 2-related tasks, including monitoring security issues, applying software updates promptly, and ensuring compliance with security standards.
  • AI Hardware Expertise:
      • Design, build, and optimize private server hardware to support high-performance AI model execution (e.g., Llama3) on both Windows and Linux platforms.
      • Select and configure GPUs, CPUs, memory, and storage solutions tailored for AI workloads.
      • Ensure hardware compatibility with software stacks and optimize for scalability and efficiency.
  • Collaboration:
      • Work closely with QA, development, and operations teams remotely to troubleshoot issues, optimize performance, and support testing and deployment cycles across dual-OS environments.

Required Skills & Qualifications:

  • Proven experience with system administration on both Windows and Linux, including Nginx configuration on Linux.
  • Expertise in installing, configuring, and updating Java, Tomcat, and MariaDB on both operating systems.
  • Advanced knowledge of MariaDB replication and database optimization in Windows and Linux environments.
  • Strong scripting skills (e.g., Bash, PowerShell, Python) for task automation across platforms.
  • Experience supporting QA, development, and production environments, including after-hours support, on both Windows and Linux.
  • Familiarity with disaster recovery planning and implementation for dual-OS setups.
  • Understanding of SOC 2 compliance, security monitoring, and rapid software patching.
  • Hardware Knowledge:
      • Experience building and configuring server hardware for AI workloads (e.g., GPU selection, thermal management, power optimization) compatible with Windows and Linux.
      • Understanding of AI model requirements (e.g., Llama3) and ability to match hardware to performance needs across platforms.
  • Excellent problem-solving skills and ability to work independently in a remote environment.
  • Strong communication skills for virtual collaboration using tools like Slack, Zoom, or similar.

Preferred Qualifications:

  • Experience with containerization tools (e.g., Docker, Kubernetes) on both Windows and Linux.
  • Prior exposure to cloud platforms (e.g., AWS, Azure, or GCP).
  • Certification in Windows administration, Linux, MariaDB, or relevant DevOps tools.
  • Background in AI infrastructure or machine learning operations (MLOps).

Work Schedule:

  • Full-time, work-from-home position with occasional after-hours support for production servers and customer sites.
  • Flexible hours with an emphasis on availability for critical system support as needed.

Be part of a forward-thinking, remote-first team that values innovation, collaboration, and reliability. This role offers the flexibility of working from home while providing opportunities to work on cutting-edge technologies, contribute to critical systems, and grow your expertise in DevOps, infrastructure management, and AI hardware optimization across Windows and Linux platforms.
Job Type: Full-time
Pay: $80,000.00-$100,000.00 per year

Benefits:

  • Dental care
  • Extended health care
  • Flexible schedule
  • Vision care
  • Work from home

Schedule:

  • 8 hour shift
  • Monday to Friday

Application question(s):

  • Describe how you would approach building a private AI server. How would you select the hardware, operating system, and supporting application software? The private AI server would run an open-source model such as Llama3 in a private network with reasonable response times for users. The model will implement RAG, and the server could run additional models, for example for voice-to-text or for scanning an image of an ID to extract its text content. How would you select the hardware while keeping costs reasonable for a private AI server? Also consider how maintenance could be automated for updating the OS, software, and AI models, as well as for reporting on performance and security statistics. A future project is a private AI server integrated with the company's user and staff applications, so bonus points for a good answer. Thank you.

Experience:

  • DevOps: 10 years (required)

Work Location: Remote
