Linux Systems Engineer at Cadre5
Knoxville, TN 37932, USA
Full Time


Start Date

Immediate

Expiry Date

04 Dec, 25

Salary

0.0

Posted On

04 Sep, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Information Technology, Image Processing, Web Pages, Python, Bash, Consideration, Platforms, VMware, OpenStack, Interpersonal Skills, Automation Tools, Software, Git, Tuning, Maintenance, Confluence, Information Systems, Jenkins, Communication Skills, Linux Distributions

Industry

Information Technology/IT

Description

LINUX SYSTEMS ENGINEER

Founded in 1999 in the beautiful Smoky Mountains of East Tennessee, Cadre5 provides innovative technical solutions to our customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the Emerging Technologies, AI & Computing group in the Research Computing Support division in the Information Technology Services Directorate at Oak Ridge National Laboratory (ORNL) to recruit a qualified Linux Systems Engineer who will focus on supporting the technological needs of ORNL researchers.
ORNL delivers scientific discoveries and technical breakthroughs needed to realize solutions in energy and national security and provides economic benefit to the nation. This premier research institution located near Knoxville in Oak Ridge, TN, addresses national needs through impactful research and world-leading research centers.

OVERVIEW:

This role advocates and promotes Linux systems to researchers who process large data sets and/or develop code as a part of their project. This includes ensuring the availability, performance, scalability, and security of production systems. The ETAC Group frequently uses automation and monitoring solutions to minimize day-to-day maintenance and is always looking for opportunities to optimize system management practices or system performance. As the primary domain expert for these systems, you will work with technical staff to install and help tune the performance of various scientific toolsets.

BASIC QUALIFICATIONS:

  • A BS degree in computer science, computer engineering, information technology, information systems, science, engineering, business, or a related discipline and a minimum of five (5) to seven (7) years of aligned professional experience is required for consideration. An overall combination of equivalent education and experience may be considered.
    o Master's and PhD degree holders in the same fields of study are also encouraged to apply.
    o Master's degree holders should have a minimum of four (4) to six (6) years of relevant and aligned experience.
    o PhD holders should have up to three (3) years of relevant and aligned experience.
  • Strong knowledge of Enterprise Linux distributions and enterprise class server/storage hardware.
  • Experience monitoring and maintaining hardware and software including, but not limited to, InfiniBand, Slurm, Lustre, RDMA, Weka, and related technologies central to this team’s work.
  • Experience with configuration management and automation tools such as Git, Jenkins, Ansible, and Puppet.
  • Experience managing a virtualized environment including tuning and maintenance.
  • Strong working knowledge of system design.
  • Ability to obtain and maintain a security clearance is required.
  • Experience creating scripts using Bash, Python, etc.
  • Experience with on-premises cloud-native platforms (OpenStack, VMware, or others).
  • Experience with work planning and documentation tools (such as Jira and Confluence).
  • The ability to obtain and maintain a Department of Energy “Q” clearance is required. This requires US Citizenship.

PREFERRED QUALIFICATIONS:

  • Experience supporting AI-enabled systems and software for model training is preferred.
  • Understanding of platforms to support users with job submissions and troubleshooting.
  • Excellent interpersonal skills suitable for communication with customers and management.
  • Effective written, presentation, and verbal communication skills.
  • Experience with CentOS/RHEL, Ubuntu, VMware.
  • Experience building and running containerized applications in an environment. Knowledge of Apptainer, Warewulf, Fuzzball.
  • Experience managing systems using GPU/CUDA clusters for AI/ML and/or image processing.
  • Proven ability to work in a dynamic environment and support large data systems.
  • Effective documentation skills, including ability to prepare simple documentation web pages.


Responsibilities
  • Develop and document system and service diagrams, procedures, and software build/install notes.
  • Create observability metrics and dashboards and assist in ongoing documentation efforts.
  • Install, configure, customize, and maintain Linux software including building software from source.
  • Collaborate with researchers, developers, and other engineers to develop creative solutions and solve complex challenges.
  • Create and foster partnerships with researchers at ORNL to encourage outstanding delivery of services.
  • Translate project deliverables, milestones, and timelines into predictable team output and tasks.
  • Provide consulting in the selection and purchase of hardware and software systems.
  • Ensure configuration management using tools such as Git, Jenkins, Ansible, Puppet, etc.
  • Ensure the secure and effective operation of systems through compliance with ORNL procedures and IT Internal Operating Procedures.
  • Develop, architect, and engineer systems and related application software solutions, including research projects. This includes:
    o Monitoring for system issues
    o Guiding project tasks through the engineering processes and ensuring that standard methodologies are implemented continuously and consistently
    o Managing backup services
    o Troubleshooting and resolving system problems quickly and effectively
    o Working with other systems engineers and vendors to resolve hardware and software issues