DevOps Engineer at Radiant Digital
Dallas, TX 75202, USA
Full Time


Start Date

Immediate

Expiry Date

09 Nov, 25

Salary

0.0

Posted On

10 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Good communication skills

Industry

Information Technology/IT

Description

At Radiant Digital, we provide IT solutions and consulting services to help government agencies and businesses in the USA, Canada, the Middle East, and Southeast Asia. On the federal side, we support agencies like NASA, the Department of State (DOS), the IRS, ACL, ACF, USDA, and many others, along with numerous state and local government agencies.
We work with industries such as telecom, healthcare, entertainment, and oil and gas, offering solutions designed to meet their specific needs. We focus on improving systems, making better use of data, and updating applications to keep up with changing markets.

JOB DESCRIPTION:

We are seeking a Senior DevOps Engineer with a strong background in infrastructure, compute, and storage automation to join our Storage and Compute Platform Management team. This is a contractor role focused on building scalable, reliable, and automated infrastructure systems that power our high-performance computing (HPC) and storage environments.
The successful candidate will play a key role in automating the provisioning, configuration, monitoring, and management of our compute and storage infrastructure, which supports multi-megawatt CPU and GPU farms used for cutting-edge quantitative research and machine learning workloads. This is an exciting opportunity for someone passionate about infrastructure at scale, automation, and performance, with a forward-thinking mindset and a collaborative attitude.

Responsibilities
  • Design, develop, and maintain automation frameworks for provisioning and managing HPC and storage infrastructure.
  • Implement infrastructure-as-code and configuration management best practices to ensure consistency and repeatability.
  • Collaborate with platform teams to improve scalability, reliability, and observability of systems.
  • Troubleshoot performance, reliability, and scale issues across a variety of infrastructure components.
  • Drive continuous improvement through automation, performance tuning, and capacity planning.
  • Support the deployment and operations of distributed systems and services used across the organization.