Data Center Engineer at Maincode
Melbourne, Victoria, Australia
Full Time


Start Date

Immediate

Expiry Date

11 Dec, 25

Salary

180000

Posted On

11 Sep, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

InfiniBand, Switching, Routing, Critical Infrastructure, Kubernetes, Firewalls, RDMA, Logging, Security

Industry

Information Technology/IT

Description

MINIMUM QUALIFICATIONS

  • Formal qualifications and industry certifications in datacenter, systems, or networking operations.
  • 5+ years of experience operating production datacenter environments at scale.
  • Strong expertise in large-scale networking and security (routing, switching, firewalls, load balancing).
  • Proven Linux systems administration experience across distributed clusters.
  • Hands-on experience with monitoring, logging, and alerting systems.
  • Demonstrated track record running mission-critical infrastructure with strict uptime requirements.
  • Ability to work on-site in datacenters and handle physical hardware deployment and troubleshooting.

PREFERRED QUALIFICATIONS

  • Experience with GPU clusters, HPC, or large-scale compute systems.
  • Familiarity with Kubernetes, Slurm, or other cluster schedulers.
  • Knowledge of InfiniBand, RDMA, or other high-performance networking.
  • Background supporting AI/ML or high-density compute workloads.

Responsibilities

THE ROLE

We are hiring Data Center Engineers to help build and operate the backbone of our production AI infrastructure. This is a hands-on, on-site role with direct responsibility for keeping our clusters online, performant, and secure. You will deploy and configure hardware at scale, manage complex networking from the edge inward, and respond to live production issues where every minute counts.
The technical bar at Maincode is unlike anywhere else. This is a frontier team moving fast, where the infrastructure you run powers large-scale AI training and serving in real production environments. We are looking for someone who already thinks deeply about how large-scale AI infrastructure runs, has direct experience operating mission-critical datacenter systems, and brings ideas about how to do it better.

RESPONSIBILITIES

  • Deploy, configure, and maintain production servers, storage, and networking infrastructure.
  • Design, implement, and operate large-scale networking (routing, switching, firewalls, load balancing).
  • Administer Linux systems across clusters, ensuring OS-level performance and stability.
  • Monitor system health, respond to incidents, and manage downtime and recovery.
  • Implement and enforce best practices for availability, redundancy, and security.
  • Work closely with engineering and research teams to ensure compute resources meet their requirements.
  • Coordinate with vendors and suppliers for hardware, networking, and service delivery.
  • Maintain clear documentation and operational standards for production environments.