Site Reliability Infra Engineer (AI, LLM) at Binance
Brisbane, Queensland, Australia
Full Time


Start Date

Immediate

Expiry Date

09 Nov, 25

Salary

0.0

Posted On

09 Aug, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Kafka, ZooKeeper (ZK), HBase, Spark, YARN, Airflow

Industry

Information Technology/IT

Description

Binance is a leading global blockchain ecosystem behind the world’s largest cryptocurrency exchange by trading volume and registered users. We are trusted by over 280 million people in 100+ countries for our industry-leading security, user fund transparency, trading engine speed, deep liquidity, and an unmatched portfolio of digital-asset products. Binance offerings range from trading and finance to education, research, payments, institutional services, Web3 features, and more. We leverage the power of digital assets and blockchain to build an inclusive financial ecosystem to advance the freedom of money and improve financial access for people around the world.
We are looking for a seasoned SRE/AI Engineer to design our central Big Data infrastructure and services and take them to the next stage, ensuring that the data, services, and infrastructure are reliable, fault-tolerant, efficiently scalable, and cost-effective.

REQUIREMENTS

  • Source-code-level understanding of open-source data components such as HDFS, HBase, YARN, Spark, Flink, Airflow, Kyuubi, ZooKeeper (ZK), and Kafka.
  • In-depth understanding of Linux and computer networks.
  • Experience in at least one language (Python/Golang/Java, etc.).
  • Experience in profiling, benchmarking, and optimizing ML applications.
  • Self-directed, self-motivated, and detail-oriented, with the ability to produce sound design proposals and thorough analyses of production issues.
  • Ability to thrive in a multi-functional team on high-profile, critical projects.
  • Minimum of 5 years of hands-on experience with backend systems or the big data ecosystem.
  • Comfortable working in a high-velocity startup environment with evolving goals and systems.

RESPONSIBILITIES

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Develop and maintain tools, and redesign capacity-planning infrastructure for greater scalability.
  • Troubleshoot, diagnose, and fix software issues, and ensure data security.
  • Build production LLM systems to power business functions, from data to production, emphasizing automation and reproducibility.
  • Optimize and support LLM workloads in on-prem environments.