Support Operations Engineer (SOE)

at CoreWeave Europe

London, England, United Kingdom

Start Date: Immediate
Expiry Date: 22 Oct, 2024
Salary: Not Specified
Posted On: 22 Jul, 2024
Experience: 2 year(s) or above
Skills: Bash, Virtual Machines, Communication Skills, Web Technologies, Stack, Docker, Authentication, Rack, Web Servers, Collaboration, Python, Scripting Languages, System Administration, Private Networks, Working Experience, Independent Thinking, Machine Learning, InfiniBand
Telecommute: No
Sponsor Visa: No

Description:

Fueled by CoreWeave’s remarkable growth over the last year and to keep up with the surging demand we’re seeing from customers and in the market, CoreWeave is expanding its operations to Europe. We’ve built a reputation for delivering cutting-edge GPU infrastructure for leading AI companies, and are thrilled to continue this journey as a trusted partner to the AI community in Europe. If you thrive in fast-paced, high growth environments and want to play a key role in building and delivering the critical infrastructure required by AI, we’d love to hear from you. Learn more at www.coreweave.com.
As a Support Operations Engineer, you will be at the forefront of this transformational technology. You will assist a list of cutting-edge companies and developers using our accelerated compute services and features to run their mission-critical applications. You will be a crucial component in the success of their production-critical implementations via deployment, monitoring, triaging, and troubleshooting of critical infrastructure and jobs. Your efforts will ensure the efficient and uninterrupted execution of our clients’ jobs.
This role includes shift work, participation in an on-call rotation, and occasional after-hours support. It operates within a fast-paced, global, 24/7 support team environment, requiring flexibility for collaboration across different time zones.

REQUIRED SKILLS AND EXPERIENCE

  • Two years of experience with server hardware installation and server configuration in data center environments
  • Strong Linux command-line skills and experience with system administration.
  • Experience with High-Performance Computing (HPC) system administration
  • Understanding of networking concepts and troubleshooting (e.g., TCP/IP, InfiniBand).
  • Working experience with Kubernetes & Docker
  • Proficiency in scripting languages such as Bash and Python for automation.
  • Solid understanding of distributed computing environments and methodologies, including storage volumes, private networks, load balancers, and virtual machines
  • User Support experience and excellent communication skills to assist and train end-users.
  • You have a knack for solving problems; you’re adept at recognizing technical issues and developing appropriate solutions
  • Working knowledge of basic Windows sysadmin

DESIRED SKILLS AND EXPERIENCE

  • Data Center Experience: Proficient in rack and stack, as well as server and cable troubleshooting.
  • GPU Hardware and HPC: Familiar with GPU hardware and high-performance computing use cases.
  • AI and ML: Knowledgeable in artificial intelligence and machine learning.
  • Operational/System Administration: Experienced in working from ticket queues, Network Operations Centers (NOC), and dashboards.
  • Monitoring Tools: Experienced with Grafana and other monitoring tools.
  • Web Technologies: Intermediate skills in troubleshooting web technologies, including web servers, frameworks, HTTP, and authentication.
  • Cloud Concepts: Skilled in system, API, and infrastructure design using cloud concepts such as storage volumes, private networks, load balancers, and virtual machines.

Location & Travel: This is a hybrid role based in London or fully remote within the UK.

  • Hybrid in London: If you live within a commutable distance of our London offices, we expect you to be there at least twice a week to foster collaboration and connection.
  • Fully Remote: If you live outside a commutable distance, you can work remotely with occasional visits to the office.

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

  • Be Curious at your Core
  • Act like an Owner
  • Empower Employees
  • Deliver Best In-Class Client Experience
  • Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for takeoff, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!

Responsibilities:

JOB DUTIES

  • Deployment and configuration of platform infrastructure in a Linux environment
  • Monitor software and infrastructure for issues and act quickly to stem any negative impact
  • Work with Development, Infrastructure, and Network Operations teams to troubleshoot and resolve deployment-related software, network, installation, and configuration issues
  • Support Development, Infrastructure, and Network Operations teams in resolving infrastructure issues
  • Work with contractors in remote sites to install, configure, and troubleshoot servers, network equipment, and data centre infrastructure
  • Reconfigure or decommission existing infrastructure as needed
  • Identify, maintain and create documentation for new hardware deployments, all varieties of corner case scenarios, and troubleshooting workflows
  • Streamline deployments to increase efficiency and reduce deployment times
  • Support the development, testing, and integration of new hardware into the platform
  • Liaise closely with the Client Support Engineers team to monitor customer support requests and act as an extension of the Client Support team
  • Help maintain high customer satisfaction by acting with empathy, understanding the business impact and priority of customer issues, and following our best practices
  • Promptly act on technical incidents and escalations, communicating effectively with all stakeholders
  • Assist with the training and development of new hires
  • Plan, organize, and manage tasks, resources, and timelines across teams to accomplish work accurately and on time


REQUIREMENT SUMMARY

Min: 2.0, Max: 7.0 year(s)

Information Technology/IT

IT Software - Network Administration / Security

Other

Graduate

Proficient

1

London, United Kingdom