Data & ML Infrastructure Engineer at Dolby Laboratories, Inc.
Wrocław, Lower Silesian Voivodeship, Poland
Full Time


Start Date

Immediate

Expiry Date

10 Apr, 26

Salary

Not specified

Posted On

10 Jan, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Machine Learning Operations, AWS, Infrastructure-as-Code, Python, Terraform, CloudFormation, Ansible, Git, Kubernetes, Data Management, High-Performance Computing, Storage Management, Network Troubleshooting, Model Training, Continuous Integration, Continuous Release

Industry

Computers and Electronics Manufacturing

Description
Troubleshoot high-performance computing, storage, and networks for machine-learning workloads. Collaborate with research, development, and engineering teams to establish machine-learning and data management workflows, along with the supporting tools and processes, that make machine-learning activities as effective as possible and use resources efficiently. Improve capabilities for exploring, transforming, and managing large to very large datasets. Work with research and development to proactively iterate on and fine-tune model training for the best performance and efficient use of machine-learning resources. Collaborate with the infrastructure teams' physical compute, storage, and network experts to improve on-premises and cloud infrastructure. Improve the use of cloud compute and storage for global research teams while staying within budget.

BS or MS degree in Computer Science, or equivalent experience. 4+ years of practical, hands-on professional experience in machine learning operations or equivalent. Comprehensive knowledge of AWS and infrastructure-as-code techniques. Advanced proficiency with Python, Terraform, CloudFormation, Ansible, Git, and related tools. Experience with machine learning and with scaling workloads in both cloud and on-premises GPU server environments. Experience managing and coordinating storage of large machine-learning datasets. Proficiency in Kubernetes cluster design, deployment, and management. Interest in, and understanding of, industry trends in machine-learning development techniques, tools, and processes. Comprehensive knowledge of continuous integration and continuous release processes and tools.
Responsibilities
Troubleshoot high-performance computing, storage, and networks for machine-learning workloads while collaborating with various teams to establish workflows and tools. Improve data management capabilities and optimize model training for efficient use of resources.