DevOps Engineer - Machine Learning
at CoMind
London, England, United Kingdom
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 06 Feb, 2025 | Not Specified | 07 Nov, 2024 | N/A | Pipelines, Docker, Code, Version Control, Containerization, Integration Testing, Git, Bitbucket, Parallel Processing, Infrastructure | No | No |
Description:
At CoMind, we are developing a non-invasive neuromonitoring technology that will result in a new era of clinical brain monitoring. In joining us, you will be helping to create cutting-edge technologies that will improve how we diagnose and treat brain disorders, ultimately improving and saving the lives of patients across the world.
SKILLS & EXPERIENCE:
- Git or Bitbucket for version control, including experience with managing versioned infrastructure-as-code (IaC) repositories
- CI/CD pipelines for automating workflows, including experience with integration testing and containerization pipelines
- Experience managing and orchestrating complex cloud workflows (e.g., ECS Tasks, AWS Batch), with a focus on event-driven and parallel processing
- Infrastructure as Code (IaC) experience (e.g., Terraform, AWS CloudFormation) for creating, maintaining, and scaling cloud infrastructure
- Docker for containerization, including experience with containerizing machine learning workflows and publishing containers to repositories like AWS ECR (see the sketch after this list)
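As a rough illustration of the last two points, the sketch below builds a Docker image for an ML workflow and pushes it to AWS ECR using the Docker SDK for Python and boto3. The repository name, region, and tag are placeholder assumptions, and the team's actual CI/CD tooling may differ.

```python
# Sketch: build an ML workflow container and push it to AWS ECR.
# Repository name, region, and tag below are hypothetical placeholders.
import base64

import boto3
import docker

REGION = "eu-west-2"
REPOSITORY = "ml-training"   # hypothetical ECR repository
TAG = "latest"

ecr = boto3.client("ecr", region_name=REGION)
docker_client = docker.from_env()

# ECR issues a short-lived token in the form base64("AWS:<password>").
auth = ecr.get_authorization_token()["authorizationData"][0]
username, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
registry = auth["proxyEndpoint"].replace("https://", "")

docker_client.login(username=username, password=password, registry=registry)

# Build the image from the local Dockerfile and tag it for the ECR registry.
image_uri = f"{registry}/{REPOSITORY}:{TAG}"
image, _ = docker_client.images.build(path=".", tag=image_uri)

# Push the tagged image; in a CI/CD pipeline this step would run after integration tests pass.
for line in docker_client.images.push(image_uri, stream=True, decode=True):
    if "status" in line:
        print(line["status"])
```

In practice this logic would live in a CI/CD job triggered on merges to the main branch, with the image tag derived from the Git commit or package version rather than `latest`.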
How To Apply:
In case you would like to apply to this job directly from the source, please click here
Responsibilities:
THE ROLE
CoMind is seeking a skilled DevOps Engineer to join our dynamic Research Data Science team to lead the orchestration of a robust ML training pipeline in AWS. This role is critical to enabling the scalable training and testing of a range of ML models on large volumes of an entirely new form of clinical neuromonitoring data.
RESPONSIBILITIES:
- Architect and implement a scalable solution to support the Research Data Science Team in running a large number of assorted machine learning pipelines, including model training, evaluation, and inference
- Create a CI/CD pipeline for building containers from in-house Python packages, running integration tests, and publishing to AWS ECR
- Set up ECS tasks or AWS Batch jobs to run containers stored in AWS ECR (see the sketch after this list)
- Establish a robust configuration management system to store, version, and retrieve configurations associated with multiple machine learning workflows
- Implement robust error handling and monitoring solutions to ensure timely debugging across the pipeline with centralised logging and error reporting
- Implement cost monitoring solutions to track and manage compute costs across different runs, building dashboards to provide insights into resource usage and cost optimization
- Ensure security and data protection are integrated into the pipelines by applying AWS best practices for security protocols and data management
- Monitor and manage the team’s compute resources, including both cloud (AWS) and on-premise GPU nodes, ensuring efficient use and scalability
- Implement Infrastructure as Code (IaC) to set up and manage the pipeline architecture, using Terraform, AWS CloudFormation, or similar tools.
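As a loose sketch of how a couple of these responsibilities fit together, the snippet below registers an AWS Batch job definition for a container held in ECR and submits a training run with a versioned configuration passed through the environment. The names, queue, image URI, and config key are hypothetical assumptions; the real pipeline would drive this from CI/CD and IaC rather than an ad-hoc script.

```python
# Sketch: run an ECR-hosted training container as an AWS Batch job.
# Job queue, definition name, image URI, and config URI are hypothetical.
import boto3

REGION = "eu-west-2"
batch = boto3.client("batch", region_name=REGION)

# Register (or revise) a job definition pointing at the training image in ECR.
job_def = batch.register_job_definition(
    jobDefinitionName="ml-training",                        # hypothetical
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.eu-west-2.amazonaws.com/ml-training:latest",
        "resourceRequirements": [
            {"type": "VCPU", "value": "4"},
            {"type": "MEMORY", "value": "16384"},
        ],
        "command": ["python", "-m", "training.run"],        # hypothetical entrypoint
    },
)

# Submit a run, passing a versioned config reference so the job is reproducible
# against the configuration management system described above.
response = batch.submit_job(
    jobName="train-model-001",                              # hypothetical
    jobQueue="research-gpu-queue",                          # hypothetical
    jobDefinition=job_def["jobDefinitionArn"],
    containerOverrides={
        "environment": [
            {"name": "CONFIG_URI", "value": "s3://example-configs/train/v42.yaml"},  # hypothetical
        ]
    },
)
print("Submitted job:", response["jobId"])
```

In the production setup, the job definition and queue would be created through Terraform or CloudFormation, and submissions would typically be event-driven (for example, triggered when new data or configurations land) rather than invoked manually.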
REQUIREMENT SUMMARY
Min: N/A | Max: 5.0 year(s)
Information Technology/IT
IT Software - Other
Software Engineering
Graduate
Proficient
1
London, United Kingdom