MLOps Engineer at Accellor
Hyderabad, Telangana, India
Full Time


Start Date

Immediate

Expiry Date

05 Jul, 26

Salary

0.0

Posted On

06 Apr, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

MLOps, Python, Kubernetes, Docker, CI/CD, AWS, Azure, GCP, Kubeflow, MLflow, Terraform, Prometheus, Grafana, Model serving, Infrastructure as Code, Data pipelines

Industry

IT Services and IT Consulting

Description
We are seeking a Senior MLOps Engineer to design, build, and maintain the infrastructure and pipelines that operationalize AI and Machine Learning systems at scale. This role bridges the gap between model development and production deployment, ensuring ML and GenAI workloads are reliable, observable, cost-efficient, and continuously improving across enterprise environments.

Key Responsibilities

- Design and implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, evaluation, and deployment.
- Build and manage CI/CD pipelines for ML models, including automated testing, validation, and rollback mechanisms.
- Architect and maintain model serving infrastructure for real-time and batch inference workloads, including LLM and agentic AI deployments.
- Implement model monitoring, drift detection, and alerting systems to ensure production model health and reliability.
- Manage experiment tracking, model versioning, and artifact registries to enable reproducibility and governance.
- Optimize compute costs and inference latency across GPU/CPU workloads on cloud platforms (AWS, Azure, or GCP).
- Containerize and orchestrate ML workloads using Docker and Kubernetes.
- Automate data pipeline workflows and feature store management for training and inference.
- Collaborate with AI Engineers, Data Scientists, and Platform teams to streamline the path from prototype to production.
- Establish and enforce MLOps best practices, standards, and documentation across the engineering organization.

Required Qualifications

- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in DevOps, Platform Engineering, or MLOps roles, with 1-2+ years focused on ML/AI infrastructure.
- Strong programming skills in Python; experience with Bash, Go, or Java is a plus.
- Hands-on experience with ML pipeline orchestration tools such as Kubeflow, MLflow, Airflow, or Vertex AI Pipelines.
- Proficiency with containerization (Docker) and orchestration (Kubernetes, Helm).
- Experience with cloud-native ML services on AWS (SageMaker), Azure (Azure ML), or GCP (Vertex AI).
- Familiarity with model serving frameworks such as TorchServe, Triton Inference Server, vLLM, or TGI.
- Knowledge of Infrastructure as Code (Terraform, Pulumi, or CloudFormation).
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or equivalent).
- Strong understanding of software engineering fundamentals, version control (Git), and CI/CD practices.

Nice to Have

- Experience deploying and serving Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems in production.
- Familiarity with vector databases (Pinecone, Weaviate, Qdrant, or pgvector).
- Exposure to AI observability platforms (LangSmith, Weights & Biases, Arize, or WhyLabs).
- Experience with feature stores (Feast, Tecton, or equivalent).
- Familiarity with GPU cluster management and distributed training infrastructure.
- Experience with enterprise SaaS platforms and multi-tenant ML infrastructure.


Responsibilities
Design and maintain end-to-end ML pipelines and infrastructure to operationalize AI systems at scale. Manage CI/CD workflows, model serving, and monitoring to ensure production reliability and cost-efficiency.