GCP Infrastructure Engineer at NTT DATA
Chennai, Tamil Nadu, India
Full Time


Start Date

Immediate

Expiry Date

22 Jan, 26

Salary

0.0

Posted On

24 Oct, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

GCP, Docker, Kubernetes, Vertex AI, IBM Watsonx, Terraform, CI/CD, DevSecOps, IAM, Python, Bash, MLOps, Data Pipelines, Distributed Training, Open-Source Contributions, Cloud Networking, Security Best Practices

Industry

IT Services and IT Consulting

Description
Cloud Infrastructure & Platform Engineering

Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
Ensure business continuity through backup, disaster recovery, and multi-region deployments.
Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infrastructure health and ML-specific metrics (model drift, data anomalies).
Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.

Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
5+ years of experience in cloud infrastructure engineering, DevOps, or platform engineering.
Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
Strong hands-on expertise with Google Cloud Platform (GCP), especially Vertex AI.
Experience with IBM Watsonx for AI application deployment and management.
Proven skills in Docker, Kubernetes (GKE), and container orchestration at scale.
Proficiency in Python, Bash, or other relevant scripting languages.
Strong understanding of cloud networking, IAM, and security best practices.
Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
Excellent problem-solving, debugging, and communication skills.

Experience in MLOps practices for model deployment, monitoring, and retraining.
Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
Contributions to open-source projects in infrastructure, MLOps, or GenAI.
Experience managing infrastructure in regulated industries.

Google Cloud Certified - Professional Cloud Architect
Google Cloud Certified - Machine Learning Engineer
Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
IBM Certified Watsonx Generative AI Engineer - Associate
IBM Certified Solution Architect - Cloud Pak for Data
Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
Responsibilities
Design, provision, and maintain scalable infrastructure for GenAI applications on GCP. Implement high-performance GPU/TPU clusters and ensure business continuity through backup and disaster recovery.