Senior DevOps Engineer with AI at EPAM Systems Inc
Remote, Silesian Voivodeship, Poland
Full Time


Start Date

Immediate

Expiry Date

02 Aug, 25

Salary

0.0

Posted On

02 May, 25

Experience

3 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Logging, Security, Docker, Python, DevOps, Communication Skills, English, Kubernetes

Industry

Information Technology/IT

Description

We are seeking a highly skilled and experienced Senior DevOps Engineer with expertise in AI to manage foundational infrastructure for Generative AI applications, with a focus on GCP, GKE, Vertex AI, and Python-based frameworks.
Work with a major company that aggregates broad and deep property data and generates analytics that inform smart decision-making and property-level insights that hundreds of thousands of users in the real estate industry rely on.

REQUIREMENTS

  • 3+ years of experience in DevOps, cloud infrastructure, or AI platform operations, specifically with GCP
  • Expertise in deploying and managing Google Kubernetes Engine (GKE) and Vertex AI solutions for production environments
  • Proficiency in Python and experience orchestrating AI tools/frameworks like LiteLLM, Dify.AI, CrewAI, and Guardrails AI
  • Proven experience with cloud-native deployments and modern container orchestration tools
  • Proficiency with GenAI or Agentic AI concepts, tools, and applications
  • Working knowledge of AI governance and security best practices
  • Competency in containerization technologies such as Docker and Kubernetes
  • Strong knowledge of cloud platforms (GCP), particularly with services like Vertex AI, GKE, and BigQuery
  • Familiarity with monitoring and logging systems such as Prometheus, Grafana, or Stackdriver
  • B2 level of English or higher, with an emphasis on technical communication skills

RESPONSIBILITIES

  • Design, deploy, and manage scalable, secure, and optimized cloud infrastructure on GCP, including GKE clusters and Vertex AI workflows, to support GenAI applications
  • Integrate and support Python-based AI tools and frameworks like LiteLLM, Dify.AI, CrewAI, and Guardrails AI to ensure seamless operation within the Agentic AI platform
  • Build and maintain automated CI/CD pipelines and implement infrastructure provisioning to improve platform scalability and efficiency
  • Implement monitoring, logging, and alerting solutions to ensure the performance, reliability, and availability of AI services, while resolving operational issues effectively
  • Ensure adherence to security best practices, governance standards, and data protection measures to maintain a secure and compliant AI infrastructure