Vice President, Production Services Infrastructure Support at BNY

Pune, Maharashtra, India

Full Time

Start Date

Immediate

Expiry Date

09 Jun, 26

Salary

0.0

Posted On

11 Mar, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Azure, AWS, GCP, Azure Kubernetes Service (AKS), AI/ML Platform Operations, Cloud Networking, Identity and Access Management (IAM), Monitoring and Logging, Incident Management, Root Cause Analysis (RCA), Python, Azure OpenAI, Splunk, Kubernetes, SRE, Automation

Industry

Financial Services

Description

At BNY, our culture allows us to run our company better and enables employees’ growth and success. As a leading global financial services company at the heart of the global financial system, we influence nearly 20% of the world’s investible assets. Every day, our teams harness cutting-edge AI and breakthrough technologies to collaborate with clients, driving transformative solutions that redefine industries and uplift communities worldwide. Recognized as a top destination for innovators and champions of inclusion, BNY is where bold ideas meet advanced technology and exceptional talent. Together, we power the future of finance, and this is what #LifeAtBNY is all about. Join us and be part of something extraordinary.

We’re seeking a future team member for the role of Vice President, Production Services Infrastructure Support to join our TSG PRODUCTION SERVICES team. This role is located in Pune.

In this role, you’ll make an impact in the following ways:

* Lead support, maintenance, troubleshooting, and optimization of multi-cloud platforms, primarily Azure and AWS, with exposure to GCP.
* Deliver advanced support for Azure Kubernetes Service (AKS), VM workloads, networking, identity failures, access issues, service degradation, and platform-level incidents.
* Support AI/ML platform operations including Azure OpenAI, Azure Machine Learning, Azure Cognitive Services, GCP Vertex AI, and AI model deployment workflows (monitoring, debugging, access, quota management).
* Troubleshoot cloud networking issues: routing, VNet/VPC connectivity, NSG/Firewall blocks, DNS failures, cross-region communication.
* Manage cloud identity and access problems including Conditional Access failures, IAM misconfigurations, role/permission issues, token/secret issues, and service principal expirations.
* Support deployment and operational stability of monitoring/logging systems: Azure Monitor, GCP Cloud Logging, Splunk, EventHub integrations.
* Ensure SLA adherence through robust incident management, RCA documentation, problem management, and preventive action planning.
* Collaborate with platform engineering teams on deep technical issues involving Kubernetes, container runtimes, ingress/routing, storage, and multi-cloud managed services.
* Build and maintain runbooks, SOPs, support playbooks, alerting frameworks, and automated remediation workflows.
* Onboard new applications into cloud support (network, IAM, monitoring, secrets, service connections, dependencies).
* Partner with Cybersecurity, IAM, AI Governance, Data Platform, and DevOps teams to meet compliance and security requirements.
* Oversee cost alerting, anomaly detection, quota management, and optimization across Azure/AWS/GCP.
* Act as a senior SME ensuring reliability of enterprise cloud, AI platforms, and production environments.
* Contribute to a culture of trust, knowledge sharing, ownership, and continuous improvement.

To be successful in this role, we’re seeking the following:

* Bachelor's degree or equivalent.
* 10+ years of experience in Cloud Support, SRE, or Technical Operations roles within enterprise environments.
* Hands-on expertise across Azure (primary), AWS, and GCP: compute, networking, IAM, storage, containers, PaaS.
* Strong experience troubleshooting AKS, container workloads, node pools, ingress, networking, and related components.
* Proficiency in Python for diagnostics, automation, tooling, and API integrations.
* Experience supporting AI/ML platforms:
  * Azure OpenAI (quota/throttling/key issues)
  * Azure AI Search
  * Azure ML model deployment/monitoring
  * GCP Vertex AI (endpoint failures, training issues, IAM)
* Experience resolving multi-cloud IAM issues (AAD, AWS IAM, GCP IAM).
* Hands-on experience with observability platforms (Azure Monitor, Splunk, GCP Cloud Logging, Prometheus/Grafana).
* Strong knowledge of incident response, RCA documentation, and enterprise support processes.
* Ability to independently diagnose and resolve complex cloud issues.
* Excellent communication and analytical skills; ability to explain technical issues to non-technical stakeholders.
* Comfort working in high-pressure, global support environments.
* Relevant certifications preferred: AZ-104, AZ-305, AWS SysOps, GCP Associate Engineer.

At BNY, our culture speaks for itself. Check out the latest BNY news at:

* BNY Newsroom [https://www.bny.com/corporate/global/en/about-us/newsroom.html]
* BNY LinkedIn [https://www.linkedin.com/company/bnyglobal/posts/?feedView=all]

Here are a few of our recent awards:

* America’s Most Innovative Companies, Fortune, 2025
* World’s Most Admired Companies, Fortune, 2025
* “Most Just Companies”, Just Capital and CNBC, 2025

Our Benefits and Rewards:

BNY offers highly competitive compensation, benefits, and wellbeing programs rooted in a strong culture of excellence and our pay-for-performance philosophy. We provide access to flexible global resources and tools for your life’s journey. Focus on your health, foster your personal resilience, and reach your financial goals as a valued member of our team, along with generous paid leaves, including paid volunteer time, that can support you and your family through moments that matter.

BNY is an Equal Employment Opportunity/Affirmative Action Employer - Underrepresented racial and ethnic groups/Females/Individuals with Disabilities/Protected Veterans.

Responsibilities

This role involves leading the support, maintenance, troubleshooting, and optimization of multi-cloud platforms, primarily Azure and AWS, with a focus on advanced support for AKS, VM workloads, and AI/ML platform operations like Azure OpenAI and Vertex AI. The individual will ensure SLA adherence through robust incident management, build support documentation, and partner with various technical teams to meet security and compliance requirements.