Overview:
We are seeking a highly skilled Senior Manager – AIOps & MLOps to lead and oversee the automation, scalability, and reliability of AI/ML operations across the enterprise.
This role requires deep expertise in AI-driven observability, machine learning pipeline automation, cloud-based AI/ML platforms, and operational excellence. The ideal candidate will drive AI/ML model deployment, continuous monitoring, and self-healing automation to optimize system performance, minimize downtime, and enhance decision-making with real-time AI-driven insights.
Responsibilities:
- Lead and sustain large-scale AIOps and MLOps programs, ensuring alignment with business objectives, data governance standards, and enterprise data strategy.
- Oversee the implementation of real-time data observability, monitoring, and automation frameworks to enhance data reliability, quality, and operational efficiency.
- Develop program governance models and execution roadmaps to drive efficiency across data platforms, including Azure, AWS, GCP, and on-prem environments.
- Ensure seamless integration of CI/CD, data pipeline automation, and self-healing capabilities across the enterprise. Partner in building the next-generation D&A platform(s) and lead a high-performing data operations team.
- Lead and manage the people-, process-, and technology-driven Data & Analytics platform strategy, and the cultural shift of PepsiCo IT to a world-class, data-first organization, working across all Sector S&T.
- Champion PepsiCo's Data & Analytics program and platform management, supporting large-scale global data engineering efforts in partnership with the S&T organization.
- Support Data & Analytics Technology Transformations to provide full sustainment capabilities across the PepsiCo Data Estate, including automated data platform management with proactive issue identification and self-healing capabilities.
AIOps & Observability Automation:
- Design and implement AIOps strategies for automating IT operations using Azure Monitor, Azure Log Analytics, Azure Sentinel, and AI-driven alerting.
- Deploy Azure-based observability solutions (Azure Monitor, Application Insights, Azure Synapse for log analytics, and Azure Data Explorer) to enhance real-time system performance monitoring.
- Enable AI-driven anomaly detection and root cause analysis (RCA) using Azure Machine Learning (Azure ML) and AI-powered log analytics.
- Develop self-healing and auto-remediation mechanisms using Azure Logic Apps, Azure Functions, and Power Automate to proactively resolve system issues.
MLOps & Machine Learning Pipeline Management:
- Lead end-to-end ML lifecycle automation using Azure ML, Azure DevOps, and Azure Pipelines for ML (CI/CD).
- Deploy scalable ML models with Azure Kubernetes Service (AKS), Azure Machine Learning Compute, and Azure Container Instances.
- Automate feature engineering, model versioning, hyperparameter tuning, and drift detection using Azure ML Pipelines and MLflow.
- Optimize ML workflows with Azure Data Factory, Azure Databricks, and Azure Synapse Analytics for data preparation and ETL/ELT automation.
- Implement monitoring and explainability for ML models using Azure Responsible AI Dashboard, Fairlearn, and InterpretML.
Operational Excellence & Cross-Team Collaboration:
- Partner with Data Science, DevOps, CloudOps, and SRE teams to align AIOps/MLOps strategies with enterprise IT goals.
- Collaborate with business stakeholders and IT leadership to implement AI-driven insights and automation for improving operational decision-making.
- Define and track AI/ML operational KPIs, including model accuracy, latency, infrastructure efficiency, and predictive maintenance metrics.
Risk, Compliance & AI Governance:
- Implement AI ethics, bias mitigation, and responsible AI practices for model governance using the Azure Responsible AI toolkits.
- Ensure compliance with Azure Information Protection (AIP), Role-Based Access Control (RBAC), and data security policies.
- Develop robust risk management strategies for AI-driven operational automation in Azure environments.
- Present program updates, risk assessments, and AIOps and MLOps maturity progress to senior executives and key stakeholders.
- Work collaboratively with colleagues across PepsiCo to ensure customers are delighted with their Azure cloud experience.
- Attract and build a diverse, high-performing team with capabilities needed to achieve current and future business objectives.
- Remove barriers to agility and enable the team to shift priorities quickly without losing productivity.
- Develop the appropriate organizational structure, resource plans, and culture to support the business objectives and customer deliverables.
- Leverage your technical and operations expertise in cloud and high-performance computing to build a solid understanding of the business and customer needs, and to earn trust in relationships.