Compute - Project Manager - E at Ford Global Career Site

India

Full Time

Start Date

Immediate

Expiry Date

22 Jun, 26

Salary

0.0

Posted On

24 Mar, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Virtualization, Capacity Planning, Forecasting, Automation, VMware, OSV, SRE, Role Based Access Control, Observability, Monitoring, Logging, Troubleshooting, Root Cause Analysis, Solution Design, Kubernetes, OpenShift

Industry

Motor Vehicle Manufacturing

Description

This is a Virtualization Server Hosting Engineering position in Enterprise Technology.

Virtualization Hosting service enablement

Capacity Management

* Conduct capacity planning and forecasting for the platforms, including compute/virtual machine (VM), memory, storage, and network resources, to ensure scalability and prevent resource exhaustion
* Analyze resource utilization trends and make recommendations for infrastructure scaling, consolidation, or optimization
* Collaborate with application teams and stakeholders to understand future demand and project capacity needs
* Develop and maintain capacity models and reports to support strategic planning

Automation & Efficiency

* Develop automation solutions (scripts, playbooks) for repetitive VMware/OSV tasks, including configuration changes, VM management (such as snapshot removal), auditing, remediation, and integration with ticketing systems
* Leverage automation to deliver operator updates and changes efficiently at scale
* Implement Site Reliability Engineering (SRE) principles and practices to improve overall platform stability, performance, and operational efficiency
* Deploy and audit Role Based Access Control
* Manage namespaces and resource quotas (CPU, disk, and storage)

Observability, Monitoring, Logging and Troubleshooting

* Implement and maintain comprehensive end-to-end observability solutions (monitoring, logging, tracing) for the VMware/OSV environment, including integration with tools like Dynatrace and Prometheus/Grafana
* Explore and implement Event Driven Architecture (EDA) for enhanced real-time monitoring and response
* Develop capabilities to flag and report abnormalities and identify "blind spots" in observability
* Perform deep-dive Root Cause Analysis (RCA), potentially utilizing available tooling, to quickly identify and resolve issues across the global compute environment
* Find the needle in the haystack (the unhealthy component anywhere in the global compute environment) for faster time to resolution
* Monitor VM health, resource usage, and performance metrics proactively
* Monitor for unusual activity that might indicate a compromise or misconfiguration

Solution Design & Consulting

* Provide technical consulting and expertise to application teams requiring VMware/OSV solutions
* Design, implement, and validate custom or dedicated OSV clusters and VM solutions for critical applications with unique or complex requirements (e.g., specialized appliances)

Knowledge Management

* Create, maintain, and update comprehensive internal documentation and customer-facing content to facilitate self-service and clearly articulate platform capabilities

Support

* Participate in L1 to L3 support for Operations teams on environment-related issues

Monthly after-hours and weekend work will be required.

Responsibilities

This role involves conducting capacity planning and forecasting for compute platforms, developing automation solutions for repetitive tasks, and implementing Site Reliability Engineering principles to enhance stability and efficiency. The engineer will also be responsible for implementing comprehensive observability solutions, performing deep dive root cause analysis, and providing technical consulting for virtualization solutions.