Monitoring and Capacity Engineer at NTT DATA
Bengaluru, Karnataka, India
Full Time


Start Date

Immediate

Expiry Date

10 Mar, 26

Salary

0.0

Posted On

10 Dec, 25

Experience

5 years or more

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Monitoring Tools, Capacity Planning, Performance Engineering, Data Analysis, Scripting, Cloud Platforms, Networking, Distributed Systems, DevOps Principles, CI/CD Pipelines, Infrastructure Management, Incident Response, Root Cause Analysis, Automation, Observability Standards, Stakeholder Management

Industry

IT Services and IT Consulting

Description
Develop, deploy, and maintain enterprise-grade monitoring solutions covering infrastructure, applications, and network components. Develop data-driven capacity planning models that provide forecasts and recommendations aligned with business growth and cost optimization. Continuously tune monitoring and alerting systems to reduce noise and improve detection accuracy. Perform capacity planning and predictive analysis to forecast growth and optimize resource utilization. Define performance benchmarks and thresholds for infrastructure and application services. Conduct root cause analysis for performance issues and outages, and propose corrective and preventive actions. Collaborate with service owners and architects to support scalability strategies in line with business growth. Automate monitoring tasks and capacity workflows where possible to streamline operations. Generate regular performance and capacity reports for leadership and stakeholders. Ensure observability standards and best practices are applied across all deployment environments (on-prem, hybrid, and cloud). Support incident response by providing real-time system health insights and technical expertise.

Minimum 5 years of relevant experience in monitoring, performance engineering, or capacity management roles. 5+ years of experience as an L3 engineer. Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field. Strong proficiency in monitoring and observability tools (e.g., Prometheus, Grafana, Zabbix, Dynatrace, Datadog, New Relic, AppDynamics, Elastic APM). Ability to engage with a variety of internal and external stakeholders. Solid interpersonal skills and the ability to build strong working relationships. Thorough knowledge of identity administration and the expertise to solve complex issues. Hands-on experience with infrastructure (Linux/Windows), virtualization, and cloud platforms (AWS/Azure/GCP). Practical understanding of networking, databases, and distributed systems. Analytical skills in performance tuning, log analysis, and trend forecasting. Scripting or automation experience (Python, Shell, PowerShell, or similar). Understanding of DevOps principles, CI/CD pipelines, and Infrastructure-as-Code (Terraform, Ansible, etc.). Familiarity with ITIL processes: event, incident, capacity, and change management. Exposure to containerization and orchestration (Docker, Kubernetes). Experience building dashboards and KPIs for system performance visibility. Strong problem-solving mindset with attention to detail. Ability to work under pressure and prioritize multiple tasks. Effective communication and stakeholder management. Ability to negotiate and influence.
Responsibilities
Develop and maintain monitoring solutions for infrastructure, applications, and networks while performing capacity planning and predictive analysis. Collaborate with service owners to support scalability strategies and automate monitoring tasks.