South Star - Observability Engineer at MetTel
Mumbai, Maharashtra, India
Full Time


Start Date

Immediate

Expiry Date

05 Mar, 26

Salary

0.0

Posted On

05 Dec, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Grafana, Prometheus, AppDynamics, Splunk Observability, Python, Bash, Kubernetes, Docker, Ansible, Terraform, Monitoring, Alerting, Incident Response, Data Visualization, Cloud, Distributed Systems

Industry

Telecommunications

Description
MetTel is a global communications solutions provider with the most complete suite of fully managed services focused on secure connectivity, networking, and mobility. We simplify communications and networking for businesses and government agencies. Our customers include many of the Fortune 500, and Gartner recognizes us as an industry leader. We have the broadest portfolio of technology and integrated partnerships, as well as our private network, which we use to create tailored solutions covering design, deployment, and ongoing management, driving cost savings, efficiency, innovation, and the ability to focus on core objectives.

We believe that each team member is key to the success and sustainability of the group. To achieve this, we offer an environment where all professionals can grow and develop their skills and competencies, collaborate with diverse professionals, share knowledge, and enjoy a rewarding career.

Observability Engineer
Department: Corporate IT
Reports to: Director

South Star Software Private Limited is committed to fostering innovation, collaboration, and technical excellence in every aspect of its operations. As an Observability Engineer at South Star, you will join a dynamic team that values continuous learning and encourages initiative in driving transformative solutions across our platforms.

Position Summary
We are looking for an Observability Engineer to design, implement, and manage our enterprise-level monitoring and observability infrastructure. The successful candidate will be responsible for architecting robust observability solutions utilizing industry-leading platforms such as Grafana, Prometheus, AppDynamics, and Splunk Observability. This position supports engineering teams by providing advanced dashboards, effective alerting mechanisms, and comprehensive data correlation that deliver critical insights into system performance, reliability, and behavior.

Key Responsibilities

Architecture & Design
- Design and implement scalable observability architectures that support monitoring across cloud, on-premises, and hybrid environments
- Establish observability standards, patterns, and best practices across the organization
- Evaluate and integrate new monitoring technologies and tools to enhance visibility capabilities
- Design data retention, aggregation, and storage strategies for metrics, logs, and traces

Platform Management
- Deploy, configure, and maintain enterprise monitoring platforms including Grafana, Prometheus, AppDynamics, and Splunk Observability
- Ensure high availability, performance, and scalability of observability infrastructure
- Manage platform upgrades, patches, and capacity planning
- Integrate observability tools with existing CI/CD pipelines and infrastructure automation

Dashboard & Visualization Development
- Create and maintain comprehensive dashboards that provide actionable insights for application and infrastructure teams
- Build executive-level reporting dashboards for system health and performance metrics
- Develop custom visualizations tailored to specific business and technical requirements
- Implement role-based access and dashboard governance

Alerting & Incident Response
- Design intelligent alerting strategies that minimize noise and prioritize critical issues
- Configure multi-channel alert routing and escalation policies
- Establish SLI/SLO/SLA frameworks and implement corresponding monitoring
- Collaborate with incident response teams to improve detection and diagnosis capabilities
- Conduct post-incident reviews to enhance monitoring coverage and alert accuracy

Collaboration & Enablement
- Partner with development, operations, and security teams to instrument applications and infrastructure
- Provide guidance on observability best practices, including logging standards, metrics collection, and distributed tracing
- Conduct training sessions and create documentation for observability tools and practices
- Act as the subject matter expert for monitoring-related questions and troubleshooting

Required Qualifications
- 3-5+ years of experience with enterprise monitoring and observability platforms
- Hands-on expertise with Grafana, Prometheus, AppDynamics, and Splunk Observability (or similar tools)
- Strong understanding of monitoring fundamentals: metrics, logs, traces, and events
- Experience with containerized environments (Kubernetes, Docker)
- Proficiency in scripting languages (Python, Bash, PowerShell) for automation
- Knowledge of application performance monitoring (APM) concepts and practices
- Experience with configuration management tools (Ansible, Terraform) for infrastructure as code
- Understanding of networking, system administration, and distributed systems architecture

Preferred Qualifications
- Experience with OpenTelemetry and distributed tracing implementations
- Familiarity with PromQL, SPL (Splunk Processing Language), and other query languages
- Knowledge of time-series databases (InfluxDB, TimescaleDB, Prometheus TSDB)
- Experience implementing SRE practices and establishing SLI/SLO frameworks
- Background in software development or DevOps engineering
- Certifications in relevant monitoring platforms or cloud technologies
- Experience in regulated industries with compliance monitoring requirements

Technical Skills
- Monitoring Platforms: Grafana, Prometheus, AppDynamics, Splunk Observability
- Scripting/Programming: Python, Bash, Go, PowerShell
- Container Orchestration: Kubernetes, Docker, container monitoring best practices
- Configuration Management: Ansible, GitOps workflows
- Data Formats: JSON, YAML, Prometheus exposition format
- Version Control: Git, GitLab/GitHub

Personal Attributes
- Strong analytical and problem-solving abilities
- Excellent communication skills with the ability to explain complex technical concepts
- Self-motivated with the ability to work independently and prioritize effectively
- Detail-oriented with a commitment to documentation and knowledge sharing
- Collaborative mindset with a focus on enabling team success

Shift Schedule
The standard in-office schedule is Monday to Friday, from 11:00 am to 8:00 pm IST. Remote work is permitted during maintenance windows.

MetTel is an Equal Opportunity Employer and considers applicants for all positions without regard to race, color, religion or belief, sex, age, national origin, citizenship status, marital status, military/veteran status, genetic information, sexual orientation, gender identity, physical or mental disability, or any other characteristic protected by applicable laws.

To learn more about our company, visit us at www.mettel.net.

Responsibilities
The Observability Engineer will design, implement, and manage enterprise-level monitoring and observability infrastructure. This includes architecting robust solutions and providing advanced dashboards and alerting mechanisms to support engineering teams.