Enterprise Monitoring & Observability Lead at Fathom Management LLC
Harrisburg, Pennsylvania, United States
Full Time


Start Date

Immediate

Expiry Date

23 May, 26

Salary

$59/hour (or $102,000 annually)

Posted On

22 Feb, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Monitoring, Observability, Automation, PowerShell, Python, SQL, Azure Monitor, Log Analytics, Azure Automation, CI/CD Pipelines, KQL, SCOM, SquaredUp, Dynatrace, Datadog, Splunk

Industry

Staffing and Recruiting

Description
Enterprise Monitoring & Observability Lead (Hybrid Cloud & IT Operations Automation Specialist)

Location: Hybrid – Harrisburg, PA
Residency Requirement: Must reside within 2 hours of Harrisburg, PA
Schedule: Full-Time | 40 hours per week | Minimum 1 day onsite per week
Compensation: $102,000 annually or $59/hour

Position Summary

The Commonwealth of Pennsylvania is seeking an Enterprise Monitoring & Observability Lead to serve as a subject matter expert (SME) for enterprise monitoring platforms, automation, and IT service management processes across hybrid on-premises and cloud environments. This role is responsible for modernizing monitoring and observability practices by transforming person-centric processes into repeatable, automated, and well-documented workflows. The incumbent will collaborate closely with technical specialists, agency teams, and vendors to deliver actionable monitoring, reporting, and continuous operational improvements aligned with ITIL, ServiceNow, and Commonwealth IT policy standards.

Key Responsibilities

Monitoring, Observability & Automation
- Drive process and tooling improvements, identifying gaps and implementing automation-first practices to reduce manual effort and improve service quality.
- Maintain endpoint and infrastructure monitoring connectivity, ensuring reliable telemetry ingestion via agents, SNMP, WMI, APIs, and secure credential and certificate management.
- Evaluate, design, and implement monitoring solutions for on-premises and cloud-based applications and technology resources.

IT Service Management & Operations
- Document incidents and problems with full observability context in ServiceNow, including metrics, timelines, and root cause analysis.
- Produce post-incident reviews and maintain a Known Error Database (KEDB).
- Collaborate with Enterprise Change, Incident, and Problem Management teams to ensure standardized workflows, risk assessments, and communication plans.
- Monitor service restoration performance, tracking SLA compliance, MTTR, and corrective action effectiveness.
- Create, track, and validate Requests for Change (RFCs) in ServiceNow, ensuring post-change monitoring health.

Documentation, Reporting & Governance
- Own and maintain runbooks, SOPs, service maps, and automated workflows within a structured, version-controlled knowledge repository.
- Deliver accurate monthly and quarterly SLA and operational reports, including availability, incident trends, and enhancement metrics.
- Implement standardized stakeholder communication workflows for incidents, changes, and problems, including distribution list management and self-service subscription options.
- Ensure alignment with Commonwealth IT policies, recommending updates to improve reliability, security, and cost efficiency.

Resilience, Continuity & Compliance
- Design, test, and maintain Disaster Recovery (DR) plans, including defining and validating RTO/RPO for monitoring and network infrastructure.
- Participate in Continuity of Government (CoG) activities, including relocation to alternate sites during catastrophic incidents.
- Operate within ITIL-aligned service management frameworks, contributing to process maturity, audits, and compliance initiatives.

Professional Development
- Maintain technical currency in monitoring, observability, automation, and cloud technologies.
- Pursue relevant training and certifications to support continuous improvement and operational excellence.

Required Qualifications

Education & Experience
- 5+ years of experience in IT infrastructure monitoring, automation, and observability within hybrid environments.
- Bachelor's degree in Information Technology, Computer Science, or a related field (or equivalent experience).

Technical Expertise
- Strong proficiency in PowerShell and at least one additional scripting language (Python, Bash, or SQL).
- Hands-on experience with:
  - Azure Monitor & Log Analytics
  - Azure Automation
  - CI/CD pipelines
  - SQL and KQL
- Experience with enterprise monitoring platforms such as SCOM, SquaredUp, or equivalent tools (Dynatrace, Datadog, Splunk).
- Knowledge of API integrations and secure authentication mechanisms.

Process & Platforms
- Working knowledge of ITIL 4 practices (Change, Incident, Problem Management).
- Experience with ServiceNow or comparable ITSM platforms.

Core Competencies
- Strong troubleshooting and root cause analysis skills.
- Excellent documentation, communication, and stakeholder coordination abilities.
- Highly organized, detail-oriented, and capable of working independently in complex environments.

Preferred Certifications & Experience
- Microsoft Certified: Azure Administrator Associate or Azure Solutions Architect Expert
- ITIL 4 Foundation (or higher)
- Experience with:
  - SquaredUp or equivalent dashboarding and visualization tools
  - Disaster Recovery planning and testing
  - Performance tuning and capacity planning for monitoring platforms
- Familiarity with:
  - Security best practices for APIs and automation scripts
  - Hybrid cloud architectures and networking fundamentals

SEO / ATS Keyword Alignment

Enterprise Monitoring, Observability, Azure Monitor, ServiceNow, ITIL, Automation, PowerShell, Hybrid Cloud, Incident Management, Change Management, SCOM, SquaredUp, SLA Reporting, Disaster Recovery, IT Operations, Monitoring Lead, Cloud Monitoring

Benefits Overview

Full-time employees are offered a comprehensive and competitive benefits package, including:
- Paid vacation, sick leave, and holidays
- Medical, dental, and vision health insurance
- Life insurance coverage
- Short- and long-term disability insurance
- 401(k) retirement plan with company match and immediate vesting
- Military leave
- Training and professional development opportunities
- Tuition reimbursement
- Employee wellness program
- Commuter benefits
- And more

Equal Employment Opportunity (EEO) Statement

Fathom Management, Inc. is committed to providing equal employment opportunities to all employees and applicants. All employment decisions, including recruiting, hiring, training, promotion, compensation, benefits, and termination, are made without regard to race, color, religion, creed, national origin, sex, age, marital status, sexual orientation, gender identity, citizenship status, veteran status, disability, or any other characteristic protected by applicable federal, state, or local law.
Responsibilities
This role involves driving process and tooling improvements for monitoring and observability, focusing on automation-first practices to reduce manual effort and enhance service quality across hybrid environments. The lead will also be responsible for documenting incidents, problems, and changes within ServiceNow, ensuring alignment with ITIL standards and producing operational reports.