Team Lead/Production Monitoring/Reliability Engineer at Indev
Ashburn, VA 20147, USA
Full Time


Start Date

Immediate

Expiry Date

15 Sep, 25

Salary

From $115,000 per year

Posted On

15 Jun, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Middleware, Javascript, Puppet, Ksh, Zsh, Powershell, Information Technology, Nexus, Gitlab, Bash, Telecommunications, Programming Languages, Csh, Ruby, Github, Computer Science, Splunk, Artifactory, Operating Systems, Version Control, Nginx, New Relic, Cloud

Industry

Information Technology/IT

Description

POSITION DESCRIPTION:

Indev is seeking a skilled Team Lead and Reliability Engineer to support our client’s mission by enhancing Production Monitoring and ensuring optimal service delivery for their applications. This role involves proactive issue identification, incident resolution, and system health optimization within a 24x7x365 operational environment. The ideal candidate will lead monitoring solutions, manage ITIL engineers, automate processes, and collaborate across IT and business teams to improve service reliability. Expertise in AWS environments, root cause analysis, and technical troubleshooting is essential, along with strong communication and leadership skills to drive continuous improvement.
This is a direct-hire, full-time position with salary and benefits. Indev provides a comprehensive benefits package, including Medical, Dental, Vision, 401(k) with match, Flexible Spending Account, and Paid Time Off (PTO), which covers vacation and holiday pay.

REQUIRED QUALIFICATIONS:

  • Bachelor’s degree in Computer Science, Information Systems, Engineering, Business, or another related discipline, with a minimum of 10 years of experience in information technology. Additional years of experience can be substituted for the degree.
  • Practical knowledge and hands-on experience with Agile development and DevSecOps practices
  • Background in systems engineering with expertise in one or more areas such as telecommunications, programming languages, operating systems, middleware, or database technologies
  • Proficiency with development and operations tools, including:
      • Version control and artifact management systems like GitHub, GitLab, Bitbucket, Artifactory, or Nexus
      • Cloud and infrastructure monitoring tools, particularly AWS CloudWatch
      • Centralized logging and analytics platforms such as Splunk
      • Infrastructure as Code (IaC) and configuration management tools like Terraform or Puppet

PREFERRED QUALIFICATIONS:

  • Familiarity with observability platforms such as New Relic or other AI-driven operations tools
  • Proficiency in one or more programming languages such as JavaScript, Ruby, or Go
  • Experience working with modern application delivery technologies including Nginx, HAProxy, Docker, Kubernetes, or equivalents
  • Understanding of messaging platforms, collaborative tools, app-level firewalls, proxy servers, and common operating systems
  • Comfortable working in Linux and Windows environments, with scripting experience in Bash, CSH, KSH, ZSH, or PowerShell
  • Experience utilizing monitoring and alerting frameworks such as Prometheus, Grafana, or Datadog

ABOUT US:

At Indev, we are redefining Intelligent Development—delivering forward-thinking, mission-driven technology solutions that empower federal agencies to modernize, automate, and innovate with confidence. We go beyond the status quo, thinking creatively and providing impactful, non-traditional solutions that drive federal technology transformation. Our team harnesses the power of AI-driven automation, mission analytics, and cloud-native technologies to create agile, secure, and efficient enterprises that are built for the future. Let’s innovate. www.indev.com.
Job Type: Full-time
Pay: From $115,000.00 per year

Benefits:

  • 401(k)
  • 401(k) matching
  • Dental insurance
  • Health insurance
  • Health savings account
  • Life insurance
  • Paid time off
  • Referral program
  • Vision insurance

Schedule:

  • Monday to Friday

Work Location: Hybrid remote in Ashburn, VA 20147

Responsibilities
  • Serve as Team Lead over staff of reliability engineers
  • Manage scheduling to ensure the Emergency Operations Center is always staffed with reliability engineers
  • Participate in outage calls when possible
  • Ensure SLAs for escalations and notifications are met
  • Coach and guide less experienced reliability engineers
  • In close coordination with PM, ensure all project deliverables are met
  • Provide regular improvement or refresher training sessions to reliability engineers
  • Aid Project Manager in gathering monitoring metrics and other material for presentations
  • Serve as liaison between Federal Staff and Contractors
  • Present Production Monitoring related material to Sr. Leadership
  • Aid PM in interviewing new candidates
  • Serve as Reliability Engineer
  • Triage and escalate events in accordance with Standard Operating Procedures (SOPs)
  • Assess initial severity, gather impacts, create tickets, engage support teams, and escalate issues properly
  • Effectively document incidents describing the issue, business impact, root cause and resolution
  • Monitor various applications to proactively identify system disruptions and preempt enterprise outages
  • Notify internal and external departments of performance issues and trends
  • Support maintenance and scheduled outages
  • Review and update tickets with current status information
  • Understand applications and their interdependencies
  • Monitor and support scheduled change activity in the production environment and escalate unexpected issues
  • Provide application verification support to support teams upon completion of scheduled changes in the production environment
  • Aid in development of Root Cause Analysis (RCA) following an event
  • Provide shift reports detailing the health of the environment and any pending changes which may potentially impact applications
  • Provide documentation and presentation support as needed
  • Identify areas where improvements in processes or documentation will increase the team’s overall proficiency
  • Communicate clearly and effectively with IT, business process owners, and customers at all levels of the organization
  • Communicate overall status and health of the application to business and application support teams
  • Participate in the creation and maintenance of technical and knowledge base documentation