Site Reliability Engineer at Kaztronix
Sunnyvale, CA 94089, USA - Full Time


Start Date

Immediate

Expiry Date

09 Oct, 25

Salary

0.0

Posted On

10 Jul, 25

Experience

8 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

STIG, Reliability Engineering, Security+, Complex Systems, Machine Learning, Computer Science, Puppet, DevOps, RMF, AWS, Bash, Computer Engineering, System Administration, Operations, Azure, SolarWinds, Software Development, NISPOM, Secure Coding, Kubernetes, Python

Industry

Information Technology/IT

Description

A global government contracting company is seeking a Site Reliability Engineer to join their team in Sunnyvale, CA!

As a Site Reliability Engineer, you will:

  • Design, implement, and maintain highly available and scalable systems and infrastructure to support classified applications and services
  • Develop and implement reliability-focused engineering practices, such as continuous integration, continuous deployment, and continuous monitoring, while ensuring compliance with classified system requirements
  • Collaborate with development teams to ensure that reliability and scalability are considered throughout the software development lifecycle, while maintaining the security and integrity of the classified system
  • Identify and mitigate potential sources of downtime and performance degradation, including infrastructure, application, and network issues, while ensuring that all troubleshooting and debugging activities are conducted in accordance with classified system procedures
  • Develop and maintain technical documentation, including system diagrams, architecture documents, and runbooks, while ensuring that all documentation is properly marked and handled in accordance with classified system requirements
  • Lead and participate in incident response and post-incident reviews to identify root causes and implement corrective actions, while ensuring that all incident response activities are conducted in accordance with classified system procedures
  • Collaborate with other teams, including development, operations, and security, to ensure that reliability and scalability are considered in all aspects of system design and operation, while maintaining the security and integrity of the classified system
  • Develop and maintain metrics and monitoring systems to measure system reliability and performance, while ensuring that all monitoring activities are conducted in accordance with classified system requirements
  • Stay up to date with industry trends and emerging technologies, and apply this knowledge to continuously improve system reliability and scalability, while maintaining the security and integrity of the classified system

BASIC QUALIFICATIONS

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • Minimum 8 years of experience in site reliability engineering, DevOps, or a related field, with a focus on classified systems
  • Must possess, or be able to obtain within 6 months of start date, a valid IAT Level II or III DoD Approved 8140 (DoD 8570) certification, such as Security+, in good standing
  • Ability to obtain and maintain a Top Secret security clearance; US citizenship required
  • Experience with production use of vSphere/ESXi/vCenter and RHEL
  • Advanced proficiency in using Python, Bash, Ansible, Puppet, and Chef for system administration
  • Demonstrable proficiency with MRTG/PRTG, Nagios, SolarWinds or similar
  • Proven ability with Cloud and Container technologies: Kubernetes, Docker/Mirantis, AWS, and/or Azure
  • Strong technical background in systems administration, networking, and software development, with a focus on classified systems
  • Excellent problem-solving skills, with the ability to analyze complex systems and identify root causes of issues, while maintaining the security and integrity of the classified system
  • Knowledge of networking fundamentals, including TCP/IP, DNS, and routing protocols

DESIRED SKILLS

  • System integration experience with large-scale distributed infrastructure systems
  • Master’s degree in Computer Engineering or a related field
  • Data center operations/system administrator experience, preferably in a DoD environment (RMF, STIG, or NISPOM)
  • Certification in site reliability engineering, DevOps, or a related field, with a focus on classified systems
  • Experience with machine learning and artificial intelligence technologies, with a focus on classified systems
  • Strong knowledge of security principles and practices, including secure coding, secure deployment, and secure operations, with a focus on classified systems
  • Strong understanding of networking fundamentals, including TCP/IP, DNS, and routing protocols, with a focus on classified systems
  • Ability to support 24x7 on-call and off-shift work for mission-critical events and operations that may require extended hours or weekend support
  • Comfortable working in a fast-paced, dynamic, multi-disciplinary environment
  • Active Secret security clearance

Responsibilities

Please refer to the job description for details
