Systems Reliability Engineer - EMEA

at IQGeo

Cambridge CB2 1GE, United Kingdom

Start Date: Immediate
Expiry Date: 25 Dec, 2024
Salary: Not Specified
Posted On: 29 Sep, 2024
Experience: N/A
Skills: AWS, Reliability, Integration, MTTR, PostgreSQL, Kubernetes, Customer Experience, Graphs, GitHub, Python, Scalability, Mitigation Strategies, Docker, Platforms, Pipelines, Bash, Code, Implementation Services
Telecommute: No
Sponsor Visa: No

Description:

JOB SUMMARY

The Systems Reliability Engineering (SRE) department supports IQGeo’s hosted customers in AWS and provides consultative services alongside Implementation Services for external deployments on all three cloud providers and in some on-prem scenarios. SRE provides a CI/CD pipeline to deliver solutions from Engineering and Implementation Services to customers. The SRE team works closely with the Engineering, Support, Implementation Services, and Pre-Sales departments to ensure that they are successful with cloud infrastructure and Kubernetes deployments.

JOB DESCRIPTION

We are looking for a Systems Reliability Engineer who will work closely with Engineering, Implementation Services, Support, and the rest of the Operations Team to enable rapid delivery of capabilities to production systems through a secure Continuous Integration/Continuous Delivery (CI/CD) pipeline. You will be responsible for building and maintaining the entire customer pipeline and the infrastructure for IQGeo’s hosted offerings. You will be a key member of the team, where the rubber meets the road, doing work that matters for our customers and using the latest technologies in cloud computing, GIS, containers, and Kubernetes. You will provide backend support and ensure that Support and Implementation Services are set up for success from an infrastructure and pipeline perspective. You will also be available for an on-call rotation.

REQUIRED SKILLS & ABILITIES

  • Expert with GitHub or similar source repository and CI/CD collaboration platform
  • Expert with platforms such as Kubernetes, EKS, or another container orchestration platform
  • Expert with a scripting language such as Python or Bash; sed/awk is preferred
  • Experience with Docker or similar container technology
  • Experience with PostgreSQL
  • Understanding of a well-architected cloud infrastructure deployment in AWS, including:
      • Compute

EDUCATION AND EXPERIENCE

  • Bachelor’s degree from a four-year college or university; an engineering degree is preferred
  • 5+ years of telecom industry, wireless industry, or customer management (or relevant) experience
  • An equivalent combination of education and experience will be considered

Responsibilities:

Actively and consistently support all efforts to simplify and enhance the customer experience.

  • Build resilient, self-healing systems that scale seamlessly (high performance) and improve system reliability (always available)
  • Monitor system health using charts, graphs, and logs; detect and trace problems; and react to issues at scale
  • Write post-mortems and participate in forensic root cause analysis to implement corrective measures that prevent issues from recurring
  • Create, modify, evolve, and document risk-mitigation strategies to eliminate potential risks that could impact the performance, scalability, and reliability of systems and services
  • Create, modify, evolve, repair, and maintain scripts to secure CI/CD pipelines across multiple domains
  • Create, modify, or evolve current processes for source control, build, integration, automated testing, security scanning, and delivery of applications
  • Work with Product House and Implementation Services to deliver end-to-end software delivery pipelines
  • Create automated development and operations scripts and processes to ensure the reliability, scalability, and repeatability of pipelines, without errors or bugs or with minimal customer impact
  • Leverage Infrastructure as Code and Configuration as Code to automate deployments
  • Be held to MTTR or other SLAs


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Application Programming / Maintenance

Information Technology

Graduate

Engineering

Proficient

1

Cambridge CB2 1GE, United Kingdom