Site Reliability Engineer (SRE)

at Altitude Angel

Reading, England, United Kingdom

Start Date: Immediate
Expiry Date: 09 Aug, 2024
Salary: Not Specified
Posted On: 09 May, 2024
Experience: N/A
Skills: Dashboards, Reliability Engineering, AWS, Regulations, Agile, Reliability, Microsoft SQL Server, Bash, PostgreSQL, High Availability, IaaS, Replication, Scripting, PaaS, Code, Infrastructure, Windows Server, Python, Programming Languages, PowerShell, Storage, Azure
Telecommute: No
Sponsor Visa: No

Description:

KEY REQUIREMENTS

Essential:

  • Proficiency in Docker/Kubernetes deployment, scaling, and managing containerized applications
  • Expertise in managing and optimizing monitoring stacks like Grafana, Prometheus or Azure Monitor
  • Strong skills in creating dashboards using PromQL or KQL
  • Experience with CI/CD/CT platforms such as Azure DevOps
  • Familiarity with “Infrastructure as Code” and “Continuous Integration and Continuous Delivery” principles and practices
  • Knowledge of Agile, Site Reliability Engineering (SRE), and DevOps principles and practices
  • Proficiency in scripting and programming languages such as PowerShell, Python, Bash, and C#
  • Knowledge of backup and recovery processes and procedures
  • Advanced understanding of clustering, high availability, replication, and disaster recovery techniques
  • Strong ability to tune network, storage, server, and virtualization layers for optimal performance and reliability
  • In-depth performance tuning skills and system internals knowledge
  • Experience implementing CIS security hardening recommendations

Highly desirable:

  • Familiarity with cloud computing platforms (IaaS, PaaS, and SaaS) such as Azure, AWS, or GCP
  • Knowledge of GitOps principles and practices
  • Experience with Alpine Linux and related technologies
  • Proficiency in Microsoft Windows Server and related technologies
  • Knowledge of Microsoft SQL Server, CosmosDB, and PostgreSQL
  • Familiarity with networking protocols such as TCP/IP, DNS, DHCP, and VLANs
  • Experience with Azure Active Directory/Microsoft Entra ID
  • Knowledge of data security governance and regulations (e.g., GDPR)
  • Awareness of security and auditing requirements in regulated environments

APPLY TODAY

With drones becoming more and more prevalent, we’re looking for individuals who are eager to help us build and deliver the solutions which have the potential to change the world. If you want to join us and be at the forefront of the aerial revolution, apply today.
To apply for this position, email your details to jobs@altitudeangel.com

Responsibilities:

ABOUT THE ROLE

We are seeking an experienced Site Reliability Engineer (SRE) to join our Product Engineering department and play a pivotal role in maintaining and enhancing the performance, reliability, and scalability of our systems. The ideal candidate will take the lead in establishing SLAs and SLOs, optimize resource usage, and proactively monitor system health. You will be instrumental in developing solutions that ensure seamless operations, cost-effective performance, and continuous service improvement.

WHAT YOU WILL DO

Primary responsibilities:

  • Establish service level objectives (SLOs) and service level agreements (SLAs)
  • Optimise system performance and scalability through effective resource management, load distribution, and latency reduction
  • Develop proactive monitoring solutions and dashboards to provide visibility into system health and performance while alerting on potential issues
  • Ensure services operate within defined budget constraints, identifying opportunities for cost-saving and optimization
  • Create and maintain comprehensive documentation for system architecture, configuration, and troubleshooting procedures
  • Partner with development teams to ensure new features and enhancements meet reliability and performance standards
  • Conduct root cause analysis post-incident and implement preventive measures to avoid future occurrences
  • Automate repetitive tasks and processes to enhance efficiency and minimize manual intervention
  • Stay informed about industry best practices, emerging trends, and new technologies in site reliability engineering
  • Identify technical debt and collaborate with application teams to establish remediation plans
  • Deliver continuous service improvement through Infrastructure as Code development

Secondary responsibilities:

  • Perform daily health and compliance checks on systems as required
  • Validate and promptly resolve monitoring alerts and batch job failures
  • Ensure sufficient capacity is available to support growth
  • Respond promptly to emails sent to team distribution lists or mailboxes
  • Handle incidents and requests efficiently, prioritizing a “customer-first” mindset
  • Maintain highly available, reliable, secure, and performant infrastructure
  • Conduct general server, database, and virtualization administration maintenance activities
  • Provide technical support to application support and development teams
  • Offer consultation to application support and development teams


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Application Programming / Maintenance

Software Engineering

Graduate

Proficient

1

Reading, United Kingdom