Site Reliability Engineer

at Police Digital Service

Remote, Scotland, United Kingdom

Start Date: Immediate
Expiry Date: 10 Feb, 2025
Salary: GBP 80000 Annual
Posted On: 10 Nov, 2024
Experience: N/A
Skills: Good communication skills
Telecommute: No
Sponsor Visa: No

Description:

JOIN POLICE DIGITAL SERVICE AS A SITE RELIABILITY ENGINEER - STARTING SALARY £80,000

As a Site Reliability Engineer (SRE), you will be a cornerstone of the Technical Operations team, dedicated to ensuring the seamless operation and reliability of the systems that deliver critical services to our Policing customers. This role is at the heart of our technological infrastructure and requires a blend of software engineering skill and operational acumen.

KEY RESPONSIBILITIES

  • Design Scalable Infrastructure: Architect and engineer cloud solutions that are inherently scalable, ensuring they can manage varying loads and demands with ease, while maintaining performance and reliability.
  • Automate Operations: Develop and implement robust scripts and automation tools to streamline deployment, configuration, and management tasks, thereby increasing efficiency and reducing the potential for human error.
  • Monitor System Health: Use comprehensive monitoring solutions to continuously track system performance and health indicators, allowing for proactive identification and resolution of potential issues.
  • Lead Incident Response: Take charge during service disruptions, coordinating and leading the response to ensure rapid resolution, minimal impact, and clear communication throughout the incident.
  • Enforce Security Standards: Vigilantly uphold security protocols and compliance standards to protect sensitive data and infrastructure against threats and vulnerabilities.
  • Plan for Capacity: Engage in strategic capacity planning to accurately predict and prepare for future infrastructure needs, scaling resources accordingly to handle increased load and service demands.
  • Document Systems: Create and maintain clear, detailed, and up-to-date documentation of cloud infrastructure, including architecture designs, configurations, and operational procedures.
  • Mentor Team Members: Provide expert guidance and mentorship to less experienced team members, promoting a culture of knowledge sharing, continuous learning, and technical excellence.
  • Research New Technologies: Actively investigate and evaluate new technologies, tools, and practices that can enhance system reliability, efficiency, and the overall cloud service offering.
  • Develop Resilience Strategies: Formulate and implement strategies to enhance the resilience and fault-tolerance of cloud services, ensuring they can withstand and recover from unexpected disruptions.
  • Problem Management: Lead comprehensive post-mortem analysis following incidents to identify root causes, extract lessons learned, and implement preventive measures to avoid future occurrences.

WHAT YOU NEED TO SUCCEED IN THE ROLE

  • Technical Expertise: In-depth knowledge of Azure cloud infrastructure, including services like Azure Compute, Azure Storage, and Azure Networking. Familiarity with implementing and managing Azure solutions such as Azure Kubernetes Service, Azure Functions, and Azure DevOps is crucial.
  • Software Engineering Skills: Strong coding skills in languages such as PowerShell, Python, Go, or Ruby, and experience with software development life cycles and agile methodologies. Understanding of Azure SDKs and APIs for integration and automation purposes.
  • Automation and Orchestration: Experience with automation tools like Azure Resource Manager, Azure Automation, Ansible, or Chef, and with orchestration platforms like Kubernetes or Docker Swarm. Proficiency in Azure Bicep would be a significant advantage.
  • Monitoring and Analytics: Proficiency with Azure monitoring tools such as Azure Monitor, Application Insights, and Network Watcher. Ability to analyse and interpret complex datasets to inform decision-making.
  • Continuous Learning: A commitment to continuous professional development, staying abreast of the latest industry trends and emerging technologies in cloud computing and SRE practices, particularly within the Azure ecosystem.
  • Leadership and Mentorship: The ability to lead initiatives, mentor junior team members, and contribute to a culture of technical excellence and continuous improvement.


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

Remote, United Kingdom