Site Reliability Engineer
at CLOUD BRIDGE
Marlow SL7 3AA, United Kingdom
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 09 May, 2025 | Not Specified | 09 Feb, 2025 | N/A | Jenkins, Ruby, Kubernetes, Microservices, Python, Cloud Security, Azure, Docker, Infrastructure, Incident Response, Large Scale Systems, Storage, Firewalls, AWS, Containerization, Distributed Systems, Orchestration, Bash | No | No |
Description:
The Site Reliability Engineer (SRE) will play a key role in maintaining and scaling infrastructure, ensuring reliability, performance, and scalability. You will collaborate closely with development, operations, and security teams to improve the reliability and efficiency of applications, addressing incidents, automating processes, and managing infrastructure as code.
REQUIRED SKILLS & EXPERIENCE:
- Hands-on experience with AWS, GCP, or Azure for managing compute, storage, and networking services.
- Proficiency in using Terraform, CloudFormation, Ansible, or similar tools for automating infrastructure.
- Strong experience in monitoring and incident response using tools like Prometheus, Grafana, and ELK Stack.
- Strong scripting skills in Python, Bash, Go, or Ruby for automating tasks and building custom tools.
- Experience with CI/CD pipelines (Jenkins, GitLab CI) and optimizing performance for large-scale systems.
- Familiarity with cloud security, access controls, firewalls, and networking best practices.
PREFERRED QUALIFICATIONS:
- Certifications: AWS Certified DevOps Engineer, Google Professional Cloud Architect, or similar.
- Containerization & Orchestration: Experience with Docker, Kubernetes, or ECS/EKS for containerized applications.
- SRE Experience: Familiarity with SRE principles like SLAs, SLOs, and error budgets, and practical application of those in large-scale systems.
- Distributed Systems: Understanding of microservices, service discovery, and fault-tolerant architectures.
If you are an experienced Site Reliability Engineer with a passion for building and maintaining highly available systems, we want to hear from you!
Responsibilities:
- Build and scale cloud infrastructure (AWS, GCP, or Azure), automate provisioning using Terraform or CloudFormation, and manage resources for optimal performance.
- Monitor, troubleshoot, and resolve incidents, optimizing systems to ensure reliability and minimize downtime.
- Implement monitoring (Prometheus, Grafana, Datadog) and set up alerting systems to proactively address issues and ensure scalability.
- Work with DevOps, engineering, and security teams to improve application deployment, infrastructure management, and system resilience.
- Develop disaster recovery strategies, ensure infrastructure security through best practices, and maintain business continuity.
- Maintain system documentation and lead continuous improvement initiatives to streamline operations.
REQUIREMENT SUMMARY
Min: N/A, Max: 5.0 year(s)
Information Technology/IT
IT Software - Network Administration / Security
Software Engineering
Graduate
Proficient
1
Marlow SL7 3AA, United Kingdom