Lead Site Reliability Engineer
at FactSet Research Systems
London, England, United Kingdom
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 20 Dec, 2024 | Not Specified | 25 Sep, 2024 | N/A | AWS, Computer Science, Distributed Systems, Python, Azure, Orchestration, C++, Automation Tools, Infrastructure Technologies, Code, Infrastructure, Communication Skills, Programming Languages, Docker | No | No |
Description:
We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our software systems and infrastructure. The ideal candidate possesses a strong background in coding, automation, and system administration, combined with a passion for continuously improving system reliability.
Responsibilities:
- Collaborate with development, operations, and product teams to define, review, and implement reliability standards and best practices.
- Design, implement, and maintain highly available and scalable architectures for our applications and infrastructure.
- Develop and enhance automated tools and frameworks to optimize system monitoring, deployment, and recovery.
- Troubleshoot and resolve complex issues throughout the entire software stack, including networking, databases, and distributed systems.
- Conduct performance analysis and capacity planning to ensure system scalability and resource optimization.
- Take a proactive approach to continuously improving reliability.
- Participate in incident response, root cause analysis, and postmortem activities to identify and rectify system failures.
- Collaborate with cross-functional teams to implement and improve CI/CD pipelines, ensuring reliable and efficient software releases.
- Stay up-to-date with emerging technologies and industry trends, actively contributing to ongoing system improvements.
- Participate in on-call rotation.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
- Proven experience deploying and managing large-scale distributed systems successfully.
- Understanding of SRE concepts (error budgets, SLIs/SLOs, blameless postmortems).
- Proficiency in programming languages such as Python, C++, or Go.
- Familiarity with monitoring and observability tools.
- Excellent problem-solving skills and ability to troubleshoot complex issues efficiently.
- Strong organizational and communication skills, with the ability to collaborate effectively in a cross-functional team environment.
Desirable Qualifications:
- Familiarity with security best practices and experience implementing security measures in a production environment.
- Experience with modern infrastructure technologies and tools, including cloud platforms (AWS, Azure, GCP), containers and orchestration (Docker, Kubernetes), and configuration management (Ansible, Chef, Puppet).
- Solid understanding of networking protocols and technologies (TCP/IP, DNS, load balancing).
- Demonstrated experience with infrastructure as code (IaC) and automation tools (e.g., Terraform, GitHub Actions).
Join our team and contribute to creating and maintaining a highly reliable and performant infrastructure that supports our growing platform. Help shape the future of our systems architecture while working in a collaborative and innovative environment.
REQUIREMENT SUMMARY
Min: N/A | Max: 5.0 year(s)
Information Technology/IT
IT Software - Application Programming / Maintenance
Software Engineering
Graduate
Computer science engineering or equivalent practical experience
Proficient
1
London, United Kingdom