Site Reliability Engineer
at Nooxit
10407 Berlin, Prenzlauer Berg, Germany
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 18 Feb, 2025 | Not Specified | 19 Nov, 2024 | N/A | Automation, Resource Management, Python, Code, Azure, Metrics, System Monitoring, Scripting, Distributed Systems | No | No |
Description:
Full-time (40 h per week), permanent position starting as soon as possible, based in Berlin or remote from a home office.
We’re seeking an experienced Site Reliability Engineer (SRE) with a solid foundation in Python, a passion for performance optimization, and a proactive approach to infrastructure management. In this role, you’ll work closely with development and operations teams to maintain, monitor, and improve the reliability of our systems, leveraging cutting-edge tools and methodologies to ensure peak performance.
Tasks
- Design, implement, and optimize systems to improve the reliability, performance, and scalability of our services.
- Build and maintain observability solutions using tools like Jaeger, Prometheus, and Grafana to enhance monitoring, tracing, and alerting across applications.
- Collaborate with development teams to build, manage, and scale Kubernetes environments, ensuring high availability and robust service delivery.
- Develop automation scripts and tools in Python to enhance system reliability and reduce manual intervention.
- Diagnose and resolve incidents, conduct root-cause analysis, and implement measures to prevent recurrence.
- Participate in on-call rotations, ensuring rapid response to system issues while continuously improving incident management processes.
Requirements
- Proficiency in Python for scripting and automation.
- Experience with tracing tools such as Jaeger or similar to troubleshoot and monitor complex distributed systems.
- Experience with monitoring tools such as Prometheus or similar for collecting and alerting on metrics.
- Experience with dashboarding tools such as Grafana or similar for creating visualizations that aid in system monitoring and diagnostics.
- Experience working in Kubernetes environments, with an understanding of container orchestration, scaling, and resource management.
Preferred Qualifications (Optional)
- Hands-on experience with CI/CD pipelines and DevOps practices.
- Familiarity with cloud platforms (AWS, GCP, Azure) and infrastructure-as-code tools like OpenTofu.
Benefits
- Competitive salary
- Flexible work hours and remote work opportunities
- A beautiful Gather remote office
- An ambitious and helpful team
- Opportunity to work with cutting-edge technologies and make a significant impact in a fast-growing startup environment
Are you interested?
Then apply right now by sending your CV. If available, please include a GitHub link. A cover letter is not necessary.
If you have any questions, please contact us or just give us a call.
REQUIREMENT SUMMARY
Min: N/A, Max: 5.0 year(s)
Information Technology/IT
IT Software - Application Programming / Maintenance
Software Engineering
Graduate
Proficient
1
10407 Berlin, Germany