Junior / Medior / Senior Site Reliability Engineer
at Prodensa Group sro
Praha, Czech Republic
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 17 Feb, 2025 | Not Specified | 18 Nov, 2024 | 1 year(s) or above | Infrastructure, Teams, Python, Code, Cloud Services, Communication Skills, Kubernetes, Docker, Soft Skills, Automation, Health, Information Technology, Azure, Computer Science, Analytical Skills, AWS, Scripting Languages | No | No |
Description:
ABOUT THE COMPANY:
A reputable IT provider on the Czech and international markets, the company specializes in custom application development and consultancy services. Its work focuses on developing distribution portals in a cloud environment, including implementing modern technologies such as GraphQL APIs, creating specific software solutions such as simulators, and developing frontend platforms for various purposes. Known for its autonomy, precise work, and deep technological knowledge, the company provides expert consultancy services and organizes regular training for its employees and contractors. Its team of experts has extensive experience in various industries, allowing it to effectively support and develop client IT systems.
About Project: The project is for a privately owned company that specializes in AI-driven talent marketplace solutions tailored for internal career advancement. Its platform facilitates internal mobility by connecting employees with relevant projects, gigs, mentorships, and full-time roles aligned with their interests, career goals, skills, and experiences. The company’s technology is used by various enterprises to democratize career development, promote worker agility, and cultivate a workforce prepared for the future. It is a fast-growing, dynamic startup with 200+ team members worldwide.
Contract: Full-time, hybrid (3 days on-site is a must), Freelancer Contract
Job Overview: The SRE will monitor, troubleshoot, and maintain our infrastructure, with an emphasis on reliability, scalability, and automation. This role is ideal for candidates with foundational NOC experience who are interested in expanding their skills to include SRE practices and modern infrastructure management.
REQUIREMENTS
Education: Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.
Experience:
- 1+ years in a NOC, Technical Support, or Junior SRE role, with a focus on monitoring and system health.
- Familiarity with cloud platforms (AWS, Azure, GCP) and containerized environments (Docker, Kubernetes) is preferred.
Technical Skills:
- Proficiency in system and application monitoring tools (e.g., Prometheus, Grafana, Coralogix).
- Basic understanding of automation, Infrastructure-as-Code (IaC), and scripting languages (Python, Bash).
- Foundational knowledge of networking, Kubernetes, and cloud services (AWS is an advantage).
Soft Skills:
- Strong analytical skills with a proactive approach to problem-solving.
- Excellent communication skills with the ability to work collaboratively across teams.
- Eagerness to learn and adapt to new technologies, with a growth mindset.
Responsibilities:
Monitoring and Incident Response:
Continuously monitor application performance, system health, and network status.
Respond swiftly to incidents, performing root cause analysis and implementing resolution strategies.
Escalate issues when necessary and lead collaboration and communication to resolve incidents quickly.
Automated Monitoring and Alerting:
Use and configure monitoring tools (e.g., Prometheus, Grafana, Coralogix, Splunk) to improve visibility into system performance.
Develop and refine alerting rules to reduce noise and improve incident detection.
Troubleshooting and System Maintenance:
Perform initial troubleshooting and diagnostics across application, infrastructure, and network layers.
Work with Developers and DevOps to implement fixes, validate configurations, and ensure systems are resilient to future incidents.
Operational Automation:
Automate repetitive tasks, such as alert handling, system checks, and routine maintenance.
Use scripting (e.g., Python, Bash) and Infrastructure-as-Code (IaC) tools (e.g., Terraform, Crossplane) to improve operational efficiency.
Documentation and Knowledge Sharing:
Document processes, incidents, and troubleshooting steps, maintaining a knowledge base for common issues.
Contribute to runbooks for automated troubleshooting and escalate complex issues to SREs or other technical teams.
Continuous Improvement:
Analyze incidents and recurring issues to identify areas for improvement in system reliability and automation.
Lead post-incident reviews and contribute insights for future preventive actions.
REQUIREMENT SUMMARY
Min: 1.0, Max: 6.0 year(s)
Information Technology/IT
IT Software - System Programming
Software Engineering
Graduate
Computer Science, Information Technology, or a related field, or equivalent experience
Proficient
1
Praha, Czech