Junior / Medior / Senior Site Reliability Engineer

at Prodensa Group sro

Praha, Czech Republic

Start Date: Immediate
Expiry Date: 17 Feb, 2025
Salary: Not Specified
Posted On: 18 Nov, 2024
Experience: 1 year(s) or above
Skills: Infrastructure, Teams, Python, Code, Cloud Services, Communication Skills, Kubernetes, Docker, Soft Skills, Automation, Health, Information Technology, Azure, Computer Science, Analytical Skills, AWS, Scripting Languages
Telecommute: No
Sponsor Visa: No

Description:

ABOUT THE COMPANY:

A reputable IT provider on the Czech and international markets, the company specializes in custom application development and consultancy services. Its work focuses on developing distribution portals in a cloud environment, including the implementation of modern technologies such as GraphQL APIs; creating specific software solutions, such as simulators; and developing frontend platforms for various purposes. Known for its autonomy, precise work, and deep technological knowledge, the company provides expert consultancy services and organizes regular training for its employees and contractors. Its team of experts has extensive experience in various industries, allowing it to effectively support and develop client IT systems.
About the Project: The project is for a privately owned company that specializes in AI-driven talent marketplace solutions tailored for internal career advancement. Their platform facilitates internal mobility by connecting employees with relevant projects, gigs, mentorships, and full-time roles aligned with their interests, career goals, skills, and experiences. The company’s technology is used by various enterprises to democratize career development, promote worker agility, and cultivate a workforce prepared for the future. They are a fast-growing and dynamic startup with 200+ team members worldwide.
Contract: Full-time, hybrid (3 days on-site required), freelance contract
Job Overview: The SRE will monitor, troubleshoot, and maintain our infrastructure, with an emphasis on reliability, scalability, and automation. This role is ideal for candidates with foundational NOC experience who are interested in expanding their skills to include SRE practices and modern infrastructure management.

REQUIREMENTS

Education: Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.

Experience:

  • 1+ years in a NOC, Technical Support, or Junior SRE role, with a focus on monitoring and system health.
  • Familiarity with cloud platforms (AWS, Azure, GCP) and containerized environments (Docker, Kubernetes) is preferred.

Technical Skills:

  • Proficiency in system and application monitoring tools (e.g., Prometheus, Grafana, Coralogix).
  • Basic understanding of automation, Infrastructure-as-Code (IaC), and scripting languages (Python, Bash).
  • Foundational knowledge of networking, Kubernetes, and cloud services (AWS is an advantage).

Soft Skills:

  • Strong analytical skills with a proactive approach to problem-solving.
  • Excellent communication skills with the ability to work collaboratively across teams.
  • Eagerness to learn and adapt to new technologies, with a growth mindset.

Responsibilities:


  • Monitoring and Incident Response:
      • Continuously monitor application performance, system health, and network status.
      • Respond swiftly to incidents, performing root cause analysis and implementing resolution strategies.
      • Escalate issues when necessary and lead the collaboration and communication for swift resolution of incidents.

  • Automated Monitoring and Alerting:
      • Use and configure monitoring tools (e.g., Prometheus, Grafana, Coralogix, Splunk) to improve visibility into system performance.
      • Develop and refine alerting rules to reduce noise and improve incident detection.

  • Troubleshooting and System Maintenance:
      • Perform initial troubleshooting and diagnostics across application, infrastructure, and network layers.
      • Work with Developers and DevOps to implement fixes, validate configurations, and ensure systems are resilient to future incidents.

  • Operational Automation:
      • Automate repetitive tasks, such as alert handling, system checks, and routine maintenance.
      • Use scripting (e.g., Python, Bash) and Infrastructure-as-Code (IaC) tools (e.g., Terraform, Crossplane) to improve operational efficiency.

  • Documentation and Knowledge Sharing:
      • Document processes, incidents, and troubleshooting steps, maintaining a knowledge base for common issues.
      • Contribute to runbooks for automated troubleshooting and escalate complex issues to SREs or other technical teams.

  • Continuous Improvement:
      • Analyze incidents and recurring issues to identify areas for improvement in system reliability and automation.
      • Lead post-incident reviews and contribute insights for future preventive actions.


REQUIREMENT SUMMARY

Experience: 1 to 6 year(s)

Information Technology/IT

IT Software - System Programming

Software Engineering

Graduate
