Site Reliability Engineer
at Reach Digital Health

Cape Town, Western Cape, South Africa

Start Date: Immediate
Expiry Date: 26 Nov, 2024
Salary: Not Specified
Posted On: 29 Aug, 2024
Experience: N/A
Skills: Puppet, Testing, MySQL, Benchmarking, Documentation, Operating Systems, Web Servers, Kubernetes, Reliability Engineering, AWS, C++, Ruby, Azure, Java, Git, Perl, Capacity Planning, Scripting Languages, Programming Languages, Load Testing, Performance Engineering, Redis
Telecommute: No
Sponsor Visa: No

Description:

JOB OVERVIEW

The Site Reliability Engineer (SRE) will apply software engineering principles and practices to infrastructure and operations problems, and design and implement solutions that automate and improve the availability, scalability, and efficiency of our systems. You will also collaborate with other engineering teams, project teams, and other stakeholders to deliver high-quality products and services that meet our customers’ needs and expectations.
You’ll be exposed to unique challenges assisting with the maintenance and stability of our global and local infrastructure, and have the opportunity to contribute to internal and external open source projects. We believe in choosing the right tools for the job, and support creativity in solving problems.

QUALIFICATIONS

A bachelor’s degree in Computer Science, Engineering or related field, or equivalent experience.

SKILLS AND EXPERIENCE REQUIRED

Proficient in one or more programming languages, such as Python, Go, Java, or C++.
Proficient in one or more scripting languages, such as Bash, Perl, or Ruby.
Proficient in one or more cloud platforms, such as AWS, Azure, or GCP.
Proficient in one or more UNIX-like operating systems.
Proficient in one or more configuration management and deployment tools, such as Ansible, Chef, Puppet, or Terraform.
Proficient in one or more monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk.
Proficient in one or more container and orchestration tools, such as Docker or Kubernetes.
Proficient in one or more web servers and proxies, such as Apache, Nginx, or Envoy.
Proficient in one or more databases and data stores, such as MySQL, PostgreSQL, MongoDB, or Redis.
Proficient in one or more version control and collaboration tools, such as Git.
Knowledgeable in the concepts and principles of site reliability engineering, such as SLIs, SLOs, error budgets, incident management, postmortems, and blameless culture.
Knowledgeable in the concepts and principles of software engineering, such as design patterns, code quality, testing, debugging, and documentation.
Knowledgeable in the concepts and principles of performance engineering, such as profiling, benchmarking, load testing, and capacity planning.
Knowledgeable in the concepts and principles of distributed computing, such as concurrency, parallelism, synchronisation, and consensus.
Excellent communication and collaboration skills, and ability to work effectively in a cross-functional and remote team environment.
Excellent problem-solving and analytical skills, and ability to troubleshoot and resolve complex issues in a timely and efficient manner.
Excellent learning and innovation skills, and ability to research and evaluate new technologies and methodologies.

Responsibilities:

RESPONSIBILITIES AND DUTIES

Your primary responsibilities will include but not be limited to:

Assisting with resources to facilitate engineering services, and keeping them operational. This includes continuous integration systems, software deployment and basic troubleshooting of code, and creation and management of software repositories.
Ensuring servers are patched against security exploits in time, managing secure access to servers and repositories for partners and internal staff, and secure interconnection between systems.
Ensuring servers are configured in a documented and repeatable way.
Ensuring system and server architecture is appropriate to the requirements of projects, easily maintainable in the long term, and provides appropriate levels of redundancy.
Provide timeous uptime assurance, and support with issue investigation and recovery procedures.
Design and develop tools and software that automate and improve the infrastructure and operation of our systems, ensuring adoption of best practices.
Perform code reviews, testing, debugging, and troubleshooting of the software and tools developed by the SRE team, and assist other engineering teams with the same.
General support (problems, password changes, etc.) of office infrastructure and services such as Google Workspace, Slack, and PPM Pro.
Site load testing, unit testing, disaster recovery testing, and quality assurance on a system level including backend performance, deployment sanity, security, scalability and stability.
Providing data security expertise within SRE and supporting the organisation and projects with compliance with data privacy regulations, implementing and monitoring security measures to protect sensitive health information, and managing data backups and disaster recovery.
Advise on and/or contribute to new or emerging technologies that might be relevant to Reach.
Commit to writing software that allows itself to be tested.
Work well within cross-functional teams in order to produce world-class products and programmes that empower end users.

You will primarily be responsible for:

Infrastructure reliability and performance:
Monitoring, measuring, and improving the reliability and performance of our systems
Maintenance, upgrades, and security updates
Automation and tooling:
Designing and developing software and scripts that automate and streamline various aspects of infrastructure and operations
Assisting other teams with deployment and updates of their applications and services
Supporting with internal management of the organisation’s technological infrastructure
Data Management & Security:
Working with SRE, Data Security, Legal, and project teams to develop and enforce policies and procedures for data collection, storage, and access, to ensure compliance with data privacy regulations; implementing and monitoring security measures to protect sensitive health information; and managing data backups and disaster recovery

REQUIREMENT SUMMARY

Experience: Min: N/A, Max: 5.0 year(s)

Industry: Information Technology/IT

Functional area of job: IT Software - Network Administration / Security

Domain: Software Engineering

Qualifications: Graduate

Specialization: Computer science engineering or related field or equivalent experience

English Proficiency: Proficient

Number of posts: 1

Address of job: Cape Town, Western Cape, South Africa
