Principal Application Reliability/Site Reliability Engineer
at International SOS

Trevose, PA 19053, USA

Start Date	Expiry Date	Salary	Posted On	Experience	Skills	Telecommute	Sponsor Visa
Immediate	16 Sep, 2024	Not Specified	18 Jun, 2024	15 year(s) or above	Good communication skills	No	No

Required Visa Status:

Citizen	GC
US Citizen	Student Visa
H1B	CPT
OPT	H4 Spouse of H1B
GC Green Card

Employment Type:

Full Time	Part Time
Permanent	Independent - 1099
Contract – W2	C2H Independent
C2H W2	Contract – Corp 2 Corp
Contract to Hire – Corp 2 Corp

Description:

International SOS is the world’s leading medical and security services company, with over 12,000 employees working in 1,000 locations in 90 countries. We were founded on the principle of putting our clients’ employees first, and this is still true today. Led by 5,200 medical professionals and 200 security specialists, our teams work night and day to find solutions to protect our clients and their employees in whatever situation they may be facing. We assess, advise and assist from a medical, security and logistical perspective on a global scale to protect and save lives and thereby enable our clients to achieve their business goals. As we’ve delivered on this mission over the last 35 years, we have become the market leader in global telehealth services and digital health solutions for an extensive client base of Fortune 500 companies, NGOs and governments around the world.

ABOUT YOU:

Thorough, detailed, and careful planning, development, and execution
Proactively looking for areas to improve
Clear communication with all involved parties
Calm under pressure
Clear sense of ownership and accountability
15+ years of hands-on experience with Windows and Linux operating systems and databases (SQL and NoSQL)
In-depth knowledge and proven record of building and operating highly available, scalable, large-scale enterprise applications on AWS or Azure, or on OpenStack clouds.
Experience with distributed storage technologies like NFS, HDFS, Ceph, S3 as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
10+ years of SRE or closely related experience for large-scale cloud SaaS
10+ years of hands-on technical experience in DevOps, Release Management Engineering, or similar areas.
Strong experience with monitoring tools: Datadog, Prometheus, Grafana, CloudWatch, ELK, etc.
Extensive knowledge of config management systems
Strong programming skills: .NET, Java, Python, JavaScript, etc.
B.S. in Computer Science or Software Engineering. M.S. in similar fields preferred.
Minimal, occasional domestic travel within the US.
On-call rotation

Responsibilities:

ABOUT THE ROLE:

We are seeking a Principal ARE/SRE to be responsible for keeping all user-facing services and production systems running smoothly. Application/Site Reliability Engineers are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to environments and the codebase.

KEY RESPONSIBILITIES:

Be on rotation for availability incidents and provide support for customer service engineers.
Proactively develop scripts and tools to prevent incidents from ever happening.
Run infrastructure and applications with modern tools and automation like Puppet, Terraform, Kubernetes, etc.
Develop comprehensive monitoring and alerting on symptoms and potential issues to prevent outages.
Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to continually improve.
Provide primary operational support and engineering for multiple large distributed software applications.
Document every action so findings turn into repeatable actions, and then into automation.
Improve the deployment process to make it as smooth and effortless as possible.
Design, build and maintain core infrastructure pieces that allow scaling to support an enterprise-level number of concurrent users.
Debug production issues across services and levels of the stack.
Plan infrastructure growth and capacity.
Provide technical leadership of the SRE team (internal or through Managed Services).
Proactively work with development leads, client service leads, solution architects, and infrastructure leads to enhance system reliability, scalability, and robustness.

REQUIREMENT SUMMARY

Experience: Min: 15.0, Max: 20.0 year(s)

Industry: Information Technology/IT

Functional area of job: IT Software - Other

Domain: Software Engineering

Qualifications: Graduate

English Proficiency: Proficient

Number of posts: 1

Address of job: Trevose, PA 19053, USA
