Site Reliability Engineer, VP - Scheduling Platform

at Goldman Sachs

Dallas, TX 75201, USA

Start Date: Immediate
Expiry Date: 22 Jan, 2025
Salary: Not Specified
Posted On: 23 Oct, 2024
Experience: 3 year(s) or above
Skills: Python, Cloud, Go, Firewalls
Telecommute: No
Sponsor Visa: No
Required Visa Status:
  • Citizen
  • GC
  • US Citizen
  • Student Visa
  • H1B
  • CPT
  • OPT
  • H4 Spouse of H1B
  • GC Green Card
Employment Type:
  • Full Time
  • Part Time
  • Permanent
  • Independent - 1099
  • Contract – W2
  • C2H Independent
  • C2H W2
  • Contract – Corp 2 Corp
  • Contract to Hire – Corp 2 Corp

Description:

WHAT WE DO:

At Goldman Sachs, our Engineers don’t just make things – we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action. Create new businesses, transform finance, and explore a world of opportunity at the speed of markets.
Engineering, which comprises our Technology Division and global strategists groups, is at the critical center of our business, and our dynamic environment requires innovative strategic thinking and immediate, real solutions. Want to push the limit of digital possibilities? Start here.

WHO WE ARE:

The Procmon Platform delivers a highly scalable and reliable ecosystem for scheduling business-critical jobs across Goldman Sachs.
Our platform is responsible for scheduling tens of millions of daily jobs for Global Banking & Markets, Asset & Wealth Management, Risk and other business and engineering functions.

The ecosystem includes a number of high-availability, very large-scale systems, including:

  • Job scheduling
  • Event streaming
  • Log shipping
  • Data warehouses
  • Security infrastructure


REQUIREMENTS

  • 5+ years of relevant professional experience
  • 3+ years of Linux fundamentals and system administration skills
  • 3+ years of networking experience (familiarity with TCP/IP, IP routing, firewalls, secure tunneling protocols)
  • 3+ years of experience working with distributed computing systems and cloud computing environments
  • Excellent problem-solving and automation skills
  • Proficiency in at least one programming language; the team uses a mix of Go, Python and Erlang
  • Able to operate effectively in a mission-critical, highly regulated financial services environment

Responsibilities:

  • Own technical operations for systems that manage hundreds of thousands of compute cores
  • Build observability for new deployments to ensure robustness from day one, and for mature deployments to identify and implement improvements
  • Troubleshoot and resolve issues with block devices, file descriptors, and packet loss
  • Lead real-time outage investigations and present postmortems to senior management
  • Define SLIs and SLOs and partner with development teams to ensure systems are sufficiently well designed and instrumented
  • Partner with our development team throughout development and operations
  • Plan and manage deployments and migrations (including end-of-life programs)
  • Plan and implement robust business continuity and security programs
  • Provide regional coverage for the Procmon platform and participate in on-call support


REQUIREMENT SUMMARY

Min: 3.0, Max: 5.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

Dallas, TX 75201, USA