Site Reliability Engineer, VP - Scheduling Platform
at Goldman Sachs
Dallas, TX 75201, USA
Start Date | Expiry Date | Salary | Posted On | Experience | Skills | Telecommute | Sponsor Visa |
---|---|---|---|---|---|---|---|
Immediate | 22 Jan, 2025 | Not Specified | 23 Oct, 2024 | 3 year(s) or above | Python, Cloud, Go, Firewalls | No | No |
Required Visa Status:
Citizen | GC |
US Citizen | Student Visa |
H1B | CPT |
OPT | H4 Spouse of H1B |
GC Green Card |
Employment Type:
Full Time | Part Time |
Permanent | Independent - 1099 |
Contract – W2 | C2H Independent |
C2H W2 | Contract – Corp 2 Corp |
Contract to Hire – Corp 2 Corp |
Description:
WHAT WE DO:
At Goldman Sachs, our Engineers don’t just make things – we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action. Create new businesses, transform finance, and explore a world of opportunity at the speed of markets.
Engineering, which is comprised of our Technology Division and global strategists groups, is at the critical center of our business, and our dynamic environment requires innovative strategic thinking and immediate, real solutions. Want to push the limit of digital possibilities? Start here.
WHO WE ARE:
Procmon Platform delivers a highly scalable and reliable ecosystem for scheduling business critical jobs across Goldman Sachs.
Our platform is responsible for scheduling tens of millions of daily jobs for Global Banking & Markets, Asset & Wealth Management, Risk and other business and engineering functions.
The ecosystem includes a number of high availability, very large scale systems including:
- Job scheduling
- Event streaming
- Log shipping
- Data warehouses
- Security infrastructure
REQUIREMENTS
- 5+ years of relevant professional experience
- 3+ years of Linux fundamentals and system administration skills
- 3+ years of networking experience (familiarity with TCP/IP, IP routing, firewalls, secure tunneling protocols)
- 3+ years experience working with distributed computing systems and Cloud computing environments
- Excellent problem-solving and automation skills
- Proficiency in at least one programming language; the team uses a mix of Go, Python and Erlang
- Able to operate effectively in a mission critical, highly regulated financial services environment
Responsibilities:
- Own technical operations for systems that manage hundreds of thousands of compute cores
- Build observability for new deployments to ensure robustness from day one, as well as mature deployments to identify and implement improvements
- Troubleshoot and resolve issues with block devices, file descriptors, and packet loss
- Lead real-time outage investigations and present postmortems to senior management
- Define SLIs and SLOs and partner with development teams to ensure systems are sufficiently well designed and instrumented
- Partner with our development team throughout development and operations
- Plan and manage deployments and migrations (including end-of-life programs)
- Plan and implement robust business continuity and security programs
- Provide regional coverage for the Procmon platform and participate in on-call support
REQUIREMENT SUMMARY
Min: 3.0, Max: 5.0 year(s)
Information Technology/IT
IT Software - Other
Software Engineering
Graduate
Proficient
1
Dallas, TX 75201, USA