Platform Site Reliability Engineering Senior Manager

at Wells Fargo

San Francisco, California, USA

Start Date: Immediate
Expiry Date: 14 Feb, 2025
Salary: USD 287600 Annual
Posted On: 17 Nov, 2024
Experience: 2 year(s) or above
Skills: Kubernetes, Solaris, Git, Jenkins, Netcool, VMware, Puppet, Kafka, JSON, Ruby, Gradle, Python, AppDynamics, Microservices, Ansible, Maven, Training, Kibana, Jira, Sprint Planning, Java, Splunk, Confluence, Windows, Bash, Sitescope, GitHub, OpenShift, Linux, Perl, JavaScript, Automation
Telecommute: No
Sponsor Visa: No

Description:

PAY RANGE

Reflected is the base pay range offered for this position. Pay may vary depending on factors including but not limited to achievements, skills, experience, or work location. The range listed is just one component of the compensation package offered to candidates.
$120,400.00 - $287,600.00

APPLICANTS WITH DISABILITIES

To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.

WELLS FARGO RECRUITMENT AND HIRING REQUIREMENTS:

a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.

Required Qualifications, US:

  • 6+ years of Systems Engineering and Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 3+ years of Management experience
  • 5+ years of Site Reliability Engineering experience
  • 5+ years of cloud technology experience

Desired Qualifications:

  • 3+ years managing Agile teams including use of tools such as Jira and Confluence
  • 5+ years’ experience with Agile Scrum (Daily Standup, Sprint Planning and Sprint Retrospective meetings)
  • 5+ years’ experience in two or more of the following tenets: Observability, Automation, Reliability, Resiliency, Scalability, Configuration Management, and Actionable, Data-Driven Insights.
  • 5+ years’ troubleshooting and systems administration experience across multiple OS platforms: Solaris, AIX, PKS, Kubernetes, OpenShift, Linux, Windows, VMware
  • 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of software development experience with languages such as Perl, Python, Java, JavaScript, Ruby, JSON, Angular, NodeJS
  • 2+ years’ experience with Observability/Monitoring/Logging tools: AppDynamics, Grafana, Big Panda, MoogSoft, Splunk, Netcool, Sitescope, Elastic, Kibana, Kafka, Traffic Manager, Message Processor, Filebeat, Basemon, etc.
  • 2+ years’ experience with modern architectures, e.g., private/public cloud (GCP/Azure), microservices, event-driven architecture, API management, and related technologies.
  • 2+ years’ experience with Automation Scripting: Bash, Shell, Ansible, Terraform, Azure DevOps
  • 2+ years’ experience with one or more CI/CD pipelines (GitHub, Jenkins) and automation tools: Gradle, Maven, Git, Ansible, Puppet
  • 2+ years of Incident Management System experience
  • Experience with data center migration

Responsibilities:

Wells Fargo is seeking a Platform Site Reliability Engineering Senior Manager to help design durable and reliable services, automate wherever possible, drive observability, and provide coverage for incidents, change activity, business continuity, and other production related activities.

In this role, you will:

  • Lead by example: focus on key aspects of SRE such as Observability, Automation, Reliability, Resiliency, Scalability, Configuration Management, and Actionable, Data-Driven Insights.
  • Act as a key transformation agent to help the team learn and develop SRE capabilities and advance the team through a defined SRE maturity model.
  • Attract, recruit, hire, and build top performing teams. Cultivate an engaged, diverse, inclusive and transparent culture.
  • Ensure adherence to the Platform Architecture and that non-functional requirements are met for API management products and services.
  • Partner with, engage and influence architects and experienced engineers to incorporate Wells Fargo Technology technical strategies, while understanding next-generation domain architecture and enabling application migration paths to the target architecture.
  • Function as the technical representative for the product during cross-team collaborative efforts and planning. Assess the availability of critical business flows, identify service level objectives and indicators, and conduct destructive and resiliency testing to reach 99.995% availability for the firm’s critical products and services leading to improved customer experience and customer satisfaction.
  • Collaborate with and influence Product Managers/Product Owners to drive user satisfaction, influence technology requirements and priorities in the product roadmap, promote innovative and intelligent solutions, generate corporate value, and articulate technical strategy while being a solid advocate of agile and DevOps practices.
  • Drive the buildout of automation to prevent problem recurrence, with the goal of automating response to all non-exceptional service conditions.
  • Introduce enterprise capabilities, tools, and innovation to improve availability in a multi-cloud ecosystem by evolving observability, monitoring, logging, CI/CD integration, continuous testing (performance, functional), continuous improvement, and standardization/automation of key SRE metrics and IT Service Operations processes.
  • Share support responsibilities for critical applications to identify systemic issues, conduct blameless postmortems and root cause analysis, and introduce strategic solutions in code that solve the problem and eliminate repeat issues.
  • Apply technology background in software engineering and systems engineering to ensure the applications on-boarded to SRE are available, have full-stack observability, are integrated with CI/CD, and are always on, by introducing continuous improvement through code and automation and continuous testing (performance, functional), and by providing operational insight through analytics.
  • Troubleshoot and analyze production job failures across the technology stack (e.g., database, network file delivery, server, and application issues) independently and provide solutions for recovery. Participate in root cause analysis and preventative actions to avoid recurring incidents.
  • Interact directly with third party vendors and technology service providers
  • Act as a key participant in developing standards and companywide best practices for engineering complex and large-scale technology solutions for technology engineering disciplines
  • Make decisions in developing standards and companywide best practices for engineering and technology solutions that require an understanding of industry best practices and new technologies, influencing and leading the technology team to meet deliverables and drive new initiatives
  • Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
  • Develop original and/or complex code, provide coding guidance/review, and create documentation
  • Manage and develop teams of individual contributors and managers in roles with moderate complexity and risk in Technology Operations
  • Manage the operational outcomes of key IT services delivered by network services and operations, database services, infrastructure services including server and storage services
  • Engage and influence stakeholders, internal partners and peers
  • Identify and recommend opportunities for technology operations process improvement and development
  • Leverage metrics to support infrastructure associated with highly automated, latency-sensitive, client-facing and internal applications consumed by the Business
  • Drive key strategic initiatives associated with infrastructure availability
  • Manage backups and recovery, and ensure recovery includes periodic testing to maintain business continuity
  • Work with IT risk management, compliance and all lines of defense, including Audit, to ensure platform risks are proactively managed
  • Institute controls in partnership with Operation Risks to ensure risk management is sustainable
  • Manage the costs, demand and resource capacity for the team resources, leveraging external resources as needed
  • Determine appropriate strategy and actions of the technology operations team to meet deliverables
  • Interpret and develop policies and procedures
  • Collaborate with and influence all levels of professionals, including more experienced managers
  • Manage allocation of people and financial resources to ensure commitments are met and align with strategic objectives in technology operations
  • Develop and guide a culture of talent development to meet business objectives and strategy
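
To put the 99.995% availability target named in the responsibilities above in concrete terms, the following minimal sketch converts an availability percentage into an allowed-downtime budget. The function name `downtime_budget_minutes` and the 30-day and 365-day windows are illustrative assumptions; the posting does not specify a measurement window.

```python
def downtime_budget_minutes(availability_pct: float, window_days: float) -> float:
    """Return the minutes of downtime allowed in a window at a given availability.

    Hypothetical helper for illustration only; the measurement window is an
    assumption, not something stated in the posting.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    # The unavailable fraction of the window is the downtime (error) budget.
    return window_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for days in (30, 365):
        budget = downtime_budget_minutes(99.995, days)
        print(f"{days}-day window: {budget:.2f} minutes of allowed downtime")
```

Under these assumptions, 99.995% availability leaves roughly 2.16 minutes of downtime per 30 days and about 26.28 minutes per 365 days.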

Job Expectations:

  • Ability to travel up to 10% of the time.
  • This position is not eligible for Visa Sponsorship.


REQUIREMENT SUMMARY

Min: 2.0, Max: 6.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

San Francisco, CA, USA