Principal System Reliability Engineer at Wells Fargo
Charlotte, North Carolina, USA
Full Time


Start Date

Immediate

Expiry Date

26 Nov 2025

Salary

$305,000.00

Posted On

26 Aug 2025

Experience

5 years or more

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Health Check, Linux, Relevance, Windows, Training, Infrastructure, Emerging Technologies, Middleware

Industry

Information Technology/IT

Description

PAY RANGE

Reflected is the base pay range offered for this position. Pay may vary depending on factors including but not limited to achievements, skills, experience, or work location. The range listed is just one component of the compensation package offered to candidates.
$159,000.00 - $305,000.00

APPLICANTS WITH DISABILITIES

To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.

WELLS FARGO RECRUITMENT AND HIRING REQUIREMENTS:

a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.

Responsibilities

Wells Fargo is seeking a Principal Engineer. This Principal Engineer must be able to perform gap analyses of applications and implement resiliency, observability, and operational automation. This person will adopt a System Reliability Engineering practice for on-premises, hybrid, and cloud-native applications. The Principal Engineer must have strong communication skills and the ability to mentor, provide expert advice to, and upskill platform engineers. The Principal Engineer will be called into production outage calls and is expected to reduce mean time to resolve (MTTR) by providing expert-level troubleshooting.

In this role, you will:

  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization’s tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions.
  • Ensure high availability and performance of production systems through proactive monitoring and incident response.
  • Design and implement scalability, reliability, and observability strategies for cloud and on-premise environments.
  • Define SLIs (Service Level Indicators), SLOs (Service Level Objectives), and Error Budgets to improve system reliability.
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies, and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
  • Own and drive alarming, monitoring, toil reduction, and overall risk reduction in the Financial Hardship Operations, Consumer Lending Operations, and Unsecure Lending Operations space.
  • Bring full experience with the OSI (Open Systems Interconnection) model.

Required Qualifications:

  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 7+ years of experience leading observability and monitoring tooling
  • 7+ years in infrastructure (Windows and Linux) support
  • 5+ years proven success in toil reduction initiatives
  • 5+ years in cloud application management

Desired Qualifications:

  • Ability to troubleshoot the full application stack, operating system stack and middleware
  • Deep understanding of Java applications, including how to read a thread dump and how to use Java Flight Recorder (JFR)
  • Ability to quickly script new alarms and implement them using company-approved alerting tools
  • Ability to engage quickly in high-impact production incidents, troubleshoot them to resolution, and provide immediate written and verbal incident analysis
  • Experience setting up distributed tracing across an internet topology to provide end-to-end health checks and pinpoint the source of problems
  • Ability to mentor the platform teams by training, documenting, certifying and building the team’s skill set
  • Demonstrate knowledge/understanding of emerging technologies, industry trends, and outside perspectives, and communicate their relevance to the organization’s strategic and tactical goals
  • Knowledge and understanding of AI capabilities
  • Experience leading proofs of concept and prototyping
  • Experience leading the execution of critical/complex project deliverables
  • Strong communication skills, with the ability to communicate at all levels of the organization

Job Expectations:

  • This position offers a hybrid work schedule and requires the ability to work in the office
  • This position is not eligible for Visa sponsorship
  • Relocation assistance is not available for this position