Principal System Reliability Engineer at Wells Fargo

Charlotte, North Carolina, USA

Full Time

Start Date

Immediate

Expiry Date

26 Nov, 25

Salary

305000.0

Posted On

26 Aug, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Health Check, Linux, Relevance, Windows, Training, Infrastructure, Emerging Technologies, Middleware

Industry

Information Technology/IT

Description

PAY RANGE

Reflected is the base pay range offered for this position. Pay may vary depending on factors including but not limited to achievements, skills, experience, or work location. The range listed is just one component of the compensation package offered to candidates.
$159,000.00 - $305,000.00

APPLICANTS WITH DISABILITIES

To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.

WELLS FARGO RECRUITMENT AND HIRING REQUIREMENTS:

a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.

Required Qualifications:

7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
7+ years of experience leading observability and monitoring tooling
7+ years in infrastructure (Windows and Linux) support
5+ years of proven success in toil reduction initiatives
5+ years in cloud application management

Desired Qualifications:

Ability to troubleshoot the full application stack, operating system stack and middleware
Deep understanding of Java applications, including how to read a thread dump and use Java Flight Recorder
Ability to quickly script new alarms and deploy them with company-approved alerting tools
Ability to engage in high-impact production incidents, troubleshoot them to resolution, and provide immediate incident analysis, both written and spoken
Experience setting up distributed tracing across an internet topology for full health checks, with the ability to pinpoint the problem source
Ability to mentor the platform teams by training, documenting, certifying and building the team’s skill set
Demonstrate knowledge/understanding of emerging technologies, industry trends, and outside perspectives, and communicate their relevance to the organization’s strategic and tactical goals
Knowledge and understanding of AI capabilities
Experience leading proofs of concept and prototyping
Experience leading the execution of critical/complex project deliverables
Strong communication skills, with the ability to communicate at all levels of the organization

Responsibilities

Wells Fargo is seeking a Principal Engineer. This Principal Engineer must be able to perform gap analysis of applications and implement resiliency, observability, and operational automation. This person will adopt a System Reliability Engineering practice for on-prem, hybrid, and cloud-native applications. The Principal Engineer must have strong communication skills and the ability to mentor, provide expert advice, and upskill platform engineers. The Principal Engineer will be called to production outage calls and be expected to reduce the mean time to resolve by providing expert-level troubleshooting skills.

In this role, you will:

Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
Translate advanced technology experience, an in-depth knowledge of the organization’s tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions.
Ensure high availability and performance of production systems through proactive monitoring and incident response.
Design and implement scalability, reliability, and observability strategies for cloud and on-premise environments.
Define SLIs (Service Level Indicators), SLOs (Service Level Objectives), and Error Budgets to improve system reliability.
Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
Own and drive alarming, monitoring, toil reduction, and overall risk reduction in the Financial Hardship Operations, Consumer Lending Operations, and Unsecure Lending Operations space. Requires full experience with the OSI (Open Systems Interconnection) model.

Job Expectations:

This position offers a hybrid work schedule - ability to work in office
This position is not eligible for Visa sponsorship
Relocation assistance is not available for this position