Staff Site Reliability Engineering - Linux, Containers, Terraform, Cloud, G at Visa

Bengaluru, Karnataka, India

Full Time

Start Date

Immediate

Expiry Date

05 Jul, 26

Salary

0.0

Posted On

06 Apr, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Site Reliability Engineering, Linux, Docker, Kubernetes, Terraform, Jenkins, Ansible, Java, Python, Shell Scripting, JavaScript, Cloud Infrastructure, CI/CD, Generative AI, Automation, Troubleshooting

Industry

IT Services and IT Consulting

Description

Company Description

Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories, dedicated to uplifting everyone, everywhere by being the best way to pay and be paid. At Visa, you'll have the opportunity to create impact at scale: tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world. Join Visa and do work that matters to you, to your community, and to the world. Progress starts with you.

Job Description

Our systems and applications are highly available and perform exceptionally well. Our business touches everyone's pocket whenever a card is used for a transaction. Those transactions complete within a fraction of a second, and thousands of them happen concurrently every second. Our service is not limited to one location: we are in 200 countries.

Security is our primary concern. We build an environment that is reliable for transactions worth more than a trillion in value. Millions of cyberattacks are a constant challenge, and they give us more confidence to make our environment more secure. Availability goes hand in hand with security, and our environments are available round the clock. As technology changes day by day, we believe in continuously improving ourselves and like to dive deep into current knowledge.

What are the responsibilities of a Product Site Reliability Engineer?

The prime responsibility of a Site Reliability Engineer is to make sure that the environment is secure and safe. All security findings should be remediated by the resolution date defined by governance. We do not allow outages, even for a second. If any issue happens, as owners of the environment we do whatever is needed to make sure those environments are up and running. Root cause analysis should be completed within hours. We make sure that findings are remediated in the Production environment only after all tests and checks in lower environments.

As owners of the environment, we keep track of all activities planned or happening in our environments. We are responsible for deploying new code in the environment. We review and analyze our environment regularly. If there is a manual task, we automate it. We are increasing self-healing capabilities and will continue to do so until the environments heal automatically.

If a new service is coming under our support, or if an old environment is being migrated to new technologies, we start working with the product developers to plan for production. As our business runs round the clock, we work in shifts and synchronize with multiple locations and multiple tracks. We make sure that every activity is recorded as per the incident or change management process. Technical and related runbooks need to be prepared and shared with the team.

This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.

Qualifications

Preferred Qualifications:
7-11 years of IT experience with expertise in Site Reliability Engineering (SRE), Build and Release Engineering, Cloud Infrastructure, Automation, and Technical Support.
B.E./B.Tech in Computer Science or a related discipline.
Good understanding of CI/CD technologies and modern software delivery practices.
Core skills in Docker, DevOps, Linux, Jenkins, Ansible, Kubernetes, and cloud infrastructure automation.
Good experience in Java-based web applications.
Hands-on experience in implementing CI/CD processes and improving deployment reliability.
Expertise in troubleshooting Java applications, Tomcat services, and Apache web applications.
Good exposure to virtualization and container technologies, especially Docker.
Ability to build deployment scripts, build pipelines, and automated solutions using Shell scripting, JavaScript, and Python.
Hands-on experience with Docker, including creating and managing containers, images, and Dockerfiles.
Experience creating and managing deployments, services, and ingress flows for application setup in Kubernetes clusters.
Participated in release-level discussions and worked across the full SDLC in Agile environments.
Provided on-call support for DevOps and production operations activities.
Strong ability to work as a team player.
Good written and verbal communication skills.
Excellent problem-solving and troubleshooting abilities.
Ability to effectively prioritize and coordinate tasks and deliverables.
Quick learner with the ability to adopt and implement the latest technology trends in the industry.

GenAI Expectations

Strong understanding of Generative AI (GenAI) concepts and their application in SRE, DevOps, and operational workflows.
Ability to build GenAI-powered workflows for incident analysis, alert summarization, runbook generation, root cause analysis support, and knowledge automation.
Experience writing code and scripts using Python, Shell, and JavaScript to integrate GenAI capabilities into engineering and operational processes.
Ability to use GenAI for continuous service improvement, including log analysis, alert correlation, deployment validation, service insights, and automation enhancement.
Experience designing intelligent automation solutions that improve operational efficiency, service reliability, and engineering productivity.
Understanding of prompt design, workflow orchestration, API integration, and automation guardrails for production use cases.
Ability to apply GenAI responsibly with attention to security, validation, governance, and reliability considerations.
Capability to operate at a Staff SRE Engineer level, driving AI-assisted reliability improvements, MTTR reduction, observability enhancement, and operational excellence.
Experience identifying service pain points and implementing scalable GenAI-assisted solutions to reduce manual effort and improve support outcomes.
Ability to partner with cross-functional teams to develop practical, production-ready GenAI workflows for modern platform and service operations.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Job Family Group: Engineering and Technology

Responsibilities

The Site Reliability Engineer is responsible for ensuring the security, availability, and performance of production environments through automation and proactive monitoring. They manage code deployments, remediate security findings, and perform root cause analysis to maintain system reliability.