Staff Site Reliability Engineer (Swing shift 4 days a week) at ServiceNow

Dublin, County Dublin, Ireland

Full Time

Start Date

Immediate

Expiry Date

12 Nov, 25

Salary

0.0

Posted On

13 Aug, 25

Experience

8 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Good communication skills

Industry

Information Technology/IT

Description

Company Description
It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today — ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.
Job Description
In this role, the Site Reliability Engineer (SRE) will be responsible for managing and resolving the most challenging issues for the ServiceNow SRE team, focusing on instance performance, reliability, and availability. This is a swing shift role (4 days a week) and the candidate must be located within the Republic of Ireland.

ADDITIONAL GOOD-TO-HAVE QUALIFICATIONS:

Certifications in one or more public cloud platforms.
Exposure to DevOps and Agile methodologies.
Familiarity with CI/CD pipelines and tools like Jenkins or GitLab CI.
Understanding of development on the ServiceNow platform.
Additional Information

Responsibilities

WHAT YOU GET TO DO IN THIS ROLE:

Provide relief and sustainable resolution to issues within our infrastructure.
Conduct root cause analysis of incidents and implement preventive measures.
Participate in troubleshooting bridges and provide support during critical incidents.
Use your experience in software development, systems engineering, and networking to proactively prevent repeatable issues.
Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design.
Drive a culture of intolerance for manual activity, resulting in a highly automated environment that delivers scalable solutions.
Design, develop, and maintain scalable and reliable systems.
Implement and manage monitoring, alerting, and incident response processes.
Collaborate with development teams to ensure the reliability and performance of new features.
Automate repetitive tasks to improve efficiency and reduce human error.
Innovate and continuously improve system reliability, performance, and capacity.
Qualifications

TO BE SUCCESSFUL IN THIS ROLE YOU MUST HAVE:

Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI’s potential impact on the function or industry.
8+ years of experience in a Site Reliability Engineering or similar role.
A degree in Computer Science, Engineering, or a related field.
Self-motivated go-getter attitude with a proven ability to lead and drive initiatives across the organization.
The ability to inspire collaboration, navigate ambiguity, and drive initiatives from concept to successful execution, consistently delivering impactful results.
Extensive experience with ITIL-based IT operations, including incident, problem, and change management.
Advanced expertise in Unix/Linux system administration, including troubleshooting memory, processes, storage, network connectivity, and performance issues using command-line utilities and shell scripting.
Proficiency in automation tools and security best practices, ensuring robust, scalable, and secure production operations across diverse environments.
Comprehensive knowledge of networking protocols, including TCP/IP, DNS, HTTP/HTTPS, TLS/SSL, FTP/SFTP, and DHCP, among others.
Solid experience with relational databases such as MySQL or Postgres, including performance tuning and query optimization.
Experience with infrastructure-as-code and configuration management tools like Terraform, Puppet, or Ansible.
Strong programming skills in languages such as Python, Go, or Java.
Cloud experience across AWS, Azure, or GCP.
Proficiency in using monitoring and logging tools like Splunk, Prometheus, Grafana, or ELK stack.
Experience with Kubernetes to orchestrate the deployment, scaling, and management of containers.
Excellent problem-solving skills and attention to detail.
Excellent written and verbal communication skills, with the ability to clearly articulate solutions to technical problems.
Ability to work shifts running from 3 pm to 1 am.