Sr. Engineer, Operations System (Disaster Recovery Engineer) at DaVita
United States, USA
Full Time


Start Date

Immediate

Expiry Date

01 Nov, 25

Salary

147000.0

Posted On

04 Aug, 25

Experience

7 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

IT Infrastructure Operations, Network Automation, Business Continuity, Distributed Systems, Disaster Recovery, Software Defined Networking, Backup Solutions, Google Cloud, Soft Skills, Information Technology, Computer Science

Industry

Information Technology/IT

Description

POSITION OVERVIEW:

DaVita IT Operations is seeking an experienced Disaster Recovery Engineer to enhance our IT resilience and operational readiness. This position will ensure that DaVita’s Business and Digital Continuity plans and Recovery Environments are current and frequently tested so that they can handle any type of data loss event. The engineer will work closely with DaVita Information Technology and Business Leadership to design, implement, and maintain advanced infrastructure solutions with a focus on fault domain isolation, multi-region reliability, cloud tiering, and elastic compute capabilities that support seamless business continuity. The ideal candidate is skilled in configuring and managing Virtual Private Clouds (VPCs), routing, network security services, load balancing, and Cloud DNS, and is proficient in setting up hybrid connectivity through Cloud Interconnect and Cloud VPN. Their expertise extends to diagnosing, monitoring, and troubleshooting network operations using Google Cloud Observability and the Network Intelligence Center.

PREFERRED SKILL SETS/EXPERIENCE:

  • Hands-on experience designing and building fault-tolerant solutions that run in on-premises, hybrid, and cloud-native enterprise environments.
  • Experience with Disaster Recovery and Digital Continuity operations in large, highly regulated environments (Healthcare or Financial).
  • Knowledge of best practices for recovery operations of traditional, virtualized, and containerized workloads.
  • Extensive knowledge of incident management, including coordination of large-scale recovery operations.
  • Extensive knowledge of physical, virtual, and cloud-based IT Infrastructure operations.
  • Expertise in traditional and software-defined networking, including extensive experience with code-based network management on major public cloud platforms.
  • Experience with network automation and orchestration tools.
  • Proficiency in cloud platforms (AWS, Azure, Google Cloud), including tiering and backup solutions.
  • Deep knowledge of fault domain isolation and strategies for distributed systems.
  • Familiarity with elastic compute platforms and tools for auto-healing and dynamic scaling.

EDUCATION:

  • Bachelor’s degree in Information Technology, Computer Science, or a related field.

EXPERIENCE:

  • 7+ years in infrastructure engineering with a focus on business continuity, disaster recovery, or cyber resiliency.
  • Proven experience with Hybrid, Multi-Region, and Multi-Cluster architectures.
  • Expertise in designing and managing cloud-tiered storage and backup solutions for off-premises and multi-cloud environments.

SOFT SKILLS:

  • Strong analytical and problem-solving skills to anticipate and mitigate risks.
  • Effective communication and collaboration skills for cross-functional teams, including technical and business leaders.
  • Ability to prioritize and manage multiple complex projects simultaneously.

Responsibilities

  • Design, implement, maintain, and test recovery environments that contain multiple operating systems, databases, and applications.
  • Conduct Disaster Recovery / Digital Continuity discovery with application owners and key business units.
  • Document procedures for various types of disasters, including large-scale or targeted attacks, natural disasters, or events caused by human error.
  • Work with vendors and suppliers to stay current on security vulnerabilities, potential data loss events, and emerging trends in recovery operations.