Principal Architect at DataDirect Networks
Remote, Oregon, USA
Full Time


Start Date

Immediate

Expiry Date

01 Aug, 25

Salary

0.0

Posted On

01 May, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

NFS, Participation, Enterprise Storage, Kubernetes, Training, File Systems, MTTR, Weka, Value Creation, VAST, Python, Replication, Reliability, POSIX, Distributed Systems, Mitigation Strategies, Bash, tcpdump

Industry

Information Technology/IT

Description

Overview:
This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world’s most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous vehicles, government, academia, research, and manufacturing.

“THE REAL DIFFERENTIATOR IS DDN. I NEVER HESITATE TO RECOMMEND DDN. DDN IS THE DE FACTO NAME FOR AI STORAGE IN HIGH PERFORMANCE ENVIRONMENTS” - MARC HAMILTON, VP, SOLUTIONS ARCHITECTURE & ENGINEERING | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.
Our success is driven by a commitment to innovation and customer success, and by a team of passionate professionals who bring their expertise and dedication to every project. This is a chance for a driven professional to make a lasting impact at a company that is shaping the future of AI and data storage.
Job Description:
As a Principal Architect - AI In-Market Engineering, you’ll be the final escalation point for the most complex and critical issues affecting enterprise and hyperscale environments. This hands-on role is ideal for a deep technical expert who thrives under pressure and has a passion for solving distributed system challenges at scale.
You’ll collaborate with Engineering, Product Management, and Field teams to drive root cause resolutions, define architectural best practices, and continuously improve product resiliency. Leveraging AI tools and automation, you’ll reduce time-to-resolution, streamline diagnostics, and elevate the support experience for strategic customers.

PRODUCT KNOWLEDGE & VALUE CREATION

  • Be the subject-matter expert on Infinia internals, including metadata handling, storage fabric interfaces, performance tuning, and AI integration.
  • Reproduce complex customer issues and propose product improvements or workarounds.
  • Author and maintain detailed runbooks, performance tuning guides, and RCA documentation.
  • Feed real-world support insights back into the development cycle to improve reliability and diagnostics.

REQUIRED QUALIFICATIONS

  • 15+ years in enterprise storage, distributed systems, or cloud infrastructure.
  • Deep understanding of file and object storage interfaces (POSIX, NFS, S3), storage performance, and Linux kernel internals.
  • Proven debugging skills at system/protocol/app levels (e.g., strace, tcpdump, perf).
  • Hands-on experience with AI/ML data pipelines, container orchestration (Kubernetes), and GPU-based architectures.
  • Expert-level knowledge of TCP/IP and networking.
  • Exposure to RDMA, NVMe-oF, or high-performance networking stacks.
  • Exceptional communication and executive reporting skills.
  • Experience using AI tools (e.g., log pattern analysis, LLM-based summarization, automated RCA tooling) to accelerate diagnostics and reduce MTTR.

PREFERRED QUALIFICATIONS

  • Experience with DDN, VAST, Weka, or similar scale-out file systems.
  • Expert scripting/coding ability in Python, Bash, or Go.
  • Familiarity with observability platforms: Prometheus, Grafana, ELK, OpenTelemetry.
  • Knowledge of replication, consistency models, and data integrity mechanisms.
  • Exposure to sovereign AI, LLM training environments, or autonomous system data architectures.

This position requires participation in an on-call rotation to provide after-hours support as needed.