Principal Architect at ePlus inc

San Ramon, CA 94583, USA

Full Time

Start Date

Immediate

Expiry Date

28 Nov, 25

Salary

190000.0

Posted On

28 Aug, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Intel, Netapp, Storage Solutions, Stack, Bcm, Orchestration, Communication Skills, Containerization, Dell, Lenovo, Operating Systems, Fine Tuning, File Systems, Parallel Processing, Nvidia, Cluster Management, Kubernetes

Industry

Information Technology/IT

Description

Overview:
ePlus Technology, inc. is seeking a highly skilled and experienced Principal Architect specializing in High Performance Compute (HPC) & AI Infrastructure Design to join our presales team in the Bay Area. In this critical role, you’ll be the technical expert responsible for understanding customer needs, designing robust and scalable HPC / AI solutions, and articulating their value proposition to a diverse range of clients. You’ll focus on the latest HPC / AI technologies from leading compute vendors such as NVIDIA, AMD, Intel, Lenovo, HPE, and Dell, alongside high-performance storage solutions like VAST Data, Pure Storage (FlashBlade), NetApp, and Hammerspace. You’ll also work with critical MLOps platforms, NVIDIA’s AI Enterprise software stack, and advanced GPU orchestration tools, ensuring our clients can tackle their most demanding computational and AI/ML challenges.

QUALIFICATIONS

Bachelor’s degree preferred
5+ years of applicable technical pre-sales engineering experience
Specific Technical Focus: Demonstrated expertise in High Performance Compute (HPC) and AI architectures, including parallel processing, distributed computing, and relevant software stacks.
Vendor Knowledge: Hands-on experience and architectural understanding of HPC / AI solutions from leading compute vendors such as NVIDIA, AMD, Intel, Lenovo, HPE, and Dell.
Storage Expertise: Strong understanding and practical experience with high-performance storage solutions, with specific emphasis on all-flash, parallel file systems (e.g., Lustre, BeeGFS), global namespace (like Hammerspace), and next-generation storage architectures (like VAST Data, Pure Storage, and NetApp).
Networking Knowledge: Familiarity with high-speed networking technologies used in HPC / AI environments (e.g., InfiniBand, high-speed Ethernet, RoCE v2).
Operating Systems: Proficiency with Linux-based operating systems commonly used in HPC / AI environments.
Containerization: Familiarity with containerization technologies (e.g., Docker, Kubernetes) for HPC and AI/ML workloads.
AI/ML Expertise:
Comprehensive understanding of the AI/ML lifecycle and stack, including familiarity with MLOps platforms (e.g., Weights & Biases, MLflow, Kubeflow).
Expertise in large language models (LLMs) and in techniques such as retrieval-augmented generation (RAG) and fine-tuning.
AI/ML Cloud Knowledge: Familiarity with leading AI/ML Cloud platforms (e.g., AWS, Azure, Google Cloud) and a readiness to deepen expertise in their HPC and AI/ML specific offerings (e.g., AWS SageMaker, AWS Bedrock, Azure HPC / AI, Google Cloud AI Platform).
Cluster Management & Orchestration: Understanding of and familiarity with HPC / AI cluster management and workload orchestration platforms (e.g., Slurm, Kubernetes, NVIDIA Run:ai), including monitoring tools (e.g., Bright Cluster Manager (BCM), Prometheus, Grafana), is a plus.
Customer-Facing Experience: Proven ability to build strong relationships with clients and act as a trusted advisor, plus prior experience at a customer or in a presales or customer-facing technical consulting role at a VAR or technology vendor.
Problem-Solving: Excellent analytical and problem-solving skills, with the ability to translate complex technical concepts into clear, actionable solutions.
Communication: Exceptional written and verbal communication skills, with the ability to present complex technical information to both technical and non-technical audiences.

PHYSICAL REQUIREMENTS

While performing this role, you will engage in both seated and occasional standing or walking activities. We provide reasonable accommodations, in accordance with relevant laws, to support success in this position.
By embracing our values, you will contribute to our collective mission of making a positive impact within our organization and the broader community. We understand that this job description serves as a guide and is not an employment contract.

Responsibilities

Technical Presales Leadership: Act as the primary technical presales resource for HPC / AI opportunities, engaging with customers from initial discovery to solution proposal and design.
Solution Design & Architecture: Design and architect complex HPC / AI solutions, including compute clusters (CPUs, GPUs, interconnects), high-performance storage systems, networking, and related infrastructure.
Customer Engagement: Conduct technical presentations, workshops, and whiteboarding sessions with clients to deeply understand their unique HPC and AI/ML requirements, workflows, and performance objectives.
Technical Authority: Serve as a subject matter expert on HPC and AI/ML infrastructure trends, technologies, and best practices, providing insights and recommendations to both internal teams and external customers.
Vendor Collaboration: Work closely with key technology partners such as NVIDIA, AMD, Intel, Lenovo, HPE, Dell, VAST Data, Pure Storage, NetApp, and Hammerspace, and stay current on product roadmaps, new features, and integrated solutions.
Proof-of-Concept (POC) Support: Support technical validation efforts, including designing and overseeing proof-of-concept engagements to demonstrate solution capabilities and performance.
Proposal Development: Contribute to the technical sections of proposals, Statements of Work (SOWs), and bills of material (BOMs), ensuring accuracy and alignment with customer needs.
Competitive Analysis: Stay informed about the competitive landscape in the HPC and AI/ML infrastructure market and articulate ePlus’ unique differentiators.
Knowledge Transfer: Share knowledge and best practices with the broader ePlus sales and engineering teams.