Overview:
ePlus Technology, inc. is seeking a highly skilled and experienced Principal Architect specializing in High Performance Compute (HPC) & AI Infrastructure Design to join our presales team in the Bay Area. In this critical role, you’ll be the technical expert responsible for understanding customer needs, designing robust and scalable HPC / AI solutions, and articulating their value proposition to a diverse range of clients. You’ll focus on the latest HPC / AI technologies from leading compute vendors such as NVIDIA, AMD, Intel, Lenovo, HPE, and Dell, alongside high-performance storage solutions like VAST Data, Pure Storage (FlashBlade), NetApp, and Hammerspace. You’ll also work with critical MLOps platforms, NVIDIA’s AI Enterprise software stack, and advanced GPU orchestration tools, ensuring our clients can tackle their most demanding computational and AI/ML challenges.
Responsibilities:
QUALIFICATIONS
- Bachelor’s degree preferred
- 5+ years of applicable technical pre-sales engineering experience
- Specific Technical Focus: Demonstrated expertise in High Performance Compute (HPC) and AI architectures, including parallel processing, distributed computing, and relevant software stacks.
- Vendor Knowledge: Hands-on experience and architectural understanding of HPC / AI solutions from leading compute vendors such as NVIDIA, AMD, Intel, Lenovo, HPE, and Dell.
- Storage Expertise: Strong understanding and practical experience with high-performance storage solutions, with specific emphasis on all-flash, parallel file systems (e.g., Lustre, BeeGFS), global namespace (like Hammerspace), and next-generation storage architectures (like VAST Data, Pure Storage, and NetApp).
- Networking Knowledge: Familiarity with high-speed networking technologies used in HPC / AI environments (e.g., InfiniBand, high-speed Ethernet, RoCE v2).
- Operating Systems: Proficiency with Linux-based operating systems commonly used in HPC / AI environments.
- Containerization: Familiarity with containerization technologies (e.g., Docker, Kubernetes) for HPC and AI/ML workloads.
- AI/ML Expertise:
  - Comprehensive understanding of the AI/ML lifecycle and stack, including familiarity with MLOps platforms (e.g., Weights & Biases, MLflow, Kubeflow).
  - Expertise in large language models (LLMs) and related techniques such as retrieval-augmented generation (RAG) and fine-tuning.
- AI/ML Cloud Knowledge: Familiarity with leading AI/ML Cloud platforms (e.g., AWS, Azure, Google Cloud) and a readiness to deepen expertise in their HPC and AI/ML specific offerings (e.g., AWS SageMaker, AWS Bedrock, Azure HPC / AI, Google Cloud AI Platform).
- Cluster Management & Orchestration: Familiarity with HPC / AI cluster management and workload orchestration platforms (e.g., Slurm, Kubernetes, NVIDIA Run:ai), including monitoring tools (e.g., Bright Cluster Manager (BCM), Prometheus, Grafana), is a plus.
- Customer-Facing Experience: Proven ability to build strong client relationships and act as a trusted advisor, with prior experience at a customer or in a presales or customer-facing technical consulting role at a VAR or technology vendor.
- Problem-Solving: Excellent analytical and problem-solving skills, with the ability to translate complex technical concepts into clear, actionable solutions.
- Communication: Exceptional written and verbal communication skills, with the ability to present complex technical information to both technical and non-technical audiences.
PHYSICAL REQUIREMENTS
While performing this role, you will engage in both seated and occasional standing or walking activities. We provide reasonable accommodations, in accordance with relevant laws, to support success in this position.
By embracing our values, you will contribute to our collective mission of making a positive impact within our organization and the broader community. We understand that this job description serves as a guide and is not an employment contract.