Job Description
Job Purpose
Design and architect end-to-end AI Cloud platforms with a focus on security, cost-efficiency, and performance. This position involves direct client engagement to translate requirements into technical solutions, encompassing GPU infrastructure right-sizing and optimal model selection. We are looking for a cloud expert with a demonstrated ability to take complex AI models from concept to large-scale production. The ideal candidate brings extensive experience in AI/Cloud ecosystems and a successful track record of architecting and managing production-grade, large-scale AI platforms.
Role Summary
Key Responsibilities
Translate business requirements into scalable, high-performance AI/GenAI architectures featuring NVIDIA GPU clusters.
Design end-to-end AI Cloud and next-generation platforms optimized for deep learning workloads and distributed training.
Architect HPC cluster topologies utilizing high-speed InfiniBand (NDR/HDR) and RoCE v2 interconnects for low-latency communication.
Right-size platform components, including GPUs, CPUs, memory, and NVMe storage, for comprehensive client proposals.
Architect distributed training and inference environments optimized for MPI frameworks and workload scheduling via Slurm.
Design scalable container orchestration platforms using Kubernetes and Kubeflow to manage AI workloads.
Propose optimized inference strategies using vLLM, Triton, and TensorRT-LLM to meet specific latency and throughput KPIs.
Apply experience with RAG systems and multi-agent orchestration frameworks such as LangGraph, along with the wider agentic ecosystem.
Develop private AI cloud environments focused on data sovereignty and regulatory compliance, such as the India DPDP Act.
Define integration strategies for LLMs and open-source models within existing enterprise data systems, APIs, and knowledge graphs.
Establish reference architectures for CI/CD/CT pipelines and automated model retraining workflows to ensure reproducibility.
Implement automation and observability frameworks for monitoring GPU utilization, performance tuning, and failure handling.
Drive technical validation through Proof of Concept (PoC) engagements, focusing on scalability and performance benchmarks for LLM training.
Establish Infrastructure-as-Code (IaC) practices to ensure reproducible and reliable cluster deployments.
Collaborate with C-suite stakeholders and cross-functional teams to drive technical decision-making, innovation, and roadmap alignment.
Experience & Educational Requirements
Qualifications and Experience
EDUCATIONAL QUALIFICATIONS: (degree, training, or certification required)
BE/B.Tech or equivalent in Computer Science or Electronics & Communication
RELEVANT EXPERIENCE: 15–20 years of IT experience, with a minimum of 5 years in AI platforms
Required Technical Skills
Core AI/ML Expertise
Strong experience with NVIDIA, Intel, and Google GPU/accelerator architectures and InfiniBand
Strong expertise in Kubernetes, Slurm and OpenShift
Good experience in Python, PyTorch and TensorFlow
Good knowledge of LangChain and LangGraph
Deep understanding of Transformers, Attention mechanisms, Diffusion, MoE
Knowledge of RLHF, vector stores (Pinecone, FAISS, Chroma), OpenAI APIs, and vLLM
Expertise in RAG and agentic AI workflows
Knowledge of high-performance storage (Lustre and other parallel file systems, object storage, NVMe)
Good knowledge of NVIDIA architectures (Hopper, Blackwell)
Soft Skills
Strong problem-solving and analytical thinking
Excellent communication and stakeholder management
Ability to influence leadership and drive strategic decisions
Innovation mindset with focus on enterprise impact
Preferred Experience
Currently working in an AI/Cloud presales team.
Able to right-size infrastructure and choose the right GPU model for each client's requirements.
Hands-on with Python, vector DBs (Pinecone, FAISS, Chroma), and LLM APIs (OpenAI, Anthropic).
Solid understanding of cloud-native architecture: OpenStack, KVM, public clouds (Azure/AWS/GCP), microservices, Kubernetes, serverless, and API gateways.
Good knowledge of deep learning: CNNs, RNNs/LSTMs, Transformers, and attention mechanisms.
Proficiency in Python for ML: NumPy, pandas, scikit-learn, and frameworks such as PyTorch or TensorFlow.
Experience integrating LLMs (GPT, Claude, Gemini, LLaMA, Mistral) into applications.
Prompt engineering skills: zero-shot, few-shot, chain-of-thought, ReAct, and structured output patterns.
Experience building RAG systems: document chunking, embedding models, vector search, and retrieval optimization.
Understanding of AI agent patterns, tool use, and agentic workflows.
Familiarity with Docker, CI/CD pipelines, and Git-based workflows.
Strong communication, stakeholder management, and solution design skills.