Original listing text, shown exactly as published by the company.
THE OPPORTUNITY
We are looking for early-career Software Engineers to join our team. This is a specialized role sitting at the intersection of high-performance computing (HPC) and Large Language Model (LLM) engineering. You will be responsible for building the automated "speedometer and diagnostic" suite for our next-generation AI infrastructure.
In this role, you won’t just be using models; you will be tearing them apart to see how they run on the metal. You will build tools that measure GPU FLOPS, stress-test InfiniBand clusters, and define the benchmarks that ensure our systems are production-ready.
RESPONSIBILITIES
- Performance Benchmarking: Run and automate standard LLM quality benchmarks (GSM8K, MMLU) alongside custom performance suites for specific workloads (e.g., long-context window, KV cache reuse).
- Infrastructure Validation: Create automated acceptance tests for new GPU clusters across x86 and ARM systems, measuring GPU memory bandwidth, networking throughput, and multi-node networking performance.
- Model Dev Experience: Develop and maintain internal GPU-enabled development environments (similar to GitHub Codespaces). You will ensure the team has seamless, high-performance "dev machines" optimized for model experimentation.
- Tool Development: Build and contribute to tools such as InferenceMAX and genai-bench to automate model evaluation and optimization.
- Deep Hardware Profiling: Use PyTorch Profiler and NVIDIA Nsight Systems to collect performance profiles, identify bottlenecks, and debug the NVIDIA compute/networking stack.
- Monitoring & Observability: Develop real-time dashboards and alerts to monitor system health, model startup times, and runtime performance.
- Continuous Integration: Automate performance testing via CI/CD pipelines to catch regressions in model setups before they hit production.
- Optimization Automation: Build tools to find the "Pareto frontier"—identifying the absolute best configuration (latency vs. cost vs. quality) for a given model and workload.
WHAT WE'RE LOOKING FOR
This is a fresher-friendly role. We care more about your trajectory, curiosity, and technical depth than your years of experience. We want to talk to you if you have:
- A Love for Systems & Hardware: You aren’t just interested in the AI; you want to understand GPU memory subsystems, InfiniBand, and how data moves across a cluster.
- An Automation Mindset: You believe that if a task has to be done twice, it should be scripted. You have a passion for stress-testing and fuzzy testing to find the "breaking point" of a system.
- Mathematical Curiosity: A desire to understand the underlying math of Transformers and how it translates into FLOPs and memory requirements.
- Interest in Optimization: You are excited to learn about (or already play with) quantization, speculative decoding, disaggregated serving, and kernel-level optimizations.
- Technical Toolkit: Familiarity with Python, and an eagerness to master the NVIDIA software stack. C++ familiarity is good to have.
WHY THIS ROLE
- Direct Impact: Your tools will be the gatekeeper for what defines "good" performance for our customers.
- Deep Learning (Literally): You will gain world-class expertise in GPU orchestration and LLM inference that few engineers in the industry possess.
- High Ownership: As a small team of freshers led by experts, you will have the autonomy to build tools from scratch and contribute to open-source projects.