THE ROLE
Voice is becoming the internet’s next interface, but production-grade Voice AI systems are hard to build. You’ll join the small founding team of Baseten Voice AI, focused on bringing state-of-the-art open-source models into production for Voice AI customers across productivity, customer service, clinical conversation, creator tools, education, and more. You’ll make a meaningful impact on people’s daily lives and help reshape these industries.
This is a high-impact, high-ownership role. You will be the primary owner of Baseten Voice AI, our in-house inference stack for Voice AI models, from product roadmap through engineering implementation. You’ll partner closely with Forward Deployed Engineers, Model Performance Engineers, and sister engineering teams to push the boundaries of Voice AI.
EXAMPLE INITIATIVES
- Develop a world-class model-serving stack for state-of-the-art open-source voice models - reduce end-to-end and tail latency (p95/p99), increase throughput, and improve GPU efficiency via profiling, runtime tuning, and server-level optimizations.
- Build large-scale, real-time infrastructure for multi-model voice agents - orchestrate STT, TTS, and agent components with streaming I/O to meet customer SLOs.
- Design tight training and inference iteration loops for voice model customization - enable fast evaluation, safe rollout, and rapid experimentation for custom voice model development.
- Past projects:
  - The world's fastest Whisper, with streaming and diarization
  - Canopy Labs selects Baseten for Orpheus TTS inference
RESPONSIBILITIES
- Own and lead Voice AI product areas end-to-end - from architecture and system design through implementation, rollout, and long-term production operations.
- Design, build, and operate real-time, large-scale, high-performance model-serving systems for STT, TTS, and voice agent workloads in mission-critical customer deployments.
- Drive collaboration with sister engineering teams to solve full-stack technical problems, align on priorities, and coordinate end-to-end delivery across the product surface area.
- Mentor teammates through code reviews, design docs, and technical leadership.
REQUIREMENTS
- Bachelor's degree or higher in Computer Science or a related field.
- Proven track record of owning production-grade, real-time, large-scale systems where tail latency (p99) matters.
- Proficiency in one or more popular programming or scripting languages; Python experience is a plus.
- Good taste in product, particularly developer-oriented tools.
- Interest in ML/AI infrastructure and willingness to learn.
- Strong collaboration and communication skills.
- Comfortable using AI coding assistants (e.g., Claude Code, Codex, Cursor) as a daily productivity multiplier. As an AI-native company, we consider this a must-have skill.
NICE TO HAVE
- Experience implementing pipeline-level model runtime optimizations such as dynamic batching, async scheduling, or decode-side throughput improvements.
- Experience building developer platforms: SDKs, CLIs, APIs, and self-serve workflows for ML or infrastructure products.
- Experience with containerization and orchestration technologies (Docker, Kubernetes), service meshes, or distributed scheduling.
- Familiarity with speech/audio ML models (STT, TTS, speech-to-speech).
- Familiarity with model-serving runtimes (vLLM, TensorRT, ONNX).
- Familiarity with systems-level performance profiling across host-device boundaries (e.g., PyTorch Profiler) and diagnosing GPU utilization issues.
- Exposure to customer-facing engineering: pre-sales prototyping, technical discovery, or working directly with customers to ship solutions.