./instances/rtx-pro-6000
NVIDIA’s Universal AI Chip
Blackwell architecture meets multi-purpose workloads

Unleash Next-Gen GPU Power.
Bare Metal, No Abstraction
Run any workload that demands real GPU muscle. With 96 GB of VRAM per card, here is what the g4.rtx6kpro, powered by NVIDIA’s Blackwell architecture, can handle:
LLM training & fine-tuning
Fine-tune and train large language models with 96 GB of VRAM per card for bigger models, longer contexts, and larger batches.
Inference at scale
Deliver real-time inference throughput with consistent tail latencies, even at high query volumes.
Rendering & VFX
Power GPU render farms with deterministic performance per node — ideal for CGI, animation, simulation workflows.
Gen-AI & diffusion models
Optimize large model pipelines involving tensor ops, vision, and multimodal inference.
Scientific computing & HPC
Utilize full double & mixed-precision performance across GPU cores for physics, molecular, and engineering workloads.
THE GPU 'SWEET SPOT'
AI infrastructure teams often face a trade-off: wait months and overpay for top-tier GPUs, or settle for older chips that can’t handle today’s compute demands.
The RTX PRO 6000 resolves that tension: with Blackwell architecture, next-gen tensor cores, and generous VRAM, it delivers the performance needed for LLM fine-tuning and inference at scale.
LLM Inference Maximum Throughput (tokens/s)

| Model | RTX PRO 6000 Server Edition | NVIDIA L40S |
|---|---|---|
| Llama v4 Scout | 17,857 | 1,105 |
| Llama v3.3 70B | 4,776 | 1,694 |
| Llama v3.1 8B | 22,757 | 8,471 |
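For reference, here is a minimal sketch of how tokens-per-second throughput can be measured with Hugging Face transformers. The model ID, batch size, and generation settings are illustrative assumptions, not the configuration behind the chart above:

```python
# Rough tokens/s measurement sketch. Model ID, batch size, and token
# counts are placeholders -- not the benchmark setup behind the chart.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B"  # assumption: any causal LM you have access to

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # pad on the left for decoder-only generation

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).to("cuda")

prompts = ["Explain GPUDirect RDMA in one paragraph."] * 8  # small batch
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Count generated tokens across the batch (pad tokens make this a rough figure).
new_tokens = (out.shape[1] - inputs["input_ids"].shape[1]) * out.shape[0]
print(f"~{new_tokens / elapsed:,.0f} tokens/s")
```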
GPU Comparison Table
| GPU model | Architecture | Memory (GB) | Max model size* | Tensor cores | Tensor cores gen |
|---|---|---|---|---|---|
| NVIDIA B200 | Blackwell | 192 | 98B | 1056 | 5ᵗʰ |
| NVIDIA RTX PRO 6000 | Blackwell | 96 | 48B | 752 | 5ᵗʰ |
| NVIDIA H100 PCIE | Hopper | 80 | 40B | 456 | 4ᵗʰ |
| NVIDIA L40S | Ada Lovelace | 48 | 24B | 568 | 4ᵗʰ |
*Max model size at FP16 precision.
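The footnote’s rule of thumb is simple arithmetic: FP16 weights take two bytes per parameter, so usable VRAM divided by two bytes gives the parameter budget. A quick sketch, ignoring KV cache, activations, and framework overhead, which shrink the real budget:

```python
# Rule of thumb behind the table footnote: FP16 stores 2 bytes per
# parameter, so max params ~= VRAM / 2 bytes. Real budgets are smaller
# once KV cache, activations, and framework overhead are accounted for.
GPUS = {"B200": 192, "RTX PRO 6000": 96, "H100 PCIe": 80, "L40S": 48}

for name, vram_gb in GPUS.items():
    max_params_b = vram_gb / 2  # billions of parameters at FP16
    print(f"{name}: {vram_gb} GB VRAM -> ~{max_params_b:.0f}B params")
```

For the RTX PRO 6000, 96 GB / 2 bytes per parameter gives the ~48B figure in the table.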
An Instance for Every Need
From LLM training to Generative AI inference, check out the different variations of our instances powered by NVIDIA’s RTX PRO 6000 Server Edition and deploy the one that best fits your requirements.
For maximum performance and zero overhead, choose RTX PRO 6000 bare metal.
g4.rtx6kpro.large
Starting at $11,529.58/mo
Metal GPU
GPU: 8 x NVIDIA RTX PRO 6000 Server Edition
VRAM: 768 GB total
CPU: Dual AMD EPYC 9355, 64 cores @ 3.55 GHz
RAM: 1.5 TB DDR5
Storage: 4 x 3.8 TB NVMe
Network: 2 x 100 Gbps
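Once the node is up, a quick PyTorch sanity check (a generic sketch, nothing Latitude.sh-specific) confirms that all eight GPUs and their 96 GB of VRAM are visible:

```python
# Sanity check on a fresh g4.rtx6kpro.large node: enumerate the GPUs
# PyTorch can see and their memory. Expect 8 devices at ~96 GB each.
import torch

assert torch.cuda.is_available(), "CUDA driver not visible"
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GiB")
```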
For flexible usage and ultimate scalability, go with the RTX PRO 6000 VM instead.
vm.rtx6kpro.small
GPU VM
GPU: 1 x NVIDIA RTX PRO 6000 Server Edition
VRAM: 96 GB
vCPU: 16 vCPUs
RAM: 128 GB DDR5
Storage: 500 GB local
Get the most out of your GPUs on bare metal
Deploy fully dedicated NVIDIA RTX PRO 6000 Blackwell instances with zero virtualization overhead, ultra-low latency, and total resource control.

Dedicated & Deterministic Performance
No hypervisor overhead, no noisy neighbors — every cycle and memory channel is yours.
Blackwell Architecture, Built for Scalability
Next-gen GPUs with improved throughput, AI performance, and compute density over prior generations.
Ultra-Low Latency Networking
Leveraging NVIDIA’s Spectrum-X800 network, our platform enables sub-microsecond RDMA and GPU peer-to-peer transfers across nodes.
Seamless Integration & Control
Full CUDA, cuDNN, TensorRT support with root-level access, driver customization, and GPU direct RDMA.
Accelerated AI Networking powered by Spectrum-X*
We’ve integrated NVIDIA’s Spectrum-X800 switches across our data centers, enabling your GPU nodes to interconnect with minimal latency, full RDMA support, and high bisection bandwidth.
Low-latency RDMA & GPUDirect
Enable zero-copy data paths directly between GPU memory across nodes or within the same rack, eliminating host CPU involvement in inter-GPU transfers (see the sketch after this list).
High-throughput, non-blocking fabric
Massive internal bandwidth ensures your GPUs don’t bottleneck on the network, even during all-to-all communication or large parameter exchanges.
Isolation & QoS built-in
Dedicated lanes and traffic shaping guarantee that your data flows remain predictable and secure, even at scale.
Scale clusters seamlessly across racks and zones
Build multi-node training or inference clusters with full high-speed connectivity across the fabric.
* Spectrum-X is only available in selected instances. You can request access to Spectrum-X-powered instances through this form.
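To make the zero-copy path concrete, below is a minimal PyTorch/NCCL all-reduce sketch, assuming a standard torchrun launch. NCCL selects GPUDirect RDMA and peer-to-peer routes on its own when the fabric exposes them, so the same script exercises both the intra-node and cross-node paths described above:

```python
# Minimal sketch: GPU-to-GPU all-reduce over NCCL. On fabrics that expose
# GPUDirect RDMA, NCCL moves tensors directly between GPU memories
# without staging through host RAM.
# Launch with: torchrun --nproc_per_node=8 allreduce_sketch.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")     # NCCL picks RDMA/P2P paths itself
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # One 256 MiB FP16 tensor per GPU, summed across all ranks.
    x = torch.ones(128 * 1024 * 1024, dtype=torch.float16, device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()

    if dist.get_rank() == 0:
        print("all-reduce done, first element =", x[0].item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```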
Purpose-built for AI workloads
Enjoy complete control over hardware configurations and optimized software stacks tailored to your AI workloads. Integrate seamlessly with your favorite AI tools and libraries, and never look back.
./built-for-developers
Start with Terraform
Create GPU instances programmatically with Latitude.sh's powerful, friendly API. Get started quickly with integrations like Terraform and client libraries for your preferred programming language.
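As a sketch of the API path, the snippet below provisions a server with plain Python and requests. The endpoint shape, plan slug, and attribute names are assumptions pieced together from this page, so verify them against the API reference before relying on them:

```python
# Hedged sketch: provision a GPU server via the Latitude.sh REST API.
# Endpoint layout, plan slug, and attribute names are assumptions taken
# from this page -- check the official API reference before use.
import os
import requests

API_TOKEN = os.environ["LATITUDESH_TOKEN"]  # assumption: token-based auth
payload = {
    "data": {
        "type": "servers",
        "attributes": {
            "hostname": "rtx6kpro-01",
            "plan": "g4.rtx6kpro.large",             # plan name from this page
            "site": "DAL",                           # hypothetical region slug
            "operating_system": "ubuntu_24_04_x64",  # hypothetical OS slug
            "project": "my-project",                 # hypothetical project ID
        },
    }
}

resp = requests.post(
    "https://api.latitude.sh/servers",
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"]["id"])
```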