NVIDIA L40S: The Underrated AI GPU for 2026
What Is the L40S?
The NVIDIA L40S is a professional GPU based on the **Ada Lovelace architecture** (the same generation as RTX 4090), released in late 2023. It sits in an interesting position: more capable than a consumer RTX 4090 in sustained workloads, far cheaper than an A100 or H100, and surprisingly well-suited for AI inference.
Key Specifications
| Spec | L40S | A100 80GB | H100 SXM |
|---|---|---|---|
| Architecture | Ada Lovelace | Ampere | Hopper |
| VRAM | 48GB GDDR6 | 80GB HBM2e | 80GB HBM3 |
| Memory Bandwidth | 864 GB/s | 2,000 GB/s | 3,350 GB/s |
| FP16 Tensor TFLOPS (dense) | 181 | 312 | 989 |
| TDP | 350W | 400W | 700W |
| Form Factor | PCIe | SXM/PCIe | SXM/PCIe |
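Cloud instances occasionally advertise one card and deliver another, so it's worth verifying what you actually got at startup. A minimal sanity check with PyTorch (a sketch; assumes torch is installed and a CUDA device is visible):

```python
import torch

# Confirm what the cloud provider actually allocated.
# An L40S reports ~48 GB of memory and compute capability 8.9 (Ada Lovelace).
assert torch.cuda.is_available(), "no CUDA device visible"

props = torch.cuda.get_device_properties(0)
print(f"Device:             {torch.cuda.get_device_name(0)}")
print(f"Total memory:       {props.total_memory / 1024**3:.1f} GiB")
print(f"Compute capability: {props.major}.{props.minor}")
```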
What Makes It Different from A100?
The L40S uses **GDDR6** instead of HBM2e, which means lower memory bandwidth — a disadvantage for memory-bandwidth-bound workloads like large LLM inference.
However, the L40S brings fourth-generation Tensor Cores with FP8 support, which Ampere lacks entirely, and strong FP16 throughput for its price. That makes it excellent for compute-bound inference and image generation.
In practice the bandwidth gap is also softened by Ada's large L2 cache (96 MB on the AD102 die, versus 40 MB on the A100), which keeps hot weights and activations on-chip for workloads with good locality.
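A back-of-envelope roofline estimate shows why bandwidth matters for LLM decoding: generating each token streams the full weight set from VRAM, so single-stream decode speed is capped near bandwidth divided by model size. A rough sketch (illustrative numbers only):

```python
# Single-stream decode ceiling: each generated token reads every weight
# once, so tokens/sec <= memory bandwidth / model size in bytes.
def decode_ceiling(bandwidth_gb_s: float, params_billion: float,
                   bytes_per_param: int = 2) -> float:
    model_gb = params_billion * bytes_per_param  # FP16 = 2 bytes per parameter
    return bandwidth_gb_s / model_gb

for name, bw in [("L40S", 864), ("A100 80GB", 2000), ("H100 SXM", 3350)]:
    print(f"{name:9s} ~{decode_ceiling(bw, 8):4.0f} tok/s ceiling (8B FP16, batch 1)")
```

Batching amortizes those weight reads across many sequences, which is why the batch-8 numbers below sit far above these single-stream ceilings and why the L40S lands much closer to the A100 than the raw bandwidth gap suggests.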
Performance Benchmarks: Inference
LLaMA 3 8B (FP16, batch size 8)
| GPU | Tokens/sec | Cost/hr | Cost/1M tokens |
|---|---|---|---|
| RTX 4090 | 3,200 | $0.44 | ~$0.038 |
| L40S | 4,800 | $0.95 | ~$0.055 |
| A100 80GB | 5,100 | $1.89 | ~$0.103 |
| H100 SXM | 11,200 | $3.99 | ~$0.099 |
The L40S offers compelling throughput at a mid-range price point.
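The cost column is straightforward arithmetic you can reproduce for any GPU and hourly rate; a quick sketch using the table's L40S figures:

```python
# Cost per million tokens = hourly price / millions of tokens per hour.
def cost_per_million_tokens(tokens_per_sec: float, price_per_hour: float) -> float:
    millions_per_hour = tokens_per_sec * 3600 / 1e6
    return price_per_hour / millions_per_hour

print(f"L40S: ${cost_per_million_tokens(4800, 0.95):.3f}/1M tokens")  # ~$0.055
```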
Stable Diffusion XL (images/min)
Pricing on Cloud Platforms
| Platform | L40S Price/hour |
|---|---|
| RunPod | $0.89–$1.10/hr |
| Vast.ai | $0.72–$0.99/hr |
| Lambda Labs | $1.20/hr |
**Sweet spot pricing** between consumer and enterprise GPUs.
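For always-on serving those hourly differences compound quickly. A quick month-of-uptime comparison at the listed rates (range midpoints are my assumption):

```python
# 24/7 monthly cost at each platform's listed L40S rate
# (midpoint of the quoted range where one is given).
rates = {
    "RunPod": (0.89 + 1.10) / 2,
    "Vast.ai": (0.72 + 0.99) / 2,
    "Lambda Labs": 1.20,
}
for platform, hourly in rates.items():
    print(f"{platform:12s} ${hourly * 24 * 30:7.2f}/month")
```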
Ideal Use Cases
- **Multi-modal AI inference:** vision-language models (LLaVA, Qwen-VL) benefit from the L40S's strong FP16 compute
- **Large inference batches:** 48GB VRAM handles batches of 13B models at full precision (see the sizing sketch after this list)
- **Image generation at scale:** Flux.1, SDXL, and SD 3.0 run excellently
- **Fine-tuning mid-size models:** 7B–13B with QLoRA fits comfortably
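To ground the batch-size claim, here's a rough FP16 memory budget for a 13B model on a 48 GB card; the layer/head constants approximate Llama-2-13B and are assumptions for illustration:

```python
# Rough FP16 memory budget for a 13B model on a 48 GB card.
# Shape constants roughly match Llama-2-13B (assumed for illustration).
PARAMS = 13e9
LAYERS, HEADS, HEAD_DIM = 40, 40, 128

weights_gib = PARAMS * 2 / 1024**3                      # FP16 weights: ~24 GiB
kv_bytes_per_token = 2 * LAYERS * HEADS * HEAD_DIM * 2  # K and V, 2 bytes each
budget_gib = 48 - weights_gib - 2                       # keep ~2 GiB for activations
max_kv_tokens = budget_gib * 1024**3 / kv_bytes_per_token

print(f"Weights: {weights_gib:.1f} GiB")
print(f"KV-cache budget: ~{max_kv_tokens:,.0f} tokens across the whole batch")
```

That works out to roughly 28,000 tokens of total KV cache, enough for, say, a batch of 8 sequences at ~3,500 tokens each.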
When to Choose L40S Over Alternatives
**Choose L40S if:**
- Your models fit in 48GB and the workload is compute-bound: image generation, vision-language inference, 7B–13B fine-tuning
- You want throughput per dollar between consumer and enterprise pricing

**Choose A100 if:**
- You need more than 48GB of VRAM on a single card
- The workload is memory-bandwidth-bound, such as large-batch inference on bigger LLMs
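Condensed into a crude picker (a heuristic sketch based only on the two axes above, not a benchmark):

```python
# Crude decision heuristic from the two axes above: does the model fit
# in 48 GB, and is the workload memory-bandwidth-bound?
def pick_gpu(model_vram_gb: float, bandwidth_bound: bool) -> str:
    if model_vram_gb > 48:
        return "A100 80GB"  # only option of the two with more VRAM
    if bandwidth_bound:
        return "A100 80GB"  # 2,000 GB/s HBM2e vs 864 GB/s GDDR6
    return "L40S"           # compute-bound work at a lower hourly rate

print(pick_gpu(26, bandwidth_bound=False))  # 13B FP16 inference -> L40S
```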
Availability
Availability of the L40S has improved steadily through 2025–2026, with RunPod and Vast.ai both listing healthy pools of instances.
Conclusion
The L40S is genuinely underrated. For inference workloads and image generation, it delivers strong performance at a price between consumer and full enterprise GPUs. It's the sweet spot many teams overlook.
Related Articles
NVIDIA H200 GPU Cloud: Pricing and Availability in 2026
The H200 packs 141 GB of HBM3e memory and 4.8 TB/s bandwidth. Here is what cloud providers charge for it, who needs it, and when the H100 is still the better choice.
RTX 5090 in the Cloud: Is Blackwell Worth It for AI?
The RTX 5090 brings NVIDIA Blackwell to the consumer tier with 32GB GDDR7. We break down cloud pricing, performance vs RTX 4090 and H100, and exactly when it makes sense.
Cheapest GPU Cloud Providers in 2026
A comprehensive ranking of the most affordable GPU cloud providers in 2026. Find the lowest prices for H100, A100, RTX 4090, and more.