# How to Estimate AI Training Costs Before You Start
## Why Estimate First?
GPU cloud bills can surprise even experienced ML engineers. A misconfigured training script, an unexpectedly long run, or forgetting to terminate an instance can turn a $50 experiment into a $500 mistake. Estimating costs upfront takes 10 minutes and can save hundreds of dollars.
## Step 1: Estimate Training FLOPs
For transformer models, the standard rule of thumb is:
**FLOPs = 6 x N x D**
Where:

- **N** = number of model parameters
- **D** = number of training tokens
This formula (from the Kaplan et al. and Chinchilla scaling-law papers) counts roughly 2ND FLOPs for the forward pass and 4ND for the backward pass.
**Examples:**
| Model | Params (N) | Tokens (D) | FLOPs |
|-------|-----------|-----------|-------|
| 7B fine-tune (100K samples, 512 tokens avg) | 7e9 | 5.1e7 | 2.1e18 |
| 13B full fine-tune (1M samples) | 1.3e10 | 5.1e8 | 4.0e19 |
| 70B QLoRA (500K samples) | 7e10 | 2.6e8 | 1.1e20 |
Note: LoRA/QLoRA updates only a fraction of the parameters (typically 0.1–1%), but the forward and backward passes still run through the full frozen model, so the compute saving is modest (roughly a third at best). The real savings are in optimiser memory, which lets you fit the job on fewer or cheaper GPUs — treat 6 x N x D as a slightly conservative upper bound for parameter-efficient runs.
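As a sanity check, the 6ND rule is trivial to script. A minimal sketch — the example numbers match the 7B row in the table above:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: 6 FLOPs per parameter per token
    (~2 for the forward pass, ~4 for the backward pass)."""
    return 6.0 * n_params * n_tokens

# 7B fine-tune: 100K samples averaging 512 tokens each
tokens = 100_000 * 512                        # 5.12e7 tokens
print(f"{training_flops(7e9, tokens):.2e}")   # ~2.15e+18 FLOPs
```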
## Step 2: Convert FLOPs to GPU Hours
Each GPU has a theoretical FLOP/s rating. In practice, model FLOP utilisation (MFU) is 30–50% for well-optimised training runs.
**GPU hours = FLOPs / (GPU FLOP/s x MFU x 3600)**
| GPU | Theoretical BF16 TFLOPS | Practical MFU | Effective TFLOPS |
|-----|------------------------|---------------|-----------------|
| H100 SXM | 989 | 45% | ~445 |
| A100 80GB | 312 | 45% | ~140 |
| RTX 4090 | 165 | 35% | ~58 |
| RTX 5090 | 210 | 38% | ~80 |
**Example: Fine-tuning Llama 3 7B (2.1e18 FLOPs) on a single H100:**
GPU hours = 2.1e18 / (445e12 x 3600) = approximately 1.3 hours
At $2.60/hr for an H100: approximately $3.40 total compute cost.
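The conversion is easy to script. A sketch of the worked example above — the 445 effective TFLOPS figure is just the H100's 989 TFLOPS peak times 45% MFU, so the function takes peak and MFU separately:

```python
def gpu_hours(flops: float, peak_tflops: float, mfu: float) -> float:
    """Wall-clock GPU hours for a given FLOP budget."""
    effective_flops_per_s = peak_tflops * 1e12 * mfu
    return flops / (effective_flops_per_s * 3600)

# Llama 3 7B fine-tune (2.1e18 FLOPs) on one H100 SXM (989 TFLOPS, 45% MFU)
hours = gpu_hours(2.1e18, peak_tflops=989, mfu=0.45)
cost = hours * 2.60  # $/hr
print(f"{hours:.2f} GPU hours, ${cost:.2f}")  # ~1.31 GPU hours, ~$3.41
```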
## Step 3: Add Storage Costs
| Item | Typical Cost |
|------|-------------|
| Dataset (100 GB) | $2–5 one-time transfer + $2–5/month storage |
| Checkpoint saves (every N steps) | $0.02–0.10 per checkpoint |
| Model weights (7B in BF16, ~14 GB) | Negligible |
Storage is rarely the dominant cost, but for long training runs with frequent checkpointing, it can add up.
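A quick way to see whether checkpointing will matter for your run — the $0.03/GB/month rate is an assumption; check your provider's actual pricing:

```python
def checkpoint_storage_cost(ckpt_gb: float, kept_checkpoints: int,
                            usd_per_gb_month: float = 0.03) -> float:
    """Monthly storage bill for retained checkpoints."""
    return ckpt_gb * kept_checkpoints * usd_per_gb_month

# 7B model in BF16 (~14 GB per weights-only checkpoint), keeping the last 10
print(f"${checkpoint_storage_cost(14, 10):.2f}/month")  # $4.20/month
```

Note that optimiser state roughly triples checkpoint size if you save it, so prune aggressively.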
## Step 4: Egress Costs
Downloading results, model weights, or logs back to your local machine incurs egress fees. Rates vary widely: the big hyperscalers charge roughly $0.09–0.12/GB, while many GPU-specialist providers charge little or nothing. Budget about $0.10/GB unless you have confirmed your provider's rate.
**Tip:** Many providers offer free egress within their ecosystem. Plan your data pipeline to minimise cross-provider transfers.
## Step 5: Contingency Budget
Training runs rarely go exactly as planned: jobs crash and need restarting, hyperparameters need a second (or third) sweep, convergence takes longer than expected, and instances sit idle while you debug. Add a 30–50% contingency to your subtotal — the template below uses 40%.
## Step 6: Full Cost Estimate Template
```
Training FLOPs:      6 x N x D
GPU hours:           FLOPs / (effective TFLOPS x 1e12 x 3600)
GPU cost:            GPU hours x hourly rate
Storage:             dataset GB x ~$0.03/GB/month
Egress:              output GB x ~$0.10/GB
Subtotal:            GPU cost + storage + egress
Contingency (40%):   subtotal x 0.40
Total estimate:      subtotal + contingency
```
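The whole template collapses into one function. A sketch under the rates assumed above ($0.03/GB/month storage, $0.10/GB egress, 40% contingency) — swap in your own provider's numbers:

```python
def estimate_total_cost(
    n_params: float,
    n_tokens: float,
    peak_tflops: float,
    mfu: float,
    usd_per_gpu_hour: float,
    dataset_gb: float = 0.0,
    egress_gb: float = 0.0,
    storage_usd_per_gb_month: float = 0.03,
    egress_usd_per_gb: float = 0.10,
    contingency: float = 0.40,
) -> dict:
    """Step-by-step training budget following the template above."""
    flops = 6 * n_params * n_tokens
    hours = flops / (peak_tflops * 1e12 * mfu * 3600)
    gpu_cost = hours * usd_per_gpu_hour
    storage = dataset_gb * storage_usd_per_gb_month
    egress = egress_gb * egress_usd_per_gb
    subtotal = gpu_cost + storage + egress
    return {
        "gpu_hours": round(hours, 2),
        "gpu_cost": round(gpu_cost, 2),
        "storage": round(storage, 2),
        "egress": round(egress, 2),
        "subtotal": round(subtotal, 2),
        "contingency": round(subtotal * contingency, 2),
        "total": round(subtotal * (1 + contingency), 2),
    }

# Llama 3 7B fine-tune on an H100 at $2.60/hr, 100 GB dataset, 15 GB egress
print(estimate_total_cost(7e9, 5.12e7, 989, 0.45, 2.60,
                          dataset_gb=100, egress_gb=15))
```

Even with storage, egress, and a 40% buffer, this run lands near $11 — still dominated by GPU time, which is typical.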
## Use BestGPUCloud for Final Comparison
Once you have your GPU-hours estimate, use BestGPUCloud to compare live prices across RunPod, Vast.ai, Lambda Labs, and other providers. A 7B fine-tune that costs $3.40 on H100 might cost $8 on A100 or $1.20 on RTX 5090 with quantisation — the right hardware choice can cut your bill by 60–80%.
## Related Articles
Best GPU Cloud Providers in 2026: Complete Ranking
We ranked the top GPU cloud providers of 2026 on price, reliability, GPU selection, and developer experience. Here is who comes out on top — and who is best for your specific use case.
Best GPU for LLaMA 3 Fine-Tuning in 2026
Complete guide comparing H100 vs A100 for LLaMA 3 fine-tuning. Cost breakdowns, performance benchmarks, and provider recommendations.
Best GPU Cloud for Stable Diffusion in 2026
GPU requirements for SD 1.5, SDXL, and SD 3.0, best cloud providers with pricing, and how to set up ComfyUI on RunPod for maximum throughput per dollar.