LLM RAM Calculator
Estimate the VRAM/RAM required to run large language models at different quantization levels.
Model Selection

Selected model: Llama 3.1 8B. Available models:
Llama 3.1 8B
Llama 3.1 70B
Llama 3.1 405B
Llama 2 7B
Llama 2 13B
Mistral 7B
Mixtral 8x7B
Mixtral 8x22B
Falcon 7B
Falcon 40B
GPT-J 6B
Phi-3 3.8B
Gemma 2B
Gemma 7B
Qwen2 7B
Qwen2 72B
DeepSeek V2 236B
CodeLlama 34B
Custom
Context Length: 2K, 4K, 8K, 16K, 32K, 64K, or 128K tokens
Batch Size: 1 (single), 2, 4, 8, 16, or 32
Model stats: 8B parameters, 32 layers, 4096 hidden size
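The KV cache scales linearly with context length and batch size, and depends on the model dimensions above (32 layers, 4096 hidden size). A minimal sketch of the naive full-attention estimate follows; this is an assumption about the sizing formula, and models using grouped-query attention (as Llama 3.1 does) store fewer KV heads and need considerably less.

```python
# Hedged sketch of KV-cache sizing under naive full attention.
# Models with grouped-query attention keep fewer KV heads, so their
# actual cache is smaller than this estimate.
def kv_cache_bytes(layers, hidden, context, batch, bytes_per_elem=2):
    # 2 tensors (K and V) per layer, one vector of size `hidden` per token
    return 2 * layers * hidden * context * batch * bytes_per_elem

# Dimensions from above: 32 layers, 4096 hidden; 8K context, batch 1, FP16
gib = kv_cache_bytes(32, 4096, 8192, 1) / 2**30
print(f"{gib:.1f} GiB")  # 4.0 GiB
```

Halving `bytes_per_elem` models an 8-bit KV cache, which is why the KV column below shrinks with quantization.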
Memory Analysis
Llama 3.1 8B: 8B params, ~5.5 GB at INT4
Memory Requirements

Quant          Model     KV Cache   Total
FP32 (Full)    32.0 GB   ~4.0 GB    ~39.6 GB
FP16 (Half)    16.0 GB   ~2.0 GB    ~19.8 GB
INT8 (8-bit)    8.0 GB   ~1.0 GB     ~9.9 GB
INT4 (4-bit)    4.0 GB   ~1.0 GB     ~5.5 GB

Every total equals (model + KV cache) × 1.10, i.e. the figures include roughly 10% runtime overhead on top of weights and KV cache.
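The table's arithmetic can be sketched directly. The 10% overhead factor is an inference from the numbers (it reproduces every row), not a documented constant of the tool:

```python
# Sketch of the memory-requirements arithmetic above.
# Assumption: total = (weights + KV cache) * 1.10, which matches all four rows.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def estimate_total_gb(params_billion, quant, kv_gb, overhead=0.10):
    """Weights memory plus KV cache, plus a fixed runtime-overhead fraction."""
    weights_gb = params_billion * BYTES_PER_PARAM[quant]
    return (weights_gb + kv_gb) * (1 + overhead)

# Llama 3.1 8B rows from the table:
for quant, kv in [("FP32", 4.0), ("FP16", 2.0), ("INT8", 1.0), ("INT4", 1.0)]:
    print(quant, round(estimate_total_gb(8, quant, kv), 1))
```

Note that INT4 weights take half a byte per parameter, which is why the 8B model shrinks to 4.0 GB of weights.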
Will It Fit?

Fit marks below compare the totals above (FP16 ~19.8 GB, INT8 ~9.9 GB, INT4 ~5.5 GB) against each device's nominal memory; real usable memory is lower, so borderline fits leave little headroom.

Device                FP16   INT8   INT4
RTX 3060 12GB          ✗      ✓      ✓
RTX 3070 8GB           ✗      ✗      ✓
RTX 3080 10GB          ✗      ✓*     ✓
RTX 3090 24GB          ✓      ✓      ✓
RTX 4060 8GB           ✗      ✗      ✓
RTX 4070 12GB          ✗      ✓      ✓
RTX 4080 16GB          ✗      ✓      ✓
RTX 4090 24GB          ✓      ✓      ✓
A100 40GB              ✓      ✓      ✓
A100 80GB              ✓      ✓      ✓
H100 80GB              ✓      ✓      ✓
Apple M1 8GB           ✗      ✗      ✓*
Apple M1 16GB          ✗      ✓      ✓
Apple M2 Max 32GB      ✓      ✓      ✓
Apple M2 Max 96GB      ✓      ✓      ✓
Apple M3 Max 48GB      ✓      ✓      ✓
Apple M3 Max 128GB     ✓      ✓      ✓

* marginal: the RTX 3080 has only ~0.1 GB of nominal headroom at INT8, and Apple unified memory is shared with the OS.
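The fit check above reduces to comparing each quantization's estimated total against the device's memory. A minimal sketch, using the Llama 3.1 8B totals from the table (and nominal capacities, so no headroom is reserved):

```python
# Sketch of the "Will It Fit?" check: a quantization fits a device if its
# estimated total memory is at most the device's nominal memory.
# Totals are the Llama 3.1 8B figures from the table above.
TOTALS_GB = {"FP16": 19.8, "INT8": 9.9, "INT4": 5.5}

def fits(device_gb):
    """Return the quantizations whose estimated total fits in device_gb."""
    return [quant for quant, gb in TOTALS_GB.items() if gb <= device_gb]

print(fits(12))  # RTX 3060 / RTX 4070 class -> ['INT8', 'INT4']
print(fits(24))  # RTX 3090 / RTX 4090 class -> ['FP16', 'INT8', 'INT4']
```

In practice you would subtract a safety margin (OS, display buffers, fragmentation) from `device_gb` before comparing.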