
Hi everyone,
I’ve got a local dev box with:

  • OS: Linux 5.15.0-130-generic
  • CPU: AMD Ryzen 5 5600G (12 threads)
  • RAM: 48 GiB total
  • Disk: 1 TB NVMe + 1 old HDD
  • GPU: AMD Radeon (no NVIDIA/CUDA)

I have Ollama installed with two local LLMs: deepseek-r1:1.5b and llama2:7b (3.8 GB).

I’m already running llama2:7b (Q4_0, ~3.8 GiB model) at ~50% CPU load per prompt. It works well, but it’s not very smart, and I want something smarter. I’m building a VS Code extension that embeds a local LLM; the extension already has manual context capabilities, and I’m working on enhanced context, MCP support, a basic agentic mode, etc. I need a model that:

  • Fits comfortably in RAM
  • Maximizes inference speed on 12 cores (no GPU/CUDA)
  • Yields strong conversational accuracy
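
Roughly, the extension ends up making a call like this against the local Ollama HTTP API on its default port (11434). This is a simplified sketch; the helper name and default model here are placeholders, not the extension’s actual code:

```typescript
// Minimal sketch of calling a locally running Ollama server from the extension.
// Assumes the default endpoint http://localhost:11434 and a model already pulled locally.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

async function askLocalModel(prompt: string, model = "llama2:7b"): Promise<string> {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false returns a single JSON object instead of a token stream
    body: JSON.stringify({ model, messages, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed: ${res.status} ${res.statusText}`);
  }
  const data = await res.json();
  return data.message?.content ?? "";
}

// Example usage inside a command or chat handler:
// const answer = await askLocalModel("Explain what this function does...");
```

Since everything goes through localhost, swapping models is just a matter of changing the `model` field, which is why I only want to spend my bandwidth on one good download.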

Given my specs and limited bandwidth (one download only), which Ollama model (and quantization) would you recommend?

Please let me know any additional info needed.

TL;DR:

From my research, I found the following (some of it is AI-suggested based on my specs):

  • Qwen2.5-Coder 32B Instruct with Q8_0 quantization appears to be the best fit (I haven’t confirmed this; it’s just what my research turned up)
  • Gemma 3 27B and Mistral Small 3.1 24B are alternatives, but Qwen2.5-Coder reportedly excels at coding (again, unconfirmed; based on my research)

Memory and Model Size Constraints

The memory requirement for LLMs is driven primarily by the model’s parameter count and quantization level. For a 7B model like llama2:7b, the current ~3.8 GB footprint suggests 4-bit quantization (approximately 3.5 GB for 7B parameters at 4 bits, plus overhead). General guidelines from the Ollama GitHub repo indicate 8 GB of RAM for 7B models, 16 GB for 13B, and 32 GB for 33B models, suggesting you can handle up to ~33B parameters with your 37 GiB (39.7 GB) of available RAM. Larger models like 70B typically require 64 GB.
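
To make the arithmetic behind those numbers explicit, here is a rough estimator (the ~20% overhead factor is an assumption for illustration, not a measured value):

```typescript
// Back-of-the-envelope model size: bytes ≈ parameter_count × bits_per_weight / 8,
// plus runtime overhead (KV cache, buffers). The 1.2 overhead factor is an assumption.
function estimateModelGB(paramsBillions: number, bitsPerWeight: number, overhead = 1.2): number {
  const weightGB = (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1e9;
  return weightGB * overhead;
}

console.log(estimateModelGB(7, 4).toFixed(1));  // ~4.2 GB: roughly the ~3.8 GB llama2:7b Q4_0 file plus overhead
console.log(estimateModelGB(32, 8).toFixed(1)); // ~38.4 GB: why a 32B model at Q8_0 sits right at the ~39.7 GB ceiling
```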

Model Options and Quantization

  • LLaMA 3.1 8B: Q8_0 at 8.54GB
  • Gemma 3 27B: Q8_0 at 28.71GB, Q4_K_M at 16.55GB
  • Mistral Small 3.1 24B: Q8_0 at 25.05GB, Q4_K_M at 14.33GB
  • Qwen2.5-Coder 32B: Q8_0 at 34.82GB, Q6_K at 26.89GB, Q4_K_M at 19.85GB

Given your RAM, models up to 34.82 GB (Qwen2.5-Coder 32B Q8_0) are technically feasible, though that largest option leaves only a few GB of headroom for context (KV cache) and the OS (this part is AI-generated).
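
As a sanity check on “feasible”, here is a small sketch that filters the sizes above against available RAM while reserving some headroom for the OS and KV cache (the 4 GB default headroom is an assumption, not a benchmark):

```typescript
// Filter the quantization options above by what fits in available RAM,
// reserving headroom for the OS and context (KV cache).
interface QuantOption {
  model: string;
  quant: string;
  sizeGB: number;
}

const options: QuantOption[] = [
  { model: "LLaMA 3.1 8B", quant: "Q8_0", sizeGB: 8.54 },
  { model: "Gemma 3 27B", quant: "Q8_0", sizeGB: 28.71 },
  { model: "Gemma 3 27B", quant: "Q4_K_M", sizeGB: 16.55 },
  { model: "Mistral Small 3.1 24B", quant: "Q8_0", sizeGB: 25.05 },
  { model: "Mistral Small 3.1 24B", quant: "Q4_K_M", sizeGB: 14.33 },
  { model: "Qwen2.5-Coder 32B", quant: "Q8_0", sizeGB: 34.82 },
  { model: "Qwen2.5-Coder 32B", quant: "Q6_K", sizeGB: 26.89 },
  { model: "Qwen2.5-Coder 32B", quant: "Q4_K_M", sizeGB: 19.85 },
];

function fitsInRam(availableGB: number, headroomGB = 4): QuantOption[] {
  return options.filter((o) => o.sizeGB + headroomGB <= availableGB);
}

console.log(fitsInRam(39.7)); // with 4 GB headroom everything listed still fits; raise the headroom and the 32B Q8_0 drops out first
```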

| Model | Parameters | Q8_0 Size (GB) | Coding Focus | General Capabilities | Notes |
|---|---|---|---|---|---|
| LLaMA 3.1 8B | 8B | 8.54 | Moderate | Strong | General purpose, smaller, good for a baseline. |
| Gemma 3 27B | 27B | 28.71 | Good | Excellent, multimodal | Supports text and images, strong reasoning, fits RAM. |
| Mistral Small 3.1 24B | 24B | 25.05 | Very good | Excellent, fast | Low latency, competitive with larger models, fits RAM. |
| Qwen2.5-Coder 32B | 32B | 34.82 | Excellent | Strong | SOTA for coding, matches GPT-4o, ideal for a VS Code extension, fits RAM. |

I have also checked:
