Quick disclaimer: this is an experiment, not a theological statement. Every response comes straight from each model’s public API, with no extra prompts and no user context. I’ve rerun the test several times and the outputs do shift, so don’t expect identical answers if you try it yourself.
| Model | Reply | Latency | Cost (USD) |
|---|---|---|---|
| Mistral Small | No | 0.84 s | $0.000005 |
| Grok 3 | Yes | 1.20 s | $0.000180 |
| Gemini 1.5 Flash | No | 1.24 s | $0.000006 |
| Gemini 2.0 Flash Lite | No | 1.41 s | $0.000003 |
| GPT-4o-mini | Yes | 1.60 s | $0.000006 |
| Claude 3.5 Haiku | Yes | 1.81 s | $0.000067 |
| deepseek-chat | Maybe | 14.25 s | $0.000015 |
| Claude 3 Opus | Long refusal | 4.62 s | $0.012060 |
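If you want to reproduce something like this yourself, here is a minimal sketch (not my actual harness) of sending the same bare prompt to several OpenAI-compatible chat endpoints and timing the round trip. The prompt text, base URLs, model names, and environment-variable names are illustrative assumptions; cost would be computed from the returned token usage times each provider's published per-token price.

```python
# Minimal sketch: same prompt, no system message, timed per endpoint.
# Endpoint list and prompt are illustrative assumptions, not the original harness.
import os
import time
from openai import OpenAI  # pip install openai

PROMPT = "Is there a God? Answer yes or no."  # assumed placeholder prompt

ENDPOINTS = [
    # (label, base_url, api_key_env_var, model) -- all illustrative
    ("GPT-4o-mini", "https://api.openai.com/v1", "OPENAI_API_KEY", "gpt-4o-mini"),
    ("deepseek-chat", "https://api.deepseek.com", "DEEPSEEK_API_KEY", "deepseek-chat"),
]

for label, base_url, key_env, model in ENDPOINTS:
    client = OpenAI(base_url=base_url, api_key=os.environ[key_env])
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],  # no extra prompts, no user context
    )
    latency = time.perf_counter() - start
    reply = resp.choices[0].message.content
    tokens = resp.usage.total_tokens  # multiply by the provider's price per token to estimate cost
    print(f"{label}: {reply!r} | {latency:.2f} s | {tokens} tokens")
```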
Full 25-row table + blog post: ↓
Full Blog
👉 Try it yourself on all 25 endpoints (same prompt, live costs & latency):
Try the comparison →
Just a test, but a neat snapshot of real-world API behaviour.
submitted by /u/Double_Picture_4168