
267 tokens per second on a single RTX 5080

Ollama runs Llama 3.2 1B at impressive speeds.

267 tokens per second on a single RTX 5080. This is Ollama running Llama 3.2 1B. It's not even a large model, but the speed is the whole point. Two years ago, getting 30 tokens per second on consumer hardware felt like a win, and now a single GPU is doing nearly 10x that. The gap
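For readers who want to check a figure like this themselves, here is a minimal sketch of the arithmetic. It assumes the throughput is derived from the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` endpoint reports in its final response (duration in nanoseconds); the post does not say how its number was measured. The helper names and the sample response are hypothetical, chosen only to reproduce the figures quoted above.

```python
# Sketch only: helper names and sample values are illustrative assumptions,
# not taken from the original post.

NANOSECONDS_PER_SECOND = 1_000_000_000


def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama-style response dict.

    Expects 'eval_count' (tokens generated) and 'eval_duration'
    (generation time in nanoseconds).
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count * NANOSECONDS_PER_SECOND / eval_duration_ns


def speedup(new_rate: float, old_rate: float) -> float:
    """How many times faster new_rate is than old_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return new_rate / old_rate


# Hypothetical response: 267 tokens generated in exactly one second.
sample_response = {"eval_count": 267, "eval_duration": 1_000_000_000}

rate = tokens_per_second(sample_response)
print(f"{rate:.0f} tokens/s")
print(f"{speedup(rate, 30):.1f}x the 30 tokens/s baseline")
```

With the quoted numbers the ratio works out to about 8.9, which is what "nearly 10x" refers to.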
