267 TOKENS PER SECOND ON A SINGLE RTX 5080

This is Ollama running Llama 3.2 1B. It's not even a large model, but the speed is the whole point. Two years ago, getting 30 tokens per second on consumer hardware felt like a win; now a single GPU is doing roughly 9x that (267 / 30 ≈ 8.9). The gap…
Ollama runs Llama 3.2 1B at 267 tokens per second.
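If you want to check a tokens-per-second figure like this yourself, Ollama's `/api/generate` endpoint reports `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds) in its final response, so throughput is `eval_count / eval_duration × 10^9`. Below is a minimal sketch, assuming a local Ollama server on its default port 11434 and the `llama3.2:1b` model tag. The sample numbers in the demo are made up for illustration and are not the measurements behind the 267 tok/s result.

```python
import json
import urllib.request

NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds


def fetch_generate(prompt: str, model: str = "llama3.2:1b",
                   host: str = "http://localhost:11434") -> dict:
    """POST a non-streaming request to a local Ollama server (assumed running)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tokens_per_second(response: dict) -> float:
    """Decode throughput: generated tokens divided by generation time."""
    eval_count = response["eval_count"]        # tokens generated
    eval_duration = response["eval_duration"]  # generation time, in ns
    if eval_duration <= 0:
        raise ValueError("eval_duration must be positive")
    # Multiply first so integer inputs divide exactly where possible.
    return eval_count * NS_PER_SECOND / eval_duration


def speedup(new_tps: float, old_tps: float) -> float:
    """How many times faster the new throughput is than the old one."""
    return new_tps / old_tps


if __name__ == "__main__":
    # Hypothetical values for illustration; swap in fetch_generate("...")
    # to measure a real local run.
    sample = {"eval_count": 534, "eval_duration": 2 * NS_PER_SECOND}
    tps = tokens_per_second(sample)
    print(f"{tps:.1f} tokens/s")                     # 267.0 tokens/s
    print(f"{speedup(tps, 30.0):.1f}x vs 30 tok/s")  # 8.9x vs 30 tok/s
```

Only `eval_duration` is used here, so prompt processing and model load time (reported separately) don't inflate or deflate the generation rate.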