Benchmarks Test Biological Knowledge and Skills

LifeSciBench tests models' reasoning and decision-making under real-world constraints.

Benchmarks often test biological knowledge or narrow skills. The tasks in LifeSciBench instead test whether models can reason from evidence, work with scientific artifacts, handle uncertainty, and make useful decisions under real-world constraints. GPT‑Rosalind scores above GPT‑5.5