Today’s AI agents still struggle to pass real human-verification checks (CAPTCHAs) on websites. The paper proposes HLL, a benchmark in which agents must solve 10 types of CAPTCHA tasks by viewing the page, clicking or dragging correctly, tracking state, and submitting the answer.
AI agents struggle with CAPTCHA verification
Current AI agents struggle to pass real human-verification checks.