Benchmarking and Training Web Agents in Highly Realistic Simulators via Verifiable Rewards
The First Benchmark for Browser Infrastructure Stealth
Benchmarking browser agents on ~2.5k tasks across 452 websites