# UseDesktop Evals

UseDesktop Evals publishes verifier-backed RL environments for computer-use agents.

The site is the public evidence layer for UseDesktop workflow environments: task prompts, mock app state, grader contracts, verifier assumptions, pass@k model results, failure modes, and run evidence. It is designed for researchers, data buyers, and teams evaluating post-training data for computer-use agents.

Core entities:

- RL environment: the app or workflow state in which a model operates.
- Task: the prompt or objective the model must complete.
- Grader: the quantitative scoring contract for the task.
- Run: one model attempt with score, reward, verdict, and trace evidence.
- Model: a tested computer-use agent (CUA) adapter with pass@k and average score summaries.

UseDesktop focuses on evidence, not volume: solvability checks, verifier audits, difficulty calibration, contamination control, and model-run evidence for workflow data.

Canonical sections:

- `/` introduces the public eval surface and current catalog.
- `/tasks` lists verifier-backed task prompts and grader summaries.
- `/environments` lists public RL environments.
- `/environments/{slug}` shows the source workflow, reset behavior, action space, grader, and model result summaries.
- `/environments/{slug}/tasks/{task}` shows a single task's prompt, environment state, grader contract, pass@k results, and failure taxonomy.
- `/models` lists tested computer-use models.
- `/runs/{runId}` shows a single model attempt with score evidence.

Related UseDesktop sites:

- Main site: https://usedesktop.com
- Blog: https://blog.usedesktop.com
- App: https://app.usedesktop.com/dashboard
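The way runs roll up into the per-model pass@k and average score summaries described under "Core entities" can be sketched as follows. This is a hypothetical illustration, not a UseDesktop API: the `Run` fields, `pass_at_k`, and `summarize` are invented names. It assumes pass@k is computed with the unbiased estimator from the HumanEval (Codex) evaluation, 1 - C(n-c, k)/C(n, k) for n attempts with c passes, and then averaged over tasks. Whether UseDesktop Evals computes its summaries this way is also an assumption.

```python
from dataclasses import dataclass
from math import comb
from statistics import mean


# Hypothetical record for one model attempt ("Run"); field names are assumptions.
@dataclass
class Run:
    run_id: str
    model: str
    task: str
    score: float   # grader score, assumed to lie in [0, 1]
    reward: float  # RL reward, kept for completeness; unused below
    passed: bool   # grader verdict


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k sample must contain at least one pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def summarize(runs: list[Run], k: int) -> dict[str, dict[str, float]]:
    """Per-model pass@k (mean over tasks) and mean score over all runs.

    Raises ValueError if any (model, task) pair has fewer than k runs.
    """
    # Group runs as model -> task -> list of runs.
    by_model: dict[str, dict[str, list[Run]]] = {}
    for run in runs:
        by_model.setdefault(run.model, {}).setdefault(run.task, []).append(run)

    summary: dict[str, dict[str, float]] = {}
    for model, tasks in by_model.items():
        task_pass = [
            pass_at_k(len(rs), sum(r.passed for r in rs), k)
            for rs in tasks.values()
        ]
        summary[model] = {
            f"pass@{k}": mean(task_pass),
            "avg_score": mean(r.score for rs in tasks.values() for r in rs),
        }
    return summary
```

For example, with two runs on each of two tasks for one model, one pass on the first task and none on the second, `summarize(runs, 1)` reports a pass@1 of 0.25 for that model. Averaging per task rather than pooling all runs keeps tasks with many attempts from dominating a model's summary.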