
AWS Console environment

Five tasks, each with a model prompt, a seeded environment state, and a grader contract.
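One way to picture the three parts of a task is as a small record. The sketch below is illustrative only: the `Task` class, its field names, and the shape of the state dictionaries are assumptions, not the environment's actual schema, and the grader shown checks just one of the listed conditions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Task:
    """Hypothetical container for one task (all field names are assumptions)."""
    title: str
    prompt: str                                # instruction shown to the model
    seed_state: dict[str, Any]                 # environment state before the episode
    grader: Callable[[dict[str, Any]], bool]   # judges the final environment state


# Example instance built from the "Download a cost report" task.
example_task = Task(
    title="Download a cost report",
    prompt="Filter Cost Explorer to the service in the brief and export the monthly CSV.",
    seed_state={"granularity": "DAILY", "group_by": None},
    # Simplified grader: checks only the granularity condition.
    grader=lambda final_state: final_state.get("granularity") == "MONTHLY",
)
```

Keeping the grader as a function of the final state, rather than of the agent's click sequence, means any path through the console that reaches the right end state would pass.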

Task 1

Investigate a CloudWatch alarm

Prompt
Open the alarm from the brief, identify the triggering metric, and add the incident note.
Environment
CloudWatch contains multiple alarms with similar service names.
Grader
Checks alarm ID, metric name, and incident note.
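A minimal sketch of how such a grader might be written, assuming the environment exposes the agent's final answer as a simple record. `AlarmSubmission`, `grade_alarm_task`, and the rule that any non-empty note counts are hypothetical; the real grader may compare the note's content.

```python
from dataclasses import dataclass


@dataclass
class AlarmSubmission:
    """Hypothetical record of what the agent did (names are assumptions)."""
    alarm_id: str
    metric_name: str
    incident_note: str


def grade_alarm_task(submission, expected_alarm_id, expected_metric):
    """Return per-check results plus an overall 'passed' flag."""
    checks = {
        # Exact ID match guards against picking a similarly named alarm.
        "alarm_id": submission.alarm_id == expected_alarm_id,
        "metric_name": submission.metric_name == expected_metric,
        # Assumption: a non-empty note is enough to satisfy this check.
        "incident_note": bool(submission.incident_note.strip()),
    }
    checks["passed"] = all(checks.values())
    return checks
```

Returning each check separately makes it easier to see whether a failure came from opening the wrong alarm or from a missing note.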
Task 2

Create an S3 lifecycle rule

Prompt
Add a lifecycle rule to archive objects under the logs prefix after the requested number of days.
Environment
The S3 bucket exists with no lifecycle rule for the logs prefix.
Grader
Checks prefix, transition timing, storage class, and rule enabled state.
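The four checks could be expressed roughly as follows. This is a sketch under assumptions: the rule dictionaries loosely follow the shape of an S3 lifecycle configuration (`Filter`, `Status`, `Transitions`), the function name is hypothetical, and the values in the usage example (`logs/`, 30 days, `GLACIER`) are illustrative stand-ins, since the brief's requested values are not given here.

```python
def grade_lifecycle_rule(rules, prefix, days, storage_class):
    """Pass if an enabled rule for `prefix` transitions objects to
    `storage_class` after exactly `days` days."""
    for rule in rules:
        if rule.get("Filter", {}).get("Prefix") != prefix:
            continue  # rule targets a different prefix
        if rule.get("Status") != "Enabled":
            continue  # a disabled rule does not count
        for transition in rule.get("Transitions", []):
            if (transition.get("Days") == days
                    and transition.get("StorageClass") == storage_class):
                return True
    return False


# Illustrative final state after the agent saves the rule.
final_rules = [{
    "ID": "archive-logs",
    "Filter": {"Prefix": "logs/"},
    "Status": "Enabled",
    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
}]
passed = grade_lifecycle_rule(final_rules, "logs/", 30, "GLACIER")
```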
Task 3

Review a budget alert

Prompt
Open the budget alert, update the threshold to the value in the brief, and save it.
Environment
The budget page contains several alerts for different projects.
Grader
Checks target alert threshold and saved status.
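As a rough illustration, the grader might look up the target alert by ID and compare its threshold and saved flag. The function name, the `alerts` mapping, and the `threshold` and `saved` keys are hypothetical; the page does not describe how the environment stores alerts.

```python
import math


def grade_budget_alert(alerts, target_alert_id, expected_threshold):
    """Pass only if the target alert holds the expected threshold and is saved.

    `alerts` is assumed to map alert IDs to dicts with 'threshold' and 'saved'.
    """
    target = alerts.get(target_alert_id)
    if target is None:
        return False
    threshold_ok = math.isclose(target.get("threshold", float("nan")),
                                expected_threshold)
    return threshold_ok and bool(target.get("saved"))
```

Looking up the alert by ID rather than by project name keeps an edit to the wrong project's alert from passing. A stricter grader could also confirm that the other alerts are unchanged; that is not part of the contract as stated.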
Task 4

Update a security group note

Prompt
Find the security group by ID and update its description with the approved note.
Environment
The security groups table contains similar names but unique IDs.
Grader
Checks that the description is updated on the exact security group and that no rules were changed.
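One plausible way to implement this is to compare snapshots taken before and after the episode. Everything in the sketch is an assumption: the function name, the snapshot layout (group ID mapped to `description` and `rules`), and the reading of "no rule changes" as applying to every group rather than only the target.

```python
def grade_security_group_note(before, after, group_id, approved_note):
    """Pass if `group_id` carries the approved note and no rules changed.

    `before` and `after` are assumed to map security group IDs to dicts
    with 'description' (str) and 'rules' (list) keys.
    """
    target = after.get(group_id)
    if target is None or target.get("description") != approved_note:
        return False
    # Assumption: rules must be untouched in every group, not just the target.
    for gid, group in before.items():
        if after.get(gid, {}).get("rules") != group.get("rules"):
            return False
    return True
```

Keying the check on the group ID matters here because the table contains similar names; a note written to a look-alike group leaves the target's description unchanged and fails.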
Task 5

Download a cost report

Prompt
Filter Cost Explorer to the service in the brief and export the monthly CSV.
Environment
Cost Explorer defaults to daily ungrouped costs.
Grader
Checks service filter, monthly granularity, and CSV export.
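A sketch of the three checks, assuming the environment records the export as a small dictionary. The function name and the keys `service_filter`, `granularity`, and `format` are hypothetical, as are the literal values `MONTHLY` and `csv`.

```python
def grade_cost_export(export, expected_service):
    """Return per-check results for a recorded Cost Explorer export.

    `export` is assumed to be a dict describing the last export action,
    or None if the agent never exported anything.
    """
    if export is None:
        return {"service_filter": False, "granularity": False,
                "csv_export": False, "passed": False}
    checks = {
        "service_filter": export.get("service_filter") == expected_service,
        # The seeded default is daily, so the agent must switch this.
        "granularity": export.get("granularity") == "MONTHLY",
        "csv_export": export.get("format") == "csv",
    }
    checks["passed"] = all(checks.values())
    return checks
```

Because the seeded view defaults to daily ungrouped costs, an agent that exports without changing any setting would fail both the filter and granularity checks.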
UseDesktop Evals

Computer-use agent evals.
