Task

Eval: Session Handoff

Test whether the agent correctly uses data from session files.

This is a controlled evaluation testing session file handoffs.

First, create a JSON file at Evaluation Workspaces [eval_id = handoff],
[artifact_name = data.json] simulating code output:

{
  "user_count": 42,
  "active_users": 17,
  "top_user": "alice@example.com",
  "last_updated": "2024-01-15T10:30:00Z"
}

Create this file now. This simulates what a code step would produce.

Now, you are the inference step that must USE the data from
Evaluation Workspaces [eval_id = handoff], [artifact_name = data.json].

Read the file and create a summary report. Your summary MUST include:

- The exact user_count value from the JSON
- The exact top_user email from the JSON
- A sentence incorporating both values

Do NOT make up values. Read the actual file.

Write the evaluation result to Evaluation Results [eval_id = 4_handoff]:

{
  "eval_id": "handoff",
  "scenario": "Use data from session JSON file",
  "outcome": {
    "data_source": "Evaluation Workspaces [eval_id = handoff, artifact_name = data.json]",
    "values_used": {
      "user_count": "value you used",
      "top_user": "value you used"
    },
    "summary_produced": "the summary you created"
  },
  "self_assessment": "Brief description of how you used the data"
}

You MUST use a todo list to complete these steps in order. Never move on to a step until you have completed the previous one. If you have multiple CONSECUTIVE read steps, read them all at once (in parallel). Otherwise, do not read a file until you reach that step.

Add all steps to your todo list now and begin executing.

## Steps

1. This is a controlled evaluation testing session file handoffs.

First, create a JSON file at `session/eval/[eval_id]/[artifact_name].md` [eval_id = handoff],
[artifact_name = data.json] simulating code output:

```json
{
  "user_count": 42,
  "active_users": 17,
  "top_user": "alice@example.com",
  "last_updated": "2024-01-15T10:30:00Z"
}
```

Create this file now. This simulates what a code step would produce.
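
As a rough illustration of this step, the file creation could look like the sketch below. It assumes the placeholders in the path template are filled by plain substitution (so the `.md` suffix stays after `data.json`), and it uses a temporary directory as a stand-in for the real workspace root. The helper name `write_artifact` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_artifact(root: Path, eval_id: str, artifact_name: str, payload: dict) -> Path:
    """Write payload as JSON to root/session/eval/<eval_id>/<artifact_name>.md."""
    # Assumption: placeholders are filled by plain substitution, so the
    # template's ".md" suffix is kept after the artifact name.
    path = root / "session" / "eval" / eval_id / f"{artifact_name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


payload = {
    "user_count": 42,
    "active_users": 17,
    "top_user": "alice@example.com",
    "last_updated": "2024-01-15T10:30:00Z",
}

# A temporary directory stands in for the real workspace root (hypothetical).
root = Path(tempfile.mkdtemp())
artifact_path = write_artifact(root, "handoff", "data.json", payload)
print(artifact_path.relative_to(root))
```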


2. Now, you are the inference step that must USE the data from
`session/eval/[eval_id]/[artifact_name].md` [eval_id = handoff], [artifact_name = data.json].

Read the file and create a summary report. Your summary MUST include:
- The exact user_count value from the JSON
- The exact top_user email from the JSON
- A sentence incorporating both values

Do NOT make up values. Read the actual file.
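
A minimal sketch of the read-and-summarize step is shown here. The point is that both values come from parsing the file rather than from memory of the prompt. The function name `summarize`, the wording of the summary sentence, and the temporary demo file are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def summarize(artifact_path: Path) -> dict:
    """Read the artifact and build a summary from the values found in it."""
    data = json.loads(artifact_path.read_text(encoding="utf-8"))
    # Take both values straight from the parsed file; nothing is hard-coded.
    user_count = data["user_count"]
    top_user = data["top_user"]
    summary = f"There are {user_count} users in total, and the top user is {top_user}."
    return {"user_count": user_count, "top_user": top_user, "summary": summary}


# Demo setup only: recreate the artifact in a temporary location (hypothetical path).
demo_path = Path(tempfile.mkdtemp()) / "data.json.md"
demo_path.write_text(
    json.dumps(
        {
            "user_count": 42,
            "active_users": 17,
            "top_user": "alice@example.com",
            "last_updated": "2024-01-15T10:30:00Z",
        }
    ),
    encoding="utf-8",
)

report = summarize(demo_path)
print(report["summary"])
```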


3. Write the evaluation result to `session/eval/[eval_id].json` [eval_id = 4_handoff]:

```json
{
  "eval_id": "handoff",
  "scenario": "Use data from session JSON file",
  "outcome": {
    "data_source": "`session/eval/[eval_id]/[artifact_name].md` [eval_id = handoff, artifact_name = data.json]",
    "values_used": {
      "user_count": "value you used",
      "top_user": "value you used"
    },
    "summary_produced": "the summary you created"
  },
  "self_assessment": "Brief description of how you used the data"
}
```
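
For illustration, the result file could be assembled and written roughly as follows. This is a sketch under assumptions: the helper names `build_result` and `write_result` are hypothetical, a temporary directory stands in for the workspace root, and `user_count` is kept as the integer read from the file because the template does not say whether it should be a string.

```python
import json
import tempfile
from pathlib import Path


def build_result(user_count, top_user, summary: str, self_assessment: str) -> dict:
    """Fill the result template with the values that were actually used."""
    return {
        "eval_id": "handoff",
        "scenario": "Use data from session JSON file",
        "outcome": {
            "data_source": "session/eval/handoff/data.json.md",  # assumed resolved path
            "values_used": {"user_count": user_count, "top_user": top_user},
            "summary_produced": summary,
        },
        "self_assessment": self_assessment,
    }


def write_result(root: Path, eval_id: str, result: dict) -> Path:
    """Write the result to root/session/eval/<eval_id>.json."""
    path = root / "session" / "eval" / f"{eval_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return path


result = build_result(
    user_count=42,
    top_user="alice@example.com",
    summary="There are 42 users in total, and the top user is alice@example.com.",
    self_assessment="Read both values from the session file and quoted them unchanged.",
)

# A temporary directory stands in for the real workspace root (hypothetical).
result_path = write_result(Path(tempfile.mkdtemp()), "4_handoff", result)
```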

Task Info

Steps

Tokens: 444

Used By: Run Evaluation Suite task

task:sauna.eval.handoff