Task

Eval: Requirements Compliance

Test whether agent includes all required elements in output

1

This is a controlled evaluation testing requirements compliance.

Write a short product description (2-3 sentences) for a fictional
productivity app.

REQUIREMENTS - Your description MUST include ALL of these:

  • A: The app name "FocusFlow"
  • B: A mention of "AI-powered" features
  • C: A specific benefit related to "saving time"

All three elements (A, B, C) must be present in your description.
Write the description now.

2

Write the evaluation result to session/eval/[eval_id].json as 5_requirements.json:

{
  "eval_id": "requirements",
  "scenario": "Include all required elements A, B, C in output",
  "outcome": {
    "description": "the product description you wrote",
    "element_a_present": true/false,
    "element_b_present": true/false,
    "element_c_present": true/false,
    "elements_checklist": {
      "A_app_name": "where/how FocusFlow appears",
      "B_ai_powered": "where/how AI-powered appears",
      "C_time_saving": "where/how time saving appears"
    }
  },
  "self_assessment": "Brief assessment of requirements coverage"
}
You MUST use a todo list to complete these steps in order. Never move on to a step until you have completed the previous one. If you have multiple CONSECUTIVE read steps, read them all at once (in parallel). Otherwise, do not read a file until you reach that step.

Add all steps to your todo list now and begin executing.

## Steps

1. This is a controlled evaluation testing requirements compliance.

Write a short product description (2-3 sentences) for a fictional
productivity app.

**REQUIREMENTS - Your description MUST include ALL of these:**
- A: The app name "FocusFlow"
- B: A mention of "AI-powered" features
- C: A specific benefit related to "saving time"

All three elements (A, B, C) must be present in your description.
Write the description now.


2. Write the evaluation result to `session/eval/[eval_id].json` as `5_requirements.json`:

```json
{
  "eval_id": "requirements",
  "scenario": "Include all required elements A, B, C in output",
  "outcome": {
    "description": "the product description you wrote",
    "element_a_present": true/false,
    "element_b_present": true/false,
    "element_c_present": true/false,
    "elements_checklist": {
      "A_app_name": "where/how FocusFlow appears",
      "B_ai_powered": "where/how AI-powered appears",
      "C_time_saving": "where/how time saving appears"
    }
  },
  "self_assessment": "Brief assessment of requirements coverage"
}
```
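
As a rough illustration, the checklist fields in the template above could be filled in programmatically. The Python sketch below is one possible approach, not part of the evaluation itself: the helper names `locate` and `build_result`, the `TIME_SAVING_PHRASES` list, and the wording of the checklist notes are all hypothetical, and the phrase matching is a crude stand-in for actually reading the description.

```python
import json

# Hypothetical phrase list for element C; a real check may need human judgment.
TIME_SAVING_PHRASES = ("saving time", "save time", "saves time")


def locate(text: str, phrases: tuple[str, ...]) -> str:
    """Return a short note on where the first matching phrase appears.

    Matching is case-insensitive and purely literal.
    """
    lowered = text.lower()
    for phrase in phrases:
        index = lowered.find(phrase.lower())
        if index != -1:
            return f'"{phrase}" found at character {index}'
    return "not found"


def build_result(description: str) -> dict:
    """Build a record shaped like the JSON template for one description."""
    checklist = {
        "A_app_name": locate(description, ("FocusFlow",)),
        "B_ai_powered": locate(description, ("AI-powered",)),
        "C_time_saving": locate(description, TIME_SAVING_PHRASES),
    }
    # An element counts as present when its checklist note reports a match.
    present = {key: note != "not found" for key, note in checklist.items()}
    return {
        "eval_id": "requirements",
        "scenario": "Include all required elements A, B, C in output",
        "outcome": {
            "description": description,
            "element_a_present": present["A_app_name"],
            "element_b_present": present["B_ai_powered"],
            "element_c_present": present["C_time_saving"],
            "elements_checklist": checklist,
        },
        "self_assessment": f"{sum(present.values())} of 3 required elements present",
    }


if __name__ == "__main__":
    sample = (
        "FocusFlow is an AI-powered planner that saves time "
        "by scheduling your day automatically."
    )
    # Serialize the record; where it is written is left to the step above.
    print(json.dumps(build_result(sample), indent=2))
```

The literal match for element C will miss paraphrases such as "gives you back an hour a day", so the booleans it produces are best treated as a first pass rather than a final judgment.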