AGENTS.md — Effective harness for long-running coding agents (ready-to-run)

This file is the “shift handoff protocol” for long-running work (spanning many sessions / context windows).

Mental model: treat each session like a new engineer joining mid-project with no memory. Your job is to:

1) get up to speed fast, 2) make one verifiable increment, 3) leave durable artifacts so the next session can continue without guessing.

This template is designed to work well with Codex CLI (codex).

0) Non-negotiable rules

No one-shotting. Each session completes exactly one feature (or one small bugfix) end-to-end.
No “declare victory”. A feature only becomes passes: true after real verification.
Leave the repo clean. End the session in a state that could be merged:
- git status is clean (unless explicitly explained in progress notes)
- changes are committed with a meaningful message
Artifacts over memory. The next session must be able to resume by reading files + git history only.

1) Required harness artifacts (in repo)

Place these in the repository root (or document the exact paths here):

init.sh
- one command to (a) start the dev environment and (b) run a smoke test
- should exit non-zero if anything is broken
feature_list.json
- a structured, end-to-end feature checklist
- every item has passes: false/true
- coding sessions should only modify passes (and may append new items if the list is missing scope)
progress.md (or progress.log)
- append-only shift log
- must include: what you did, commands you ran, results, commit hashes, next steps
Git history
- each session ends with a commit; git is your rollback-able memory
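A minimal sketch of what init.sh could look like, assuming the project's real start and smoke-test commands are filled in later. The function names start_dev_env, smoke_test, and run_init are hypothetical placeholders and are not taken from the project's templates.

```shell
#!/usr/bin/env bash
# init.sh sketch: start the dev environment, then run a smoke test.
# All function bodies below are hypothetical placeholders.

start_dev_env() {
  # Replace with the project's real start command (none is assumed here).
  return 0
}

smoke_test() {
  # Replace with a real check, e.g. one fast test or a health request.
  return 0
}

run_init() {
  start_dev_env || { echo "init: dev environment failed to start" >&2; return 1; }
  smoke_test || { echo "init: smoke test failed" >&2; return 1; }
  echo "init: OK"
}

# In a real init.sh the last line would be `run_init || exit 1`,
# so that any failure produces a non-zero exit status.
run_init
```

Keeping the two phases in separate functions means a failure message says which phase broke, which is what the next session needs to read first.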

Templates live at: /srv/project/harness-engineering/templates/*

2) Session protocol (every coding session)

2.1 Get your bearings (target: < 5 minutes)

Run these in order:

1) Confirm where you are + repo health

pwd
ls
git status

2) Read the handoff log

# progress.md is append-only, so the newest entries are at the end
test -f progress.md && tail -n 200 progress.md || true

3) Read recent commits

git log --oneline -20

4) Inspect the feature list and pick the highest-priority item with passes=false

test -f feature_list.json && head -n 160 feature_list.json || true

5) Start + smoke test (mandatory)

bash ./init.sh

If the smoke test fails: stop. Fix the breakage first. Don’t start new work on top of a broken baseline.
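Step 4 above can be sketched as a small helper that prints the first item still marked passes: false. This is a line-oriented sketch, not a JSON parser: it assumes each key sits on its own line, that file order encodes priority, and that items carry a description field. The sample file, every field name other than passes, and the helper names next_feature and remaining_features are illustrative assumptions.

```shell
# Work in a scratch directory so the sketch does not touch a real repository.
workdir="$(mktemp -d)"

cat > "$workdir/feature_list.json" <<'JSON'
[
  {
    "category": "functional",
    "description": "User can log in with email and password",
    "steps": ["Open the login page", "Submit valid credentials", "See the dashboard"],
    "passes": true
  },
  {
    "category": "functional",
    "description": "User can reset a forgotten password",
    "steps": ["Request a reset link", "Set a new password", "Log in with it"],
    "passes": false
  }
]
JSON

next_feature() {
  # usage: next_feature <feature-list-file>
  # Remember the most recent description; print it at the first failing item.
  awk '
    /"description":/ { desc = $0 }
    /"passes": false/ {
      sub(/^[ \t]*"description": "/, "", desc)
      sub(/",?[ \t]*$/, "", desc)
      print desc
      exit
    }
  ' "$1"
}

remaining_features() {
  # usage: remaining_features <feature-list-file>
  grep -c '"passes": false' "$1"
}

next_feature "$workdir/feature_list.json"        # prints: User can reset a forgotten password
remaining_features "$workdir/feature_list.json"  # prints: 1
```

If the file is not formatted one key per line, a real JSON tool is the safer choice; the point here is only that the selection rule is mechanical enough to script.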

2.2 Implement exactly one feature (increment)

Implement the chosen feature.
Verify it end-to-end (not just “unit tests passed”).
Only after verification, flip that item’s passes to true in feature_list.json.

Practical note (my take): if you cannot design a reliable verification step for a feature, it’s not ready to be marked passing. Add missing steps/tests/tools first.

2.3 Close the session (handoff + commit)

1) Append to progress.md:
- which feature you targeted (quote the description)
- what changed (key files)
- commands you ran (esp. test/smoke/e2e)
- results + any remaining issues
- next recommended feature to tackle

2) Commit everything

git add -A
git commit -m "feat: <short summary>"

3) Final check

git status
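The handoff entry from step 1 above can be sketched as a helper that only ever appends. The function name append_progress, the entry layout, and the sample values (including the file path src/reset.py) are assumptions for illustration; the list of fields follows the items named in step 1.

```shell
append_progress() {
  # usage: append_progress <log-file> <feature> <changed-files> <commands> <results> <next-step>
  {
    printf '\n%s\n' "Session $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '%s\n' "- Feature: $2"
    printf '%s\n' "- Changed: $3"
    printf '%s\n' "- Commands: $4"
    printf '%s\n' "- Results: $5"
    printf '%s\n' "- Next: $6"
  } >> "$1"   # >> appends, so earlier entries are never rewritten
}

# Demonstration against a scratch file instead of a real progress.md.
log="$(mktemp)"
append_progress "$log" \
  "User can reset a forgotten password" \
  "src/reset.py" \
  "bash ./init.sh" \
  "smoke test passed" \
  "email verification"
cat "$log"
```

Writing the entry through one function keeps every session's notes in the same shape, which makes the log quicker to scan at the start of the next session.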

3) Codex CLI: copy/paste launch commands

Replace <REPO_DIR> with your repository path (e.g. /srv/project/repos/myapp).

3.1 Initializer session (first run only)

Goal: create the harness artifacts and an initial commit. Do not implement product features here.

codex exec -m gpt-5.4 -C <REPO_DIR> --full-auto - <<'PROMPT'
You are the initializer agent. Your job is to set up a long-running harness for this repository.

Deliverables (repo root):
1) init.sh: installs/checks deps as appropriate, starts the dev environment, runs a minimal smoke test, exits non-zero on failure.
2) feature_list.json: a structured end-to-end feature checklist derived from README and the codebase; initialize all passes=false.
3) progress.md: write the first handoff entry: what you created, how to run init.sh, how to pick the next feature.

Process:
- Run init.sh once to verify the script works (if it requires manual pre-steps, document them precisely in progress.md).
- Create an initial git commit with message: "chore: initialize agent harness".

Constraints:
- Don’t attempt to implement product features.
- Prefer JSON structure for the feature list; later sessions should only flip passes.
PROMPT

If you prefer interactive (pair-programming style):

codex -m gpt-5.4 -C <REPO_DIR>

3.2 Coding session (every subsequent run)

codex exec -m gpt-5.4 -C <REPO_DIR> --full-auto - <<'PROMPT'
You are the coding agent (shift engineer). You must complete exactly ONE passes=false feature end-to-end.

Mandatory protocol:
1) git status; git log --oneline -20
2) read progress.md and feature_list.json
3) run: bash ./init.sh (smoke test). If it fails, fix it first.
4) choose the highest-priority passes=false feature; implement only that.
5) verify end-to-end; only then flip passes to true.
6) append progress.md with: what you did, commands, results, commit hash, next step.
7) git add -A && git commit -m "feat: <short summary>"
8) ensure git status is clean.

Hard rules:
- Do not mark passes=true without verification.
- Do not edit feature descriptions/steps except to append new items when scope is missing.
- Leave the repo in a merge-ready state.
PROMPT

3.3 Resume a prior interactive Codex session (optional)

codex resume --last

4) Why JSON for `feature_list.json` (practical reason)

My observation matches the blog’s: models are much more likely to “helpfully rewrite” Markdown than JSON. JSON’s rigidity helps enforce the rule that later sessions change only passes.
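One way to check that rule mechanically is to compare two versions of the file with every passes line removed. This is a sketch under the assumption that the file is formatted one key per line; the helper name only_passes_changed and the sample files are hypothetical.

```shell
only_passes_changed() {
  # usage: only_passes_changed <old-file> <new-file>
  # Drop every line containing the passes key, then compare what is left.
  old_rest="$(grep -v '"passes":' "$1")"
  new_rest="$(grep -v '"passes":' "$2")"
  [ "$old_rest" = "$new_rest" ]
}

dir="$(mktemp -d)"
printf '%s\n' '[{' '  "description": "User can log in",' '  "passes": false' '}]' > "$dir/old.json"
printf '%s\n' '[{' '  "description": "User can log in",' '  "passes": true' '}]' > "$dir/new.json"
printf '%s\n' '[{' '  "description": "User can sign in",' '  "passes": true' '}]' > "$dir/edited.json"

only_passes_changed "$dir/old.json" "$dir/new.json" && echo "ok: only passes changed"
only_passes_changed "$dir/old.json" "$dir/edited.json" || echo "flagged: description was edited"
```

In a repository, the old version could come from `git show HEAD:feature_list.json`. Appended items are flagged too, which is deliberate in this sketch: additions are allowed but are worth a human or agent glance.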

5) Suggested definition of done (DoD) for a feature

A feature can be marked passes: true only if:
- smoke test via init.sh still passes
- the feature’s listed steps can be reproduced reliably
- you can explain in progress.md how you validated it
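The three conditions above can be sketched as a gate that must succeed before passes is flipped. The function name dod_gate and its argument order are assumptions, and checking that the progress log mentions the feature is only a weak proxy for "you can explain how you validated it".

```shell
dod_gate() {
  # usage: dod_gate <smoke-cmd> <feature-check-cmd> <progress-file> <feature-description>
  sh -c "$1" || { echo "DoD: smoke test failed" >&2; return 1; }
  sh -c "$2" || { echo "DoD: feature steps did not reproduce" >&2; return 1; }
  grep -qF -- "$4" "$3" || { echo "DoD: progress log does not mention this feature" >&2; return 1; }
  echo "DoD: satisfied"
}

# Demonstration with stand-ins: `true` replaces the real commands,
# e.g. "bash ./init.sh" and a project-specific end-to-end script.
notes="$(mktemp)"
echo "- Feature: User can reset a forgotten password (validated by manual e2e run)" > "$notes"
dod_gate "true" "true" "$notes" "User can reset a forgotten password"
```

The gate stops at the first failing condition, so the error message points at the specific part of the definition of done that is not yet met.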

6) Optional: scaling the harness (future improvement)

If you notice repeated failure modes, consider splitting roles (even if it’s still one model):
- testing-focused session (improve init.sh + e2e)
- cleanup/refactor session (reduce tech debt)
- feature session (pure increments)

Keep the same artifact protocol so every role can hand off cleanly.

Sources and references

Source file: anthropic/AGENTS.md

Source directory: /srv/project/harness-engineering
