ADR-0013: Dual-write Scores from evaluations and annotations
ACCEPTED
Context
ADR-0012 introduced Score as the shared value layer, but it only earns its keep if both producers populate it reliably on live writes. The two paths have different lifecycles: EvaluationResult rows are created in a Celery worker, while Annotation rows (ADR-0015) are created inside Django request/response cycles via Annotation.save.
Writers must be idempotent (re-running an evaluator or re-submitting an annotation leaves a clean set of Scores) and must not roll back a successful parent write when a Score write fails. Separately, EvaluationResult and Annotation rows that pre-date Score must be backfilled before the read-side view (ADR-0014) is useful.
Decision
We will populate Score via two single-responsibility writers in apps/assessments/score_writers.py, invoked at the right point in each subsystem's lifecycle, plus an IdempotentCommand for backfill:
- Automated path. The writer is called from the Celery evaluator task after each
EvaluationResultis created, wrapped in atry/exceptthat logs and swallows. It is not invoked fromEvaluationResult.save, keeping persistence free of cross-app side effects. Error payloads, missing sessions, and non-dict payloads are skipped. - Human path. The annotation writer is called from
Annotation.saveon every submitted save (initial submission and edits while stillSUBMITTED), after the wrappingtransaction.atomic()block, wrapped in atry/exceptthat logs and swallows. - Idempotency is delete-then-bulk-create per artefact. Each writer deletes the existing
Scorerows scoped to the artefact (filter onautomated_resultorreview) then bulk-creates fresh ones insidetransaction.atomic(). With the partial unique constraints from ADR-0012, re-runs, re-submissions, and backfill top-ups are safe overwrites. Score.targetisitem.sessiononly. Annotations on message-only items are skipped;ChatMessageis excluded as a Score target in the unified design.- Scores are written for every submitted annotation, regardless of
is_authoritative(ADR-0016). Non-authoritative annotations are preserved for future inter-rater-reliability work; the authoritative filter happens at read time (see ADR-0014). - Type dispatch. Python
bool→BOOLEANstored as 0/1 invalue_numeric; numeric scalars →NUMERICinvalue_numeric; strings →CATEGORICALinvalue_string. A schema declaration oftype: choiceforcesCATEGORICALregardless of Python type, so values like"0"/"1"aren't misclassified.Noneand non-scalar containers are skipped with a warning. - Historical backfill is a
backfill_initial_scoresIdempotentCommand. It iterates existingEvaluationResultandAnnotationrows, pre-filtering to those with a session target, and commits per-row so failures stay local. Operators run it manually after the schema migration deploys; a follow-upRunDataMigration(..., force=True)migration tops up rows created between the manual run and that deploy.
Consequences
- Positive: Writer-level idempotency plus DB-level partial unique constraints converge on the same clean state for re-runs, re-submissions, and backfill top-ups.
- Positive: The same two writers serve both live dual-write and backfill — one code path, one test surface.
- Positive: Hooking every submitted save (not just
is_new) keeps Scores in lockstep with reviewer edits. - Positive:
try/exceptoutside the parent transaction means an isolated Score write failure does not corrupt the evaluator run or fail the annotation submission; concordance accepts eventual consistency for resilience. - Negative: A swallowed failure leaves a silent inconsistency — the parent row exists but its Scores don't. Operators must monitor the failure log and re-run the backfill to repair; there is no automatic retry.
- Negative: Every submitted edit (even no-op saves) issues a
DELETE+bulk_createagainstScore. Negligible at current scale; if edits become hot we'd short-circuit whendatais unchanged. - Negative: The cross-app import from
apps.human_annotations.modelstoapps.assessments.score_writersis module-level. No circular import materialised (apps.assessmentsreferenceshuman_annotations.Annotationonly via a string-form FK), but re-introducing a cycle would force a local import insidesave. - Negative: Manual
manage.pybackfill adds a deploy-time step; the two-phase pattern accepts this to avoid blocking deploys on long backfills.
Alternatives considered
- Write Scores inside
EvaluationResult.save/Annotation.save: rejected for the eval side → couples persistence to a side effect any caller (admin shell, tests) could trigger. Accepted for the annotation side becauseAnnotation.savealready does post-save bookkeeping (item review counts, queue aggregate recomputes per ADR-0017). - Hook on
is_new and SUBMITTEDonly: rejected → reviewers revise in-place while stillSUBMITTED, so concordance would serve stale judgments until the next backfill. - Use Django signals (
post_save): rejected → signals hide the side effect from the call site and bypass the explicittry/exceptboundary. - Run the writer inside the parent
transaction.atomic(): rejected → a Score writer failure would roll back theEvaluationResult/Annotationwrite, losing reviewer work. - Auto-run backfill as a data migration in the schema-migration PR: rejected → a synchronous data migration could time out the deploy; the two-phase manual-run-then-
RunDataMigration(force=True)top-up is the project standard for this size.