awoss Documentation

AWOSS-VAL turns control promises into tests. The goal is to show which agent paths were checked, what passed, what failed, who owns the fixes or accepted risks, and when the review has to run again.

Paper controls are not enough for agentic work. Approval screens, sandboxes, logging settings, source reviews, DLP rules, and policy gates can all look reasonable until a real workflow reads context, calls tools, uses connectors, writes files, runs commands, stores memory, requests approvals, handles sensitive data, or triggers a downstream business action.

The output should be a reviewable validation packet: coverage, fixtures, findings, retests, owner decisions, and independent challenge for higher-impact systems where appropriate.

What This Family Covers

In scope:

Validation coverage matrices that say which candidate controls were checked, how they were checked, and which controls were not checked.
Review artifacts with scope, method, reviewer or owner, date, evidence references, finding status, assumptions, and claim limits.
Gap, exception, residual-risk, and untested-control records discovered during validation.
Pre-production and pre-expansion tests for approval gates, denied-action paths, source-trust controls, sensitive-data controls, logging controls, human oversight paths, incident handling, and rollback procedures.
Finding lifecycle records that connect validation failures to owner, remediation, accepted risk, target date, retest trigger, and closure state.
Repeatable validation fixtures, review checklists, policy tests, adversarial prompts, context-boundary tests, safe evidence queries, and production-log samples.
Recurring validation for high-impact workflows after model, prompt, source, connector, policy, boundary, data, evidence-store, monitoring, or provider changes.
Separated, independent, or qualified review for high-assurance validation where feasible.
Adversarial testing, red-team exercises, tabletop exercises, and abuse-case scenarios for material agentic workspace risks.

Out of scope:

Creating a general awoss certification, assessor, auditor, or public conformance program.
Proving that a model evaluation, red-team scan, benchmark, or single guardrail test validates the complete scoped workspace.
Guaranteeing absence of prompt injection, data leakage, unsafe tool use, source drift, logging gaps, or governance failure.
Replacing legal, regulatory, privacy, safety, employment, procurement, or sector-specific review.
Storing raw exploit payloads, prompts, screenshots, media, credentials, personal data, customer records, or confidential documents where synthetic fixtures, masked samples, hashes, summaries, or protected references are sufficient.

Level Summary

Levels are cumulative. Level 2 builds on Level 1, and Level 3 builds on both.

Level	Plain-language meaning	Why this level exists	Typical evidence
Level 1	The organization knows what was checked, has at least one review artifact, and records known gaps instead of overclaiming.	A scoped system cannot support assurance discussions until reviewed controls, methods, gaps, and assumptions are visible.	Coverage matrix, review artifact, gap register, assumptions record, untested-control list.
Level 2	Production or expanded use is preceded by meaningful tests of important gates, denied paths, data handling, logging, oversight, and rollback, with findings tracked to decision or retest.	Managed production use needs repeatable validation and a finding lifecycle, not one-off screenshots or informal signoff.	Pre-production test plan, fixture results, denial receipts, approval test records, finding tracker, retest records.
Level 3	High-impact workflows are revalidated over time, challenged by separated or qualified reviewers where feasible, and tested against adversarial or incident scenarios.	High-assurance environments need recurring review, drift checks, independent challenge, and abuse-case coverage for material risks.	Scheduled validation runs, drift review, production-log sample, independent review summary, red-team or tabletop report.
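
Because levels are cumulative, the candidate controls in play at a given level are that level's controls plus those of every lower level. The following is a minimal Python sketch of that rule, using the control IDs listed under Candidate Controls in this guide. The CONTROLS_BY_LEVEL mapping and the controls_for_level helper are illustrative names and are not defined by awoss.

```python
# Control IDs as listed in this guide; the mapping and helper are illustrative.
CONTROLS_BY_LEVEL = {
    1: ["AWOSS-VAL-L1-001", "AWOSS-VAL-L1-002", "AWOSS-VAL-L1-003"],
    2: ["AWOSS-VAL-L2-001", "AWOSS-VAL-L2-002", "AWOSS-VAL-L2-003"],
    3: ["AWOSS-VAL-L3-001", "AWOSS-VAL-L3-002", "AWOSS-VAL-L3-003"],
}


def controls_for_level(target_level: int) -> list[str]:
    """Return every control at or below target_level (levels are cumulative)."""
    if target_level not in CONTROLS_BY_LEVEL:
        raise ValueError(f"unknown level: {target_level}")
    return [
        control_id
        for level in sorted(CONTROLS_BY_LEVEL)
        if level <= target_level
        for control_id in CONTROLS_BY_LEVEL[level]
    ]
```

Under this sketch, a Level 2 review lists six controls: the three Level 1 controls followed by the three Level 2 controls.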

Candidate Controls

AWOSS-VAL-L1-001: Validation Coverage Matrix Level 1

Requirement summary

For each candidate control, record how it was reviewed in the current draft assessment: documentation review, configuration inspection, sampled evidence, manual test, automated test, or monitoring review. Mark explicitly any control that was not reviewed.

Why it exists

Without a coverage matrix, a team may mistake a few screenshots, eval runs, or policy notes for complete validation. The matrix makes the review method explicit and shows where no review happened.

Why this level

This belongs at Level 1 because it is the foundation for honest assurance. It does not require advanced tooling, but it requires naming the controls, methods, evidence references, and gaps.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Control coverage matrix	Evidence or audit owner with family control owners	Before assurance discussion and after material scope or control changes	Candidate controls, method used for each, evidence reference, reviewer or owner, and not-reviewed status	Shows review coverage; does not prove controls were effective.
Untested-control register	Evidence or audit owner	During each validation pass	Controls, workflows, data classes, tools, or scenarios not reviewed and why	Prevents overclaiming; does not prove untested paths are low risk.
Review-method taxonomy	Evidence or audit owner	Before validation planning and after method changes	Definitions for documentation review, configuration inspection, sampled evidence, manual test, automated test, monitoring review, and no review	Standardizes method labels; does not prove method quality.
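
The coverage matrix and untested-control register described above can be sketched as a small data structure. This is one possible shape and not a required schema: the CoverageRow fields, the ReviewMethod enum, the evidence reference "EV-001", and both helper functions are assumptions made for illustration. Only the method labels come from the review-method taxonomy row above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewMethod(Enum):
    # Labels follow the review-method taxonomy in this guide.
    DOCUMENTATION = "documentation review"
    CONFIGURATION = "configuration inspection"
    SAMPLED_EVIDENCE = "sampled evidence"
    MANUAL_TEST = "manual test"
    AUTOMATED_TEST = "automated test"
    MONITORING = "monitoring review"
    NOT_REVIEWED = "no review"


@dataclass(frozen=True)
class CoverageRow:
    """One row of a hypothetical coverage matrix."""
    control_id: str
    method: ReviewMethod
    evidence_ref: Optional[str] = None
    reviewer: Optional[str] = None


def untested_controls(matrix: list[CoverageRow], in_scope: list[str]) -> list[str]:
    """In-scope controls with no row, or only rows marked NOT_REVIEWED."""
    reviewed = {
        row.control_id for row in matrix
        if row.method is not ReviewMethod.NOT_REVIEWED
    }
    return [cid for cid in in_scope if cid not in reviewed]


def rows_missing_evidence(matrix: list[CoverageRow]) -> list[str]:
    """Controls claimed as reviewed but lacking an evidence reference."""
    return [
        row.control_id for row in matrix
        if row.method is not ReviewMethod.NOT_REVIEWED and not row.evidence_ref
    ]


# Illustrative data only; "EV-001" is a made-up evidence reference.
matrix = [
    CoverageRow("AWOSS-VAL-L1-001", ReviewMethod.DOCUMENTATION, "EV-001", "evidence owner"),
    CoverageRow("AWOSS-VAL-L1-002", ReviewMethod.NOT_REVIEWED),
    CoverageRow("AWOSS-VAL-L1-003", ReviewMethod.MANUAL_TEST),
]
in_scope = ["AWOSS-VAL-L1-001", "AWOSS-VAL-L1-002", "AWOSS-VAL-L1-003", "AWOSS-VAL-L2-001"]

untested = untested_controls(matrix, in_scope)
unsupported = rows_missing_evidence(matrix)
```

Here untested contains AWOSS-VAL-L1-002 (marked not reviewed) and AWOSS-VAL-L2-001 (no row at all), and unsupported contains AWOSS-VAL-L1-003. In line with the claim limits above, an empty result would show only that coverage was recorded, not that the controls were effective.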

AWOSS-VAL-L1-002: Minimum Review Artifact Level 1

Requirement summary

Maintain at least one validation or review artifact for the scoped boundary before using awoss candidate controls in internal assurance discussions. Include scope, method, reviewer or owner, date, and finding status.

Why it exists

Internal assurance claims need a durable record. A conversation, meeting memory, or undocumented walkthrough cannot show later what was reviewed, by whom, against which boundary, or with what result.

Why this level

This belongs at Level 1 because every scoped system should have at least one review packet before anyone discusses awoss readiness, mapping, or control support.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Validation review packet	Evidence or audit owner	Before internal assurance discussion and after material review updates	Scoped boundary, reviewed controls, method, reviewer or owner, date, findings, gaps, and evidence references	Supports review of selected controls; does not prove conformance or complete coverage.
Reviewer signoff note	Reviewer, control owner, or evidence owner	At review completion	Reviewer identity or role, relationship to system, scope reviewed, result, and open findings	Records review participation; does not prove reviewer independence or assessor qualification.
Sample evidence bundle	Evidence owner with runtime, source, log, and governance owners	During validation packet preparation	Representative receipts, logs, configuration exports, test results, and redacted references tied to the scoped boundary	Supports sampled review; does not prove all workflows were tested.
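
A completeness check for the minimum review artifact might look like the sketch below. The five required fields come from the requirement summary (scope, method, reviewer or owner, date, and finding status). The dictionary keys, the missing_fields helper, and the sample values are hypothetical.

```python
# Field names mirror the requirement summary; the key spellings are assumptions.
REQUIRED_FIELDS = ("scope", "method", "reviewer_or_owner", "date", "finding_status")


def missing_fields(artifact: dict) -> list[str]:
    """Return required fields that are absent or blank in a review artifact."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(artifact.get(name) or "").strip()
    ]


# Illustrative artifact with two problems: a blank reviewer and no finding status.
draft_artifact = {
    "scope": "hypothetical support-agent workspace",
    "method": "manual test",
    "reviewer_or_owner": "   ",
    "date": "2025-01-15",
}

gaps = missing_fields(draft_artifact)
```

In this example gaps lists reviewer_or_owner and finding_status. A check like this confirms only that the record is complete; it says nothing about the quality of the review itself.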

AWOSS-VAL-L1-003: Known Gaps And Assumptions Level 1

Requirement summary

Record known gaps, assumptions, exceptions, residual risks, or untested controls discovered during review.

Why it exists

A useful validation pass should make uncertainty visible. Hidden assumptions and unstated exceptions are a common source of overclaiming, especially when hosted products, local desktop agents, connectors, logs, and governance records expose different evidence.

Why this level

This belongs at Level 1 because transparent gap recording is required before stronger testing, retesting, or independent review can be meaningful.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Gap and assumption register	Evidence or governance owner	During review and after findings, incidents, provider changes, or scope changes	Known gaps, assumptions, untested controls, exception references, residual risks, owners, and review dates	Shows acknowledged limitations; does not make the risk acceptable by itself.
Residual-risk note	Governance owner with control owner input	When a gap cannot be remediated before use	Risk description, affected controls, evidence basis, mitigation, owner, and expiry or review date	Supports governance review; does not prove legal or business acceptability.
Claim-limit update	Governance or evidence owner	When a gap affects internal or external wording	Claims that must be blocked, narrowed, delayed, or reviewed because of validation results	Controls wording; does not prove the underlying risk is fixed.

AWOSS-VAL-L2-001: Pre-Production And Expansion Tests Level 2

Requirement summary

Test or review approval gates, denied-action paths, source-trust controls, sensitive-data controls, and logging controls before production deployment or material boundary expansion. Include human oversight paths and incident or rollback procedures for high-impact workflows.

Why it exists

Production use and boundary expansion are where paper controls often fail. A new connector, memory source, source package, workflow, approval policy, file path, SaaS action, or data class can introduce paths that were never exercised.

Why this level

This belongs at Level 2 because managed production use needs practical testing of important gates and bad paths, not only a control inventory.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Pre-production validation plan	Evidence owner with runtime, workspace, source, data, log, and governance owners	Before production deployment or material boundary expansion	Approval, denial, source-trust, sensitive-data, logging, oversight, incident, and rollback tests to run	Defines tests; does not prove they passed.
Denied-path and approval test result	Runtime or evidence owner	Before production use and after policy or workflow changes	Safe fixture, expected deny or approval path, actual result, receipt ID, reviewer, and finding if bypassed	Validates named paths only; does not prove all bypasses are closed.
Rollback or emergency procedure drill	Runtime, workspace, or incident owner	Before high-impact production use and after rollback-path changes	Test workflow, stop or rollback action, restored state, owner signoff, and gaps	Tests selected rollback path; does not prove every downstream side effect is reversible.
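
The denied-path and approval test result in the table above compares an expected outcome with the actual one and opens a finding on a bypass. A minimal sketch of that comparison follows, assuming outcome labels such as "deny", "require_approval", and "allow". The PathTestResult record, the evaluate function, and the fixture and receipt IDs are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathTestResult:
    """Hypothetical record for one denied-path or approval-gate test."""
    fixture_id: str
    expected: str              # e.g. "deny" or "require_approval"
    actual: str                # what the system did, e.g. "deny" or "allow"
    receipt_id: Optional[str]  # reference to the denial or approval receipt
    reviewer: str


def evaluate(result: PathTestResult) -> Optional[str]:
    """Return a finding description, or None when the named path passed."""
    if result.actual != result.expected:
        return (
            f"{result.fixture_id}: expected {result.expected}, "
            f"got {result.actual} (possible bypass)"
        )
    if not result.receipt_id:
        return f"{result.fixture_id}: expected outcome observed but no receipt recorded"
    return None


# Illustrative results; IDs are made up.
passed = PathTestResult("FX-DENY-01", "deny", "deny", "RCPT-1001", "runtime owner")
bypassed = PathTestResult("FX-DENY-02", "deny", "allow", None, "runtime owner")
unlogged = PathTestResult("FX-APPR-01", "require_approval", "require_approval", None, "runtime owner")

findings = [f for f in map(evaluate, [passed, bypassed, unlogged]) if f is not None]
```

The sketch treats a correct outcome without a receipt as a finding, on the assumption that a path that cannot be evidenced cannot support a claim. As the claim limit states, a pass covers the named path only.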

AWOSS-VAL-L2-002: Finding Lifecycle And Retest Triggers Level 2

Requirement summary

Track validation findings, remediation status, risk acceptance, owners, target dates, and retest or review triggers for material gaps.

Why it exists

A failed test should not disappear into a chat thread, spreadsheet, or informal TODO. Material validation findings need a lifecycle that records who owns the decision, what changed, whether risk was accepted, and when the issue must be retested.

Why this level

This belongs at Level 2 because production validation needs closed-loop management. Level 1 can record gaps; Level 2 must track material findings to remediation, acceptance, or retest.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Validation finding record	Evidence or security owner with affected control owner	When a validation gap is found	Finding ID, affected controls, scenario, severity or impact, owner, evidence reference, and status	Tracks finding state; does not prove remediation is sufficient.
Retest trigger record	Evidence owner or release owner	When remediation, risk acceptance, scope change, or provider change occurs	Trigger, required retest, owner, target date, fixture or scenario, and closure requirement	Schedules retest; does not prove the retest passed.
Risk acceptance record	Governance owner with evidence owner input	When a finding remains open by decision	Residual risk, rationale, owner, expiry or review date, claim limits, and compensating controls	Supports decision review; does not prove the risk is acceptable outside the named scope.
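
One way to keep a finding from silently disappearing is to allow only explicit state transitions. The sketch below is an assumption about how such a lifecycle could be modeled. awoss does not define these state names or transitions; they are chosen here to reflect the remediation, acceptance, and retest paths described above.

```python
from enum import Enum


class FindingState(Enum):
    OPEN = "open"
    REMEDIATION = "remediation in progress"
    RISK_ACCEPTED = "risk accepted"
    RETEST_PENDING = "retest pending"
    CLOSED = "closed"


# Hypothetical transition table. A finding cannot jump from OPEN to CLOSED:
# it must pass through remediation or acceptance, and then a retest.
ALLOWED_TRANSITIONS = {
    FindingState.OPEN: {FindingState.REMEDIATION, FindingState.RISK_ACCEPTED},
    FindingState.REMEDIATION: {FindingState.RETEST_PENDING, FindingState.RISK_ACCEPTED},
    # Accepted risk is revisited when its expiry or review date or a trigger arrives.
    FindingState.RISK_ACCEPTED: {FindingState.RETEST_PENDING},
    # A retest either closes the finding or reopens it.
    FindingState.RETEST_PENDING: {FindingState.CLOSED, FindingState.OPEN},
    FindingState.CLOSED: set(),
}


def advance(current: FindingState, new: FindingState) -> FindingState:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"transition not allowed: {current.value} -> {new.value}")
    return new


# Illustrative happy path: open -> remediation -> retest -> closed.
state = FindingState.OPEN
for step in (FindingState.REMEDIATION, FindingState.RETEST_PENDING, FindingState.CLOSED):
    state = advance(state, step)
```

In this model, accepted risk is deliberately not a terminal state, which matches the expiry or review date on the risk acceptance record. Teams with a different governance process would adjust the table.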

AWOSS-VAL-L2-003: Repeatable Fixtures And Review Queries Level 2

Requirement summary

Use repeatable validation fixtures, review checklists, policy tests, adversarial prompts, context-boundary tests, or evidence queries for recurring production reviews where practical.

Why it exists

One-off validation is hard to compare over time. Repeatable fixtures and queries let a team check whether an approval gate, denied-action path, source-trust assumption, context boundary, sensitive-data rule, log reconstruction path, or rollback path still behaves as expected.

Why this level

This belongs at Level 2 because repeatability is needed once the scoped system is used in production or expanded. The fixtures may still be manual or semi-automated, but they should be stable enough to rerun.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Validation fixture catalog	Evidence owner with control owners	Before recurring review and after fixture changes	Fixture ID, covered controls, safe setup, expected behavior, evidence fields, owner, and next review trigger	Supports repeatability; does not prove coverage of unlisted scenarios.
Review checklist or evidence query set	Evidence or audit owner	During recurring review and after evidence-source changes	Questions or queries for approvals, denials, source drift, context state, sensitive handling, logs, and findings	Helps consistent review; does not prove evidence sources are complete.
Fixture run record	Evidence owner or test owner	Each validation run	Fixture version, scoped system state, expected result, actual result, receipt IDs, findings, and retest status	Shows a named fixture result; does not prove complete workspace safety.
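
Repeatable fixtures make it possible to compare one validation run with the next. The sketch below shows one way to flag regressions between two runs, assuming each run record carries a fixture ID, fixture version, expected result, and actual result as in the fixture run record row above. The FixtureRun type, the helper names, and the sample IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixtureRun:
    """Hypothetical record of one fixture execution."""
    fixture_id: str
    fixture_version: str
    expected: str
    actual: str

    @property
    def passed(self) -> bool:
        return self.expected == self.actual


def regressions(previous: list[FixtureRun], current: list[FixtureRun]) -> list[str]:
    """Fixture IDs that passed in the previous run and fail in the current run."""
    previously_passed = {run.fixture_id for run in previous if run.passed}
    return [
        run.fixture_id for run in current
        if run.fixture_id in previously_passed and not run.passed
    ]


def version_changes(previous: list[FixtureRun], current: list[FixtureRun]) -> list[str]:
    """Fixture IDs whose version differs between runs, so results are not like-for-like."""
    old_versions = {run.fixture_id: run.fixture_version for run in previous}
    return [
        run.fixture_id for run in current
        if run.fixture_id in old_versions
        and old_versions[run.fixture_id] != run.fixture_version
    ]


# Illustrative runs; IDs and versions are made up.
last_run = [
    FixtureRun("FX-DENY-01", "v1", "deny", "deny"),
    FixtureRun("FX-DLP-01", "v1", "mask", "mask"),
]
this_run = [
    FixtureRun("FX-DENY-01", "v1", "deny", "allow"),
    FixtureRun("FX-DLP-01", "v2", "mask", "mask"),
]

regressed = regressions(last_run, this_run)
reversioned = version_changes(last_run, this_run)
```

Here FX-DENY-01 is reported as a regression, and FX-DLP-01 is flagged because its fixture version changed, so its pass cannot be compared directly with the earlier result.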

AWOSS-VAL-L3-001: Recurring High-Impact Validation And Drift Review Level 3

Requirement summary

Perform recurring validation for high-impact workflows, including boundary enforcement, runtime action control, context-poisoning resistance, sensitive-data handling, logging integrity, and incident or rollback procedures, with review of drift, monitoring signals, and human-intervention records where applicable.

Why it exists

Agentic systems drift. Models, prompts, instructions, memory, retrieval corpora, tools, connectors, source versions, permissions, policies, logs, monitoring rules, providers, and business workflows can change after the first review.

Why this level

This belongs at Level 3 because high-impact workflows need stronger ongoing assurance. The focus is not continuous perfection; it is a recurring and trigger-driven review that can detect material drift and preserve evidence.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Recurring validation schedule	Governance or evidence owner	Before high-impact use and after review-cadence changes	Covered workflows, cadence, triggers, owners, fixtures, evidence sources, and escalation path	Defines cadence; does not prove reviews are effective.
Drift review packet	Evidence owner with runtime, source, context, log, and governance owners	On schedule and after material changes	Model, prompt, source, tool, connector, policy, context, data, log, finding, monitoring, and provider changes reviewed	Supports drift review; does not prove all drift was detected.
Production-log sample review	Evidence or audit owner	Periodically and after incidents or monitoring signals	Sampled workflow, receipt IDs, reconstruction result, missing fields, findings, and retest triggers	Reviews selected records only; does not prove all production activity is safe.
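
Trigger-driven drift review needs a way to notice that something changed since the last validated state. A minimal sketch, assuming the team records a version or hash for each tracked component (model, prompt, connector, policy, and so on) at validation time: the snapshot format, the component names, and the version strings are illustrative assumptions.

```python
def changed_components(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return components that were added, removed, or changed since the baseline."""
    all_names = sorted(set(baseline) | set(current))
    return [name for name in all_names if baseline.get(name) != current.get(name)]


# Illustrative snapshots; component names and versions are made up.
validated_snapshot = {"model": "m-1", "prompt": "p-3", "connector": "c-2"}
current_snapshot = {"model": "m-2", "prompt": "p-3", "connector": "c-2", "policy": "pol-1"}

drift = changed_components(validated_snapshot, current_snapshot)
needs_revalidation = bool(drift)
```

In this example drift lists model (changed) and policy (newly added). A comparison like this detects only drift in components that are tracked, which mirrors the claim limit on the drift review packet.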

AWOSS-VAL-L3-002: Separated Or Qualified Review Level 3

Requirement summary

Use separated, independent, or qualified review for high-assurance validation where feasible, and record the reviewer relationship or qualification basis.

Why it exists

Builders are often too close to their own controls. A separated reviewer, qualified internal reviewer, model risk reviewer, red team, or external assessor can challenge assumptions, evidence quality, finding closure, and claim posture.

Why this level

This belongs at Level 3 because it adds stronger assurance and governance discipline for high-impact workflows. It also requires careful claim language because awoss does not yet define an assessor qualification or independence model.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Reviewer relationship record	Governance or evidence owner	Before high-assurance review and at review completion	Reviewer identity or role, relationship to build team, independence or separation basis, conflicts, and scope	Shows relationship; does not prove auditor independence or certification.
Qualification basis note	Governance owner or review lead	Before relying on review conclusions	Experience, role, training, domain knowledge, red-team responsibility, or external engagement scope relevant to the review	Supports reviewer selection; does not create an awoss assessor credential.
Challenge review summary	Separated reviewer, red team, or qualified reviewer	At review completion	Evidence challenged, findings opened, assumptions questioned, accepted limitations, and management response	Supports high-assurance review; does not prove complete security.

AWOSS-VAL-L3-003: Adversarial And Abuse-Case Exercises Level 3

Requirement summary

Include adversarial testing, red-team exercises, tabletop exercises, or abuse-case testing for material agentic workspace risks, including source-trust abuse, context manipulation, tool misuse, sensitive-data exposure, and incident-response paths.

Why it exists

Happy-path testing does not show how the system behaves when a document contains hostile instructions, a connector exposes too much data, a tool tries an unsafe action, a source changes unexpectedly, a secret appears in a prompt, or responders need to stop and reconstruct a harmful workflow.

Why this level

This belongs at Level 3 because adversarial and incident-style testing is stronger, riskier, and more specialized than basic production validation. It should be scoped, harmless by default, and tied to findings and retests.

Evidence examples

Evidence	Likely owner/provider	When collected	What it should show	Claim limit
Abuse-case scenario list	Security, evidence, or red-team owner with control owners	Before adversarial review and after risk changes	Source-trust, context poisoning, tool misuse, sensitive-data exposure, logging, rollback, and incident scenarios	Defines scenarios; does not prove all abuse paths are covered.
Red-team or adversarial test summary	Security, red-team, or evidence owner	After approved adversarial exercise	Safe payload or fixture references, expected behavior, actual behavior, findings, remediation, and retest plan	Supports scenario review; does not prove prompt-injection resistance or complete safety.
Tabletop exercise packet	Governance or incident owner with evidence owner	During scheduled exercises and after major incidents	Roles, decisions, evidence retrieved, stop or rollback path, escalation route, claim-limit decision, and improvement backlog	Tests decision-making and evidence retrieval; does not prove technical controls operated in production.

External Mapping Notes

The completed crosswalk treats AWOSS-VAL as the most broadly covered awoss family. It draws on verification, testing, monitoring, human oversight, recurring review, vulnerability scoring, red-team, threat-modeling, risk management, and improvement themes that recur across many sources.

Relevant external-source signals include:

EU AI Act official sources inform oversight, monitoring, input review, and validation evidence angles, but AWOSS-VAL does not prove legal compliance, conformity assessment, high-risk classification, or other legal judgments.
OWASP AISVS informs output controls, adversarial tests, drift review, and kill-switch or emergency exercises, but current public AISVS material does not create an awoss certification or complete-safety claim.
AIUC-1 is useful as a commercial comparator for annual review, quarterly testing, human review, and intervention records, but there is no AIUC-1 certificate equivalence.
OWASP Agentic Skills Top 10, OWASP AIVSS, CSA AICM, CSA MAESTRO, NIST AI RMF, NIST AI 600-1, ISO/IEC 42001, ISO/IEC 23894, Five Eyes guidance, and MITRE ATLAS inform selected testing, assessment, monitoring, red-team, risk-review, and remediation practices, but none of those sources by itself validates the complete agentic workspace boundary.
The tooling research notes show that practical validation support exists across eval frameworks, guardrail tests, red-team scanners, traces, logs, issue trackers, and governance records, but current tooling remains fragmented across products and layers.

Formal Standard Link

Use this guide with the formal AWOSS-VAL candidate requirements. If the guide and the standard draft disagree, the standard draft takes precedence.