Position Paper
Knowledge Architecture Beyond the Single Consumer
Information architecture has well-established principles for organizing knowledge: structural decomposition (DITA), semantic description (Schema.org, JSON-LD), progressive disclosure (standard in UX and technical writing since the 1980s). These principles were developed for human consumers. They assumed a reader who tolerates ambiguity, fills gaps through intuition, and compensates for missing structure through contextual inference.
Bowker and Star (1999) demonstrated that classification systems become invisible infrastructure whose categories shape cognition long after their design rationale is forgotten. Star and Griesemer (1989) showed that boundary objects enable coordination across communities of practice precisely because they maintain structure while permitting local adaptation. Both insights apply with particular force to knowledge architecture, where the classification decisions embedded in documentation structures determine what can be known, by whom, and under what conditions.
That assumption is breaking. Autonomous agents, AI verification systems, regulatory automation, and AI-augmented workflows are consuming the same knowledge artifacts as human developers, with fundamentally different failure modes when those artifacts are ambiguous, incomplete, or structurally coupled to a single modality.
Gigerenzer's (2000) ecological rationality framework explains the specific mechanism: cognitive strategies perform well only when matched to the informational structure of their environment. A human developer uses recognition heuristics (familiar API patterns, analogies to known systems) that depend on the knowledge layer providing recognizable structural cues. An autonomous agent uses constraint-satisfaction search that depends on the knowledge layer providing machine-readable constraint metadata. A regulatory system uses compliance-checklist matching that depends on the knowledge layer providing structured claim-evidence mappings. Each reasoner's strategy is adapted to a different environmental structure, and knowledge architecture designed for one structure degrades the strategies of every other consumer class.
This paper identifies eight specific extensions to established information architecture principles, tested against failures observed across platform documentation ecosystems, regulatory disclosure practices, agent-platform integration, and infrastructure analysis. The extensions address what existing IA principles leave unresolved when the consumer is unknown at design time.
Observations
Four failures that existing IA principles leave unresolved
Observation 1 · Knowledge coupled to a single consumption modality
Documentation designed for a single audience becomes functionally useless when a different audience arrives, despite being extensive. API documentation written for human developers required complete restructuring when agents needed to consume the same capabilities programmatically. Regulatory filings designed for compliance reviewers required rebuilding when automated verification systems needed structured inputs. Infrastructure documentation written for prospective users required restructuring when structural analysts needed mechanism-level evidence. In every case, the information existed but its structure served one consumer class, and each new consumer class triggered a rebuild. Star and Griesemer's (1989) boundary object concept explains the structural cause: documentation that functions as a locally adapted artifact within one community of practice lacks the shared structure that would make it useful across communities. The rebuild cycle is the cost of converting a local adaptation into a boundary object after the fact, rather than designing the boundary object properties in from the start. The pattern suggests a design problem rather than a content problem: the knowledge was coupled to a consumption modality rather than organized around the underlying facts.
Observation 2 · Ambiguity cost scales with reasoner rigidity
Human readers tolerate ambiguous descriptions because they bring contextual understanding. An autonomous agent encountering the same ambiguity in a capability schema has no context to resolve it. The degradation is proportional: the more rigid the reasoner (less tolerance for ambiguity, fewer compensating heuristics), the more costly each ambiguity in the knowledge artifact. Sweller's (1988) cognitive load theory provides the mechanism for human consumers: ambiguous documentation imposes extraneous cognitive load (processing effort that serves the presentation format rather than the subject matter), consuming working memory capacity that could otherwise be devoted to intrinsic load (processing the actual complexity of the system being documented). Pinch and Bijker's (1984) concept of interpretive flexibility explains why ambiguity persists: different social groups (engineers, investors, regulators, users) read the same documentation with different interpretive frameworks, and documentation that is optimally ambiguous for the producer (allowing each audience to read what it expects) is maximally costly for the most rigid consumers (agents and automation systems that cannot disambiguate). This pattern was visible across domains: evaluation frameworks produced divergent conclusions from the same documentation because the documentation tolerated ambiguity that the framework exposed; agents misinterpreted tool constraints described in natural language because the descriptions relied on contextual understanding the agent lacked; regulatory automation systems produced inconsistent assessments from the same disclosure because the disclosure permitted multiple valid interpretations.
Observation 3 · Discovery without constraint metadata
MCP gives agents discovery: they know a tool exists. OpenAPI gives developers discovery: they know an endpoint exists. Disclosure standards give regulators discovery: they know a system publishes information. In each case, the consumer can find the capability but lacks the metadata to evaluate whether it fits their current constraints (cost, latency, trust requirements, context budget). The result is first-match selection rather than best-match selection. Simon's (1956) bounded rationality provides the theoretical framework: when information acquisition is costly (in time for humans, in tokens for agents, in processing for automation systems), reasoners satisfice rather than optimize, selecting the first option that meets a minimum threshold rather than searching for the best option across all possibilities. The transition from satisficing to optimizing requires that comparison metadata be available at low cost relative to the decision's importance. Without constraint metadata, the cost of comparison exceeds the cost of accepting a suboptimal match, making satisficing the rational strategy even when better options exist. In multi-server agent environments, this produces measurably suboptimal tool selection. In developer ecosystems, this manifests as integration decisions based on the first adequate-looking API rather than the best-fit option. In evaluation frameworks, this manifests as assessments that rank systems by the metrics they happen to publish rather than the metrics that matter.
Observation 4 · Knowledge artifacts govern consumer behavior
The phrasing, granularity, and completeness of a knowledge artifact shape consumer behavior through at least four mechanisms: what descriptions include, what schemas omit, how errors are categorized, and how version boundaries are signaled. The behavioral effect holds whether the consumer is a human developer, an autonomous agent, or a regulatory triage system. Winner's (1980) thesis that artifacts have politics applies directly: technical artifacts embody political and behavioral choices through their design, and knowledge artifacts are no exception. Lessig's (1999) framework of code as law extends the point to digital infrastructure: the architecture of a system regulates behavior as effectively as formal rules, and knowledge schemas function as a regulatory architecture for every consumer that depends on them. Callon's (1998) performativity concept adds that the knowledge artifact does more than describe the system; it actively constitutes the informational environment within which decisions about the system are made. This was the most consistently observed pattern across all domains examined, and the one with the broadest implications for how knowledge systems should be designed. The four mechanisms are elaborated in the 'Knowledge artifacts as behavioral governance' section below.
Principles
Design principles for unknown consumers
Each principle below builds on an established information architecture concept and identifies a specific extension required when the consumer is heterogeneous and unknown at design time. The principles are necessary for this specific scenario (multiple reasoner classes with different failure modes consuming the same knowledge artifacts); for known, homogeneous consumer populations, some of the extensions impose overhead that established IA principles handle more efficiently. The extension structure draws on a method analogous to Lakatos's (1970) protective belt: the established IA concept forms the hard core of each principle, and the extension identifies where the protective belt of assumptions (about who the consumer is, what compensating heuristics they bring, and how much ambiguity they tolerate) must be revised when those assumptions no longer hold. The set addresses the failure modes documented in the observations above; it should be tested empirically rather than treated as exhaustive.
Structural decomposability
Each unit must be intelligible to a reasoner encountering it without prior context or compensating heuristics. DITA assumes a human reader who can infer relationships from document structure. Simon's (1962) architecture of complexity provides the theoretical foundation: complex systems that are decomposable into nearly independent subsystems are easier to understand, design, and evolve than systems with dense interdependencies. The extension requires that knowledge artifacts mirror this property: each unit should be a nearly independent subsystem of the knowledge architecture, with explicit interfaces to other units rather than implicit dependencies. When the consumer may be an agent with no structural memory, or a regulatory automation system processing units in isolation, each unit must carry its own scope boundary and relationship pointers. Failures of decomposability appeared across domains: platform documentation was modular but modules assumed knowledge from other modules, making isolated consumption unreliable; API reference pages assumed familiarity with authentication flows documented elsewhere; disclosure sections assumed context from other sections of the same filing.
Can a reasoner with no prior exposure to this system extract a single unit and form a correct belief about its scope and implications?
Semantic self-description
Self-description must include authority, currency, and verification pointers alongside structural metadata. Schema.org describes what an entity is. The extension requires describing when that description was valid, who asserts it, and how the assertion can be checked. This requirement emerged from disclosure standard design, where the critical decision was binding disclosure to a specific system release rather than to a project name. A Schema.org annotation that describes a platform without specifying which version is temporally unbound and potentially misleading to any consumer. The same pattern appeared in API documentation (version-agnostic endpoint descriptions) and agent capability schemas (tool descriptions without currency signals).
Can a reasoner determine what this artifact is, when it was valid, who asserts it, and how to verify the assertion?
Verification surface
Every claim must be paired with the method by which any consumer can independently verify it. Existing verification systems are domain-specific: cryptographic proofs for system state, audit logs for financial systems, citations for academic claims. The extension requires verification surfaces that are consumable by heterogeneous reasoners. Popper's (1963) falsificationist epistemology provides the theoretical foundation: claims that cannot be subjected to potential refutation carry no epistemic weight, regardless of how precisely they are stated. Meyer's (1992) design by contract operationalizes this for software systems: every interface specifies preconditions, postconditions, and invariants that can be checked mechanically. Lamport's (2002) TLA+ extends the principle to temporal properties, demonstrating that system behaviors over time can be formally specified and verified. Systems routinely publish claims (scalable, reliable, secure, decentralized) without publishing the evidence that would allow any external reasoner to check. Disclosure standards address this through machine-readable manifests alongside human-readable guides. The principle generalizes: every knowledge artifact that makes a claim should expose the evidence that supports it in a form any consumer can process.
Can a reasoner verify this claim through a method that works for their specific capabilities (human inspection, automated checking, agent verification)?
Temporal binding
Knowledge artifacts must be bound to a specific system state, with explicit signals for currency and staleness that are machine-parseable. Standard versioning practices (changelogs, 'last updated' timestamps) serve human consumers who understand version semantics. Agents, regulatory automation, and verification systems need structured temporal metadata: effective date, expiry signal, version identifier, and a pointer to the current version. This requirement emerged from disclosure standard design, where release-bound conformance was the foundational decision, and from infrastructure analysis, where documentation consistently described a previous system state with no signal of the divergence. API documentation exhibits the same pattern: endpoint behavior changes across versions while the documentation reflects a previous state.
Does this artifact specify the exact system state it describes, in a format that any consumer can parse, with an explicit currency signal?
Modality independence
Modality independence must extend to the verification surface. It is well established that content should be separable from presentation (write once, render for web, PDF, mobile). The extension is that evidence and verification methods must also be renderable across modalities. A human analyst verifies a system's reliability claim by inspecting distribution charts. An agent verifies the same claim by parsing a structured data feed. A regulatory system verifies it by checking against a compliance threshold. The underlying evidence is the same; the verification rendering differs. This was absent from nearly every documentation system we examined: evidence was coupled to its original presentation modality.
Can this knowledge and its verification surface be faithfully rendered for a consumer the original author did not anticipate?
Progressive disclosure
Each resolution level must be complete and correct at its own depth, verifiable independently of deeper levels. Sweller's (1988) cognitive load theory explains why progressive disclosure works when done correctly: it manages intrinsic load by presenting complexity in stages, allowing the reasoner to construct mental models incrementally rather than processing the entire information space at once. The extension requires that summaries are true, meaning a reasoner forming beliefs from the summary level forms correct beliefs, even if less detailed. We observed the opposite across multiple domains: executive summaries described aspirations while detailed technical documentation described constraints; API overview pages described ideal usage while reference pages documented limitations; platform marketing described capabilities while changelogs documented regressions. Pinch and Bijker's (1984) interpretive flexibility identifies the structural cause: different organizational functions (marketing, engineering, legal) produce different resolution levels with different interpretive commitments, and without a mechanism to enforce consistency across levels, each level reflects the interpretive framework of its authors rather than a coherent picture of the system. An agent, a regulatory triage system, or a human scanning at summary level would form incorrect beliefs. The degradation was systematic.
Can a reasoner form a correct belief at the summary level, one that is less detailed but still true?
Composability
Composed knowledge must preserve provenance, authority, and temporal binding from each source. Standard API composition preserves data integrity. The extension requires preserving knowledge integrity: when an agent composes capabilities from multiple platforms, or an analyst synthesizes disclosures from multiple systems, or a governance body aggregates compliance evidence, the composite must carry the provenance, authority, and currency metadata of each component. Composition failures were observed across domains: comparative assessments lost temporal binding and evidence quality when aggregated; agent workflows that combined outputs from multiple tools lost provenance tracking; regulatory reviews that synthesized multiple filings lost the currency signals of individual submissions.
Can artifacts from independent sources be combined while preserving each source's provenance, authority, and currency?
Adversarial resilience
Knowledge systems must degrade gracefully when producers have incentives to obscure, and must surface the absence of expected evidence as a signal rather than a gap. Existing adversarial systems protect data integrity. The extension protects knowledge integrity: the system should make it visible when expected disclosures are missing, when verification surfaces are absent, and when claims lack evidence. Williamson's (1985) analysis of opportunism under bounded rationality provides the economic foundation: when information asymmetry exists and enforcement is costly, rational actors have incentives to misrepresent or withhold. Zuboff's (2019) surveillance capitalism analysis documents a systemic version: platforms that extract behavioral data while rendering their own extraction mechanisms opaque. Noble's (2018) work on algorithmic discrimination demonstrates that opacity in classification systems perpetuates structural harm. In adversarial environments, narrative substitution (replacing structural description with marketing), context privatization (restricting material facts to private channels), and verification absence (publishing claims without evidence) are common strategies used to resist external analysis. Disclosure standards address this by making disclosure categories explicit: a missing category is a visible signal, visible to any consumer.
Does this system surface missing evidence as an explicit signal rather than a silent gap?
Application · Autonomous agents
DNCE: Declaration, Negotiation, Contract, Evidence
The principles above apply to any consumer. DNCE renders them for a specific and increasingly important consumer class: autonomous agents. Each layer addresses a specific failure mode observed in agent-platform interactions, grounded in the corresponding design principle. The four-layer structure has a structural parallel in Meyer's (1992) design by contract methodology: Declaration corresponds to the contract's interface specification, Negotiation to the precondition evaluation, Contract to the postcondition and invariant guarantees, and Evidence to the runtime verification of contract satisfaction. Toulmin's (1958) argument model provides a deeper structural insight: every capability claim is an argument that can fail at any of its components. A Declaration that omits constraint metadata fails at the data level (insufficient grounds). A Negotiation that accepts a tool without evaluating fit fails at the warrant level (the justificatory principle connecting capability to task is unexamined). A Contract that specifies outcomes without verification conditions fails at the backing level (no grounds support the warrant's authority). An Evidence layer that records outcomes without linking them to specific claims fails at the rebuttal level (counter-evidence cannot be brought to bear). The four DNCE layers map to Toulmin's components, and the failure mode at each layer corresponds to the argumentative failure at the corresponding component.
Declaration
Machine-readable schemas declaring operations, constraints, authentication requirements, rate limits, cost, and output guarantees. Implements semantic self-description and structural decomposability for agent consumption. Motivated by the observation that agents encountering tool descriptions designed for human developers consistently misinterpret constraint boundaries, because the descriptions rely on contextual understanding the agent lacks.
Artifacts
Capability manifests with constraint metadata, cost declarations, authentication boundary specifications, deprecation signals
Failure mode
Incomplete declarations cause agents to discover capabilities with incorrect assumptions about behavior, producing downstream failures that are difficult to trace to the discovery layer.
Implementation guidance
In practice, declaration encoding begins with existing standards: OpenAPI 3.1 for HTTP-based capabilities, MCP tool schemas for agent-facing tools, JSON-LD for semantic metadata. The extension beyond current practice is constraint metadata. An OpenAPI operation description might declare the endpoint exists; the declaration layer requires that it also publish rate limits (X-RateLimit headers or a structured rateLimit object), authentication scope requirements, cost-per-call estimates, and output format guarantees. A concrete starting point: extend an existing OpenAPI spec with a custom x-constraints object carrying these fields, then validate that an agent consuming only the spec (without human-readable docs) can determine correct invocation boundaries.
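A minimal sketch of the declaration shape, under stated assumptions: the x-constraints object and every field inside it are illustrative names rather than an existing OpenAPI vocabulary, and the operation shown is hypothetical. TypeScript is used here to allow annotation; the artifact itself would ship as JSON or YAML inside the spec.

```typescript
// Hypothetical OpenAPI 3.1 operation fragment with an "x-constraints"
// extension. All names and values are illustrative assumptions.
const createExportOperation = {
  operationId: "createExport",
  summary: "Create a data export job",
  "x-constraints": {
    rateLimit: { requests: 60, window: "1m" },       // throttle boundary
    authScopes: ["exports:write"],                   // required credential scope
    costPerCall: { amount: 0.002, currency: "USD" }, // estimate, not a quote
    outputGuarantee: {
      format: "application/json",
      schemaRef: "#/components/schemas/ExportJob",   // machine-checkable shape
    },
  },
} as const;
```

The validation test is the one named in the paragraph above: an agent given only this fragment should be able to derive correct invocation boundaries without consulting human-readable documentation.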
Negotiation
How an agent determines whether a capability meets its workflow requirements under current constraints. Implements progressive disclosure and composability at the capability selection layer. Directly addresses Observation 3 (discovery without constraint metadata): current discovery protocols surface capability existence without the comparison metadata needed for informed selection.
Artifacts
Capability matching schemas, constraint satisfaction protocols, dependency graphs, alternative capability pointers, context-cost estimates
Failure mode
Without negotiation infrastructure, agents select the first matching capability. In multi-server environments, this produces suboptimal tool selection at scale.
Implementation guidance
Negotiation requires structured comparison metadata that current discovery protocols omit. A practical implementation: for each declared capability, publish a suitability profile containing latency percentiles (p50, p95, p99), cost per invocation, required context window (in tokens for agent consumers), authentication complexity (number of steps, credential types), and a list of alternative capabilities that serve the same function. This profile can be encoded as a JSON sidecar to existing tool definitions. MCP's current tool listing returns name, description, and input schema; the negotiation layer extends this with the fields an agent needs to choose between competing tools. Teams can begin by instrumenting their most-used endpoints and publishing the resulting profiles alongside existing documentation.
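One possible encoding of the suitability profile, sketched in TypeScript; every field name is an assumption for illustration and is not part of MCP, OpenAPI, or any existing discovery protocol.

```typescript
// Hypothetical suitability profile, published as a sidecar to an
// existing tool definition. Field names and values are illustrative.
interface SuitabilityProfile {
  capability: string;            // the declared tool or endpoint
  latencyMs: { p50: number; p95: number; p99: number };
  costPerInvocation: number;     // in the platform's billing unit
  contextTokens: number;         // context an agent spends loading the schema
  authComplexity: { steps: number; credentialTypes: string[] };
  alternatives: string[];        // capabilities serving the same function
}

const searchProfile: SuitabilityProfile = {
  capability: "documents.search",
  latencyMs: { p50: 120, p95: 480, p99: 1100 },
  costPerInvocation: 0.0004,
  contextTokens: 310,
  authComplexity: { steps: 1, credentialTypes: ["api_key"] },
  alternatives: ["documents.query", "fulltext.lookup"],
};
```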
Contract
The runtime agreement between agent and platform. Implements temporal binding and verification surface for execution-time knowledge. Motivated by the observation that missing error taxonomies are the most common cause of agent workflow failures: the distinction between 'retry safely' and 'abort immediately' is invisible, producing either cascading retries or premature abandonment.
Artifacts
Typed request/response schemas, error code taxonomies with resolution paths, idempotency contracts, state lifecycle specifications, timeout and retry policies
Failure mode
Missing contracts force agents to treat every failure as ambiguous, with no basis for distinguishing transient from permanent failures.
Implementation guidance
The contract layer builds on typed schemas (JSON Schema, Protocol Buffers, TypeSpec) and extends them with behavioral metadata. The most impactful starting point is a structured error taxonomy: for each operation, publish a machine-readable mapping from error codes to resolution strategies (retry with backoff, retry with different parameters, abort, escalate to human). OpenAPI's responses object supports this partially; the extension is adding a resolution field to each error response with structured retry policy (max attempts, backoff strategy, conditions for escalation). Idempotency contracts can follow Stripe's model: publish an idempotency-key header specification with explicit documentation of which operations are safe to retry. Teams with existing APIs can start by auditing their five highest-traffic endpoints and publishing structured error taxonomies for each.
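A sketch of what such a taxonomy could look like, assuming a hypothetical resolution vocabulary; neither the strategy names nor the error codes below come from an existing standard.

```typescript
// Hypothetical error taxonomy for one operation, mapping error codes
// to machine-actionable resolution strategies. All names are assumed.
type Resolution =
  | { strategy: "retry_backoff"; maxAttempts: number; baseDelayMs: number }
  | { strategy: "retry_modified"; hint: string }  // retry with changed parameters
  | { strategy: "abort" }                         // terminal: do not retry
  | { strategy: "escalate"; channel: string };    // hand off to a human

const createExportErrors: Record<string, Resolution> = {
  RATE_LIMITED:      { strategy: "retry_backoff", maxAttempts: 5, baseDelayMs: 1000 },
  PAYLOAD_TOO_LARGE: { strategy: "retry_modified", hint: "split input into batches" },
  INVALID_SCOPE:     { strategy: "abort" },
  EXPORT_CORRUPTED:  { strategy: "escalate", channel: "support-ticket" },
};
```

The distinction the contract layer needs is visible in the type itself: an agent can branch on the strategy field without parsing prose.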
Evidence
How an agent confirms that an operation succeeded. Implements verification surface and adversarial resilience at the execution result layer. Motivated by the same pattern observed across domains: claims without verification surfaces propagate through downstream reasoning unchecked. In agent workflows, unverified results compound through multi-step chains.
Artifacts
Output schemas with validation rules, event streams with causal ordering, proof-of-execution patterns, audit trail specifications, rollback conditions
Failure mode
Without evidence infrastructure, agents propagate unverified results through downstream workflows. Errors compound silently across execution chains.
Implementation guidance
Evidence encoding requires that each operation response include verifiable completion markers. A practical pattern: return a structured result object containing the output data, a status field with defined semantics (completed, partial, failed), a verification hash or checksum when the output is deterministic, and a pointer to the audit trail entry for that invocation. For non-deterministic operations, the evidence layer should include the input parameters that produced the result and a timestamp, enabling downstream consumers to assess whether the result remains valid. CloudEvents provides a useful envelope format for event-based evidence. Teams can begin by adding structured result metadata to their most critical operations and validating that an agent can programmatically confirm success without parsing human-readable status messages.
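A minimal sketch of a result envelope carrying evidence metadata; the field names are assumptions, and an event-based variant might wrap the same fields in a CloudEvents envelope.

```typescript
// Hypothetical evidence envelope returned by each operation.
// Field names are illustrative assumptions, not an existing schema.
interface EvidencedResult<T> {
  status: "completed" | "partial" | "failed";  // closed, defined vocabulary
  output: T;
  verification?: { algorithm: "sha256"; digest: string };  // deterministic ops only
  inputs?: Record<string, unknown>;  // the parameters that produced this result
  producedAt: string;                // ISO 8601 timestamp
  auditRef: string;                  // pointer to the audit trail entry
}

const exportResult: EvidencedResult<{ exportId: string }> = {
  status: "completed",
  output: { exportId: "exp_91" },
  verification: { algorithm: "sha256", digest: "9f2c0e…" },  // placeholder digest
  producedAt: "2025-01-15T09:30:00Z",
  auditRef: "audit/exp_91/0001",
};
```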
Principles
Context as architectural constraint
Every reasoner operates under context constraints: working memory for humans (roughly four chunks per Cowan, 2001, revising Miller's 1956 estimate), token windows for LLMs, attention budgets for regulatory reviewers. This is well understood in cognitive science and increasingly studied in LLM research (Anthropic, 2024; Liu et al., 'Lost in the Middle,' 2023). Shannon's (1948) information theory provides the formal foundation: a communication channel has finite capacity, and the efficiency of information transfer depends on how well the encoding matches the channel's constraints. Sweller's (1988) cognitive load theory operationalizes this for human consumers: knowledge artifacts impose three kinds of load (intrinsic, from the subject matter's inherent complexity; extraneous, from the presentation format; germane, from the effortful construction of mental models), and well-designed knowledge architecture minimizes extraneous load while supporting germane processing. Hutchins's (1995) distributed cognition framework extends the analysis beyond individual reasoners: cognitive work is distributed across people, artifacts, and environments, and the design of knowledge artifacts shapes the cognitive ecology within which reasoning occurs. The contribution here is treating context efficiency as an architectural concern for knowledge systems specifically, extending beyond individual LLM prompting strategies to the design of knowledge artifacts themselves.
Layered capability disclosure
Expose capability metadata in layers of increasing detail, where each layer is complete at its own depth (per the progressive disclosure principle above). An agent selecting from available tools needs a different resolution than an agent constructing a specific invocation. A human developer scanning options needs a different resolution than one debugging an integration. The knowledge system should support at minimum three fidelity levels: summary (name + one-line constraint), standard (full parameter schema), and extended (examples, edge cases, versioning history).
Starting point
A practical encoding: structure each capability as a JSON document with three nested levels. Level 1 (summary) contains the tool name, a one-sentence description, and primary constraints (auth required, rate limit, cost tier). Level 2 (standard) adds the full parameter schema, response schema, and error codes. Level 3 (extended) adds usage examples, edge case documentation, version history, and deprecation timeline. Each level is independently parseable. An agent operating under context pressure can consume Level 1 for selection, then fetch Level 2 only for the selected tool. OpenAPI's existing summary and description fields approximate Level 1; the extension is making the levels explicit and ensuring Level 1 alone produces correct beliefs.
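Sketched below with a hypothetical capability and placeholder values; the structural claim is that each level parses on its own and Level 1 alone yields correct beliefs.

```typescript
// Hypothetical three-level capability document. Names and values
// are illustrative; each level must be independently parseable.
const searchCapability = {
  level1: {  // summary: enough for selection under context pressure
    name: "documents.search",
    description: "Full-text search over indexed documents.",
    constraints: { authRequired: true, rateLimit: "60/min", costTier: "low" },
  },
  level2: {  // standard: enough to construct an invocation
    parameters: { query: "string", limit: "integer (1-100)" },
    response: { schemaRef: "#/components/schemas/SearchResults" },
    errorCodes: ["RATE_LIMITED", "INVALID_QUERY"],
  },
  level3: {  // extended: examples, edge cases, history
    examples: [{ query: "quarterly report", limit: 10 }],
    edgeCases: ["empty index returns an empty list, not an error"],
    versionHistory: ["2.1: added limit parameter"],
    deprecation: null,
  },
} as const;
```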
Declared consumption cost
Every knowledge artifact should declare its consumption cost in consumer-relevant units. For agents: token count of the full schema. For humans: estimated reading time at each fidelity level. For automated systems: parsing complexity and dependency depth. This metadata enables reasoners to make informed allocation decisions when context is scarce. Current standards (OpenAPI, MCP, Schema.org) omit this metadata entirely, forcing consumers to discover context cost through trial.
Starting point
Add a contextCost object to each knowledge artifact with fields for tokenCount (computed by running the artifact through a standard tokenizer), readingTimeSeconds (word count divided by average reading speed, per fidelity level), dependencyCount (number of other artifacts that must be loaded to use this one), and refreshFrequency (how often the artifact changes, informing cache decisions). This metadata can be generated automatically in a CI pipeline: tokenize each schema file, count words in each documentation page, traverse dependency graphs. The overhead is minimal because the computation is automated; the value is that consumers can make informed context allocation decisions before loading the full artifact.
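One possible shape for that object, with placeholder values; the field names follow the paragraph above and are assumptions rather than an existing standard.

```typescript
// Hypothetical contextCost object, generated automatically in CI.
// Values shown are placeholders, not measurements.
interface ContextCost {
  tokenCount: number;  // from a standard tokenizer pass over the artifact
  readingTimeSeconds: { summary: number; standard: number; extended: number };
  dependencyCount: number;  // artifacts that must load alongside this one
  refreshFrequency: "daily" | "weekly" | "per-release";  // informs caching
}

const searchDocCost: ContextCost = {
  tokenCount: 310,
  readingTimeSeconds: { summary: 15, standard: 90, extended: 420 },
  dependencyCount: 2,  // e.g. auth guide plus pagination conventions
  refreshFrequency: "per-release",
};
```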
Machine-discoverable clusters
Related knowledge artifacts that are frequently co-consumed should be discoverable as clusters. This reduces independent lookups for agents, cognitive load for humans navigating related topics, and cross-referencing burden for regulatory reviewers. The clustering principle is established in information architecture (card sorting, affinity diagramming). The extension is making clusters machine-discoverable with explicit membership and relevance metadata.
Starting point
Publish a cluster manifest: a JSON document listing groups of related capabilities with co-usage frequency data. For APIs, this can be derived from access logs (which endpoints are called together in typical workflows). For documentation, it can be derived from navigation analytics (which pages are read in sequence). The manifest should include a cluster identifier, member list, a brief description of the shared workflow, and the total context cost of loading the full cluster. MCP servers can expose this as a dedicated resources/clusters endpoint. Teams can start by analyzing their top ten user workflows and publishing the capability clusters those workflows require.
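A sketch of such a manifest under the assumptions above; the cluster identifier, member names, and frequency figures are illustrative.

```typescript
// Hypothetical cluster manifest derived from access-log analysis.
// All identifiers and numbers below are placeholders.
const clusterManifest = {
  clusters: [
    {
      id: "document-lifecycle",
      description: "Upload, index, search, and export documents.",
      members: ["documents.upload", "documents.search", "documents.export"],
      coUsageFrequency: 0.82,    // share of workflows invoking all members together
      totalContextTokens: 1040,  // cost of loading the full cluster
    },
  ],
} as const;
```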
Eviction-priority metadata
When context pressure forces a reasoner to drop knowledge artifacts, the eviction decision should be informed by metadata: recency of use, task relevance score, and re-acquisition cost. Current agent frameworks evict context arbitrarily or by position (Liu et al., 2023, documented the 'lost in the middle' effect where LLMs underweight information in the middle of long contexts). Knowledge systems should make the cost of forgetting visible through explicit eviction-priority metadata.
Starting point
Add an eviction object to each knowledge artifact with fields for reacquisitionCost (token count or latency to reload), volatility (how often the artifact's content changes, affecting the cost of holding stale data), and taskRelevanceHint (a category tag that agent orchestrators can match against current task context). A practical starting point: assign each artifact a priority tier (critical, standard, supplementary) based on how frequently it is needed across workflows and how costly errors from stale data would be. Agent frameworks can use these tiers as a first-pass eviction policy, dropping supplementary context before standard, and standard before critical. The metadata is static and can be set at authoring time, refined later with usage telemetry.
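A possible encoding, with assumed field names and placeholder values; the tier vocabulary mirrors the paragraph above.

```typescript
// Hypothetical eviction metadata attached to a knowledge artifact.
// Static fields set at authoring time, refined later with telemetry.
interface EvictionMetadata {
  reacquisitionCost: { tokens: number; latencyMs: number };
  volatility: "static" | "slow" | "fast";  // how often the content changes
  taskRelevanceHint: string;               // category tag for orchestrators
  tier: "critical" | "standard" | "supplementary";
}

const searchDocEviction: EvictionMetadata = {
  reacquisitionCost: { tokens: 310, latencyMs: 40 },
  volatility: "slow",
  taskRelevanceHint: "document-retrieval",
  tier: "standard",  // dropped before critical artifacts under pressure
};
```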
Core contribution
Knowledge artifacts as behavioral governance
Observation 4 identified a pattern: knowledge artifacts govern consumer behavior regardless of the author's intent. Lessig's (1999) thesis that code is law provides the foundational framework: the architecture of a system regulates behavior as effectively as formal rules, and knowledge schemas function as a regulatory architecture for every consumer that depends on them. MacKenzie's (2006) performativity analysis extends the insight: documentation does more than describe a system; it actively constitutes the informational environment within which decisions about the system are made. Thaler and Sunstein's (2008) choice architecture research demonstrates the mechanism at the individual decision level: the structure of choice presentation systematically influences decisions, even when all options remain formally available. This section isolates four specific mechanisms through which that governance operates. Each mechanism was documented independently across agent workflows, developer integration patterns, and disclosure analysis. The mechanisms are testable: teams can audit their own documentation against each one and measure the behavioral effects.
Descriptions as authorization
An agent's interpretation of a tool description determines whether it invokes that tool. A description that says 'deletes user data' suppresses invocation in most contexts; 'cleans up resources' does the opposite. A developer's reading of an API description determines whether they integrate. A regulator's reading of a disclosure determines whether they investigate. This is empirically observable in agent tool-calling logs and developer integration patterns. The implication is that description phrasing should be treated with the same rigor as access control policy, because it produces equivalent behavioral effects.
Omissions as permissions
When a capability schema omits a rate limit, consumers treat the capability as unthrottled. When it omits authentication, consumers assume none is needed. When it omits cost, consumers assume free. When a system's disclosure omits concentration data, analysts default to the system's own claims about distribution. Every omission in a knowledge artifact is an implicit behavioral grant. The practical consequence is that knowledge architects must design omission policies with the same deliberation as inclusion policies.
Error taxonomies as behavioral contracts
An error taxonomy that distinguishes retryable from terminal failures shapes whether a consumer retries, escalates, or abandons. This applies to agent retry logic, human debugging heuristics, and automated monitoring. The error taxonomy determines how every class of reasoner responds to adversity. We derived this from agent workflow analysis, where the absence of structured error taxonomies is the single most common cause of cascading retry failures.
Version boundaries as migration signals
When a knowledge artifact changes across versions, every consumer must determine whether their existing assumptions still hold. Agents need to detect workflow validity. Developers need to assess migration cost. Regulators need to re-evaluate compliance. This was the core design insight of the PDAS disclosure standard: release-bound conformance forces disclosure to be temporally specific, making version boundaries into explicit signals rather than silent transitions. The principle applies equally to API versioning, documentation versioning, and capability schema evolution.
Open problems
Governance when AI participates in knowledge creation
AI systems are already participating in knowledge creation, maintenance, and consumption. Four governance problems follow directly, each observable in current production systems. Jasanoff's (2004) co-production framework predicts that the relationship between AI-generated knowledge and the social order that forms around it will be mutually constitutive: the governance mechanisms we design for AI-mediated knowledge will shape what AI produces, and what AI produces will shape what governance mechanisms are possible. Ostrom's (1990) design principles for commons governance provide structural guidance: clear boundaries, proportional equivalence between benefits and costs, collective-choice arrangements, and monitoring are as relevant to knowledge commons as to natural resource commons. Bovens's (2007) accountability framework adds a structural requirement: effective accountability demands an actor, a forum, and an obligation to explain, and when AI systems participate in knowledge creation, the attribution of the 'actor' role becomes ambiguous in ways that existing accountability structures were built to avoid. Dewey's (1927) argument that effective publics form only when citizens have access to material facts about conditions that affect them extends the point: if AI-mediated knowledge production introduces opacity into the knowledge commons, the publics that depend on that knowledge lose their capacity for informed self-governance. These problems are unsolved, and knowledge architecture in the AI era must account for them.
Provenance and authority attribution
When AI generates a documentation draft and a human edits it, who is the author? When an agent annotates a capability schema based on observed behavior, is that annotation authoritative? Knowledge provenance becomes a governance question the moment AI participates in knowledge creation. Current systems have no standard for attributing authority across human-AI collaboration, and the absence of such standards will produce provenance ambiguity at scale.
Belief consistency across agents
Two agents operating against the same platform form different beliefs about a capability's behavior because they observed it at different times or under different configurations. When both contribute to a shared workflow, their inconsistent beliefs produce inconsistent outputs. Mechanisms for detecting and resolving belief conflicts remain an open research problem, analogous to the consistency challenges in distributed systems.
Staleness as a failure vector
Capability metadata degrades as platforms evolve. In human-only systems, documentation staleness is a quality problem with gradual consequences. In AI-mediated systems, stale knowledge is a failure vector: agents form beliefs about systems that no longer exist, verification systems validate against outdated baselines, and AI-generated content propagates obsolete facts at machine speed. Detection requires continuous verification against live evidence, with cost-aware strategies for verification frequency.
Editorial governance at scale
As AI generates increasing volumes of documentation and capability metadata, the traditional editorial governance model collapses under scale. At production volumes, human review gives way to governance infrastructure: automated quality standards, consistency checking against source-of-truth systems, attribution policies, and escalation frameworks. The governance design question is analogous to code review automation: which checks can be automated, which require human judgment, and how do you detect when the automated checks are failing?
Measurement
Five metrics for knowledge system quality
The design principles above imply measurable properties. The following five metrics operationalize them. Each can be computed automatically for agent workflows (token counts and action outcomes are observable) and approximated for human workflows (reading time, integration success rates). Their correlation with downstream reasoner success requires empirical validation, which these metrics are designed to enable.
Declaration coverage
The percentage of a system's knowledge artifacts that carry full self-describing metadata: scope, authority, currency, relationships, and verification surface. Partial coverage creates asymmetric reasoning: some knowledge is reliable, some is degraded, with no signal to distinguish them.
Context efficiency ratio
The ratio of context spent on knowledge the reasoner actually uses versus total context spent on knowledge acquisition. Higher ratios suggest more efficient disclosure structures, though the optimal ratio varies by task complexity, reasoner architecture, and domain specificity. This metric can be measured automatically for agent workflows (token counts are observable) and approximated for human workflows (reading time vs. task-relevant reading time).
Schema fidelity
The percentage of actions (by any consumer) that produce the outcome the knowledge artifact led the consumer to expect. Low fidelity indicates the knowledge layer is misleading: the consumer forms beliefs about what will happen, and the results diverge. This is the most direct measure of whether the knowledge architecture produces reliable beliefs, and the most straightforward to validate empirically.
Evidence completeness
The percentage of claims in the knowledge system that can be independently verified. Claims without evidence are assertions that consumers must accept on trust. In adversarial environments, unverifiable claims are the primary vector for exploiting any reasoner's decision-making.
Temporal coherence
The percentage of knowledge artifacts that accurately describe the current version of the system. Temporal incoherence means the knowledge describes a system state that may no longer exist. Documentation describing previous system architectures with no signal of the divergence was the single most common failure mode observed across our analysis work.
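To make the five metrics concrete, a hypothetical snapshot for a single knowledge system; the field names map one-to-one to the metrics above, and the values are placeholders rather than measurements.

```typescript
// Hypothetical quality snapshot. All values are illustrative.
const knowledgeQualityReport = {
  declarationCoverage: 0.74,     // artifacts carrying full self-description
  contextEfficiencyRatio: 0.41,  // context used / context acquired
  schemaFidelity: 0.93,          // actions matching artifact-predicted outcomes
  evidenceCompleteness: 0.58,    // claims with an independent verification path
  temporalCoherence: 0.66,       // artifacts describing the current release
} as const;
```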
Boundaries
Where the principles reach their limits
The design principles above assume that system properties can be formalized into structured knowledge artifacts. This assumption has limits that the framework must acknowledge. Wittgenstein's (1953) private language argument and Polanyi's (1966) tacit dimension both point toward the same boundary: some knowledge resists the kind of explicit articulation that structured schemas require, and the framework's scope is defined by where that boundary falls.
Tacit knowledge
Polanyi (1966) demonstrated that practitioners know more than they can articulate: a skilled diagnostician's pattern recognition, a master craftsperson's material judgment, a senior engineer's architectural intuition all contain knowledge that resists specification in structured schemas. In knowledge systems, this manifests as the gap between what documentation says and what experienced practitioners know. Nonaka and Takeuchi's (1995) SECI model describes the conversion process between tacit and explicit knowledge: socialization (tacit to tacit, through shared experience), externalization (tacit to explicit, through articulation), combination (explicit to explicit, through systematization), and internalization (explicit to tacit, through practice). The design principles address combination and enable externalization, but they cannot compel it. Some system knowledge (the judgment calls behind governance decisions, the operational intuitions of experienced validators, the social dynamics that determine development priorities) may remain structurally tacit, accessible only through the socialization channel that context privatization disrupts.
Emergent properties
Simon (1962) demonstrated that complex adaptive systems exhibit emergent properties that cannot be predicted from component-level specifications. The behavior of a DeFi protocol under stress, the governance dynamics of a large token-holder community, the cascading effects of a bridge failure across interconnected systems: these are properties of the system-as-a-whole that resist decomposition into the structured primitives the design principles require. The principles address the specifiable space effectively. The boundary between what can and cannot be specified is empirical rather than theoretical, and pushing that boundary further toward the specifiable end is an ongoing engineering challenge rather than a solved problem.
Interpretive flexibility
Even perfectly structured knowledge requires interpretation, and interpretation varies across consumers. Pinch and Bijker's (1984) concept of interpretive flexibility holds that different social groups construct different meanings from the same artifact. A machine-readable manifest eliminates syntactic ambiguity but preserves semantic ambiguity: two agents consuming the same structured schema may interpret constraint boundaries differently depending on their training data and inference architecture. Two human analysts reading the same structured disclosure may reach different evaluative conclusions depending on their domain expertise and analytical framework. The design principles reduce interpretive flexibility by increasing structural precision, but they cannot eliminate it. The residual interpretive flexibility is a feature of heterogeneous reasoning rather than a defect of knowledge architecture.
Convergence
Independent design processes, shared structural requirements
The design principles above and the PDAS disclosure standard were developed independently for different audiences. The following three mappings show where they converge on identical structural requirements, arrived at from different starting points. The mappings are specific enough to test: any system that satisfies one side of each mapping should satisfy the other without additional work.
Disclosure surfaces as capability declarations
A system that publishes its interfaces, dependencies, governance structure, and operational parameters in structured form has already produced a declaration layer. Disclosure surfaces map directly to capability declarations when the consumer is an agent evaluating integration, and to compliance evidence when the consumer is a regulatory body. This was the original observation that motivated the design principles: the same structured facts, organized by the same principles, serve both audiences from a single source.
Temporal binding across domains
Disclosure standards bind disclosure to a specific release. DNCE binds execution terms to a specific version. API versioning strategies bind documentation to a specific endpoint configuration. The temporal binding principle requires this of all knowledge artifacts. The convergence is direct: design processes in different domains, serving different consumer classes, arrived at the same requirement independently, which strengthens the case that temporal binding is a property of reliable knowledge regardless of consumer.
Machine-readable evidence
Disclosure standards require machine-readable manifests alongside human-readable guides. DNCE requires machine-readable verification of execution results. OpenAPI requires machine-readable endpoint specifications alongside human-readable documentation. All refuse to accept prose as proof. The convergence suggests that regulatory disclosure, agent-facing knowledge architecture, and API documentation are expressions of the same requirement: structured, verifiable evidence about what a system is and what it does, consumable by any reasoner that needs to form beliefs about it.
Trajectories
Knowledge as protocol layer
The following trajectories are conditional on three observable trends continuing: (1) autonomous systems becoming widespread consumers of platform metadata (current MCP adoption, enterprise agent deployment), (2) market reward for structured knowledge investment (emerging in developer experience metrics), and (3) regulatory mandates for structured disclosure (EU AI Act effective August 2026, MiCA white paper requirements). If these trends stall, the trajectories slow accordingly.
Near term
Early adopters begin generating human-readable and machine-readable artifacts from shared source material, driven by the practical need to serve both developer and agent consumers without maintaining parallel documentation. AI-augmented authoring is likely to become the dominant production method for teams at scale: AI generates, humans curate. Verification systems that check documentation against live system state are beginning to appear in CI pipelines. If the EU AI Act's structured documentation requirements take effect as scheduled, they will accelerate the shift by creating regulatory demand for machine-readable system descriptions. Knowledge architecture may begin to emerge as a distinct subdiscipline, though whether it achieves the same formality as existing engineering roles depends on tooling maturity and market demand for structured knowledge investment.
Medium term
If structured knowledge investment proves economically viable in the near term, documentation portals are likely to decline as primary interfaces. Knowledge systems would become queryable: consumers ask questions against structured knowledge rather than reading documents. AI-mediated knowledge governance would move into production for organizations operating at scale. The design principles proposed here could become engineering requirements with the same formality as API contracts. The boundary between 'using a system' and 'understanding a system' would begin to blur as knowledge protocols enable any consumer to form reliable beliefs through structured interaction. The pace depends on whether the tooling developed in the near term reduces the cost of structured knowledge production to a level that makes broad adoption economically rational.
Long term
In this trajectory's most developed form, knowledge architecture becomes a protocol-layer concern: systems expose capabilities, constraints, governance, and verification surfaces through standardized knowledge protocols with the same formality as network protocols. Technical writing would transform into knowledge protocol engineering, concerned with the design, verification, and evolution of knowledge surfaces that serve heterogeneous consumers from a single structured source. This outcome is speculative and depends on the medium-term trajectory materializing. It represents the logical endpoint of the trends described above rather than a prediction.
Structural shifts
When any reasoner's ability to operate a system depends on the knowledge layer's accuracy, the knowledge layer inherits the reliability, versioning, and correctness requirements of the execution layer. A capability schema that returns incorrect metadata is functionally equivalent to an API that returns incorrect data.
The core competency shifts from clarity of exposition to correctness of specification. The quality metric shifts from readability scores to schema fidelity: the rate at which consumers form correct beliefs about system behavior from knowledge artifacts.
Building separate knowledge systems for developers, agents, regulators, and internal tools is economically unsustainable and produces inconsistency. The design principles require a single structured source from which consumer-specific renderings are derived.
Disclosure standards target human analysts and regulators. DNCE targets agents. OpenAPI targets developers. As all converge on the same structural requirements, the standard that serves all audiences from the same principles is positioned to define the next generation of knowledge infrastructure.
Contributions
Contributions and open questions
To information architecture
Eight extensions to established IA principles, each identifying where existing principles break when the consumer is unknown at design time. The extensions draw on Bowker and Star's (1999) insight that classification decisions shape cognition, Star and Griesemer's (1989) boundary object theory for cross-community coordination, and Gigerenzer's (2000) ecological rationality for understanding consumer-environment fit. The extensions are falsifiable: any system that passes all eight tests has addressed the failure modes documented here. Any system that fails them is vulnerable to the same failures regardless of domain.
To agent-platform design
The DNCE framework provides a four-layer architecture for agent-facing knowledge, structurally parallel to Meyer's (1992) design by contract and Toulmin's (1958) argument model, with implementation guidance grounded in existing standards (OpenAPI, MCP, JSON Schema, CloudEvents). Teams can begin adopting individual layers without committing to the full framework.
To documentation practice
Four specific mechanisms through which knowledge artifacts govern consumer behavior (descriptions as authorization, omissions as permissions, error taxonomies as behavioral contracts, version boundaries as migration signals), grounded in Winner's (1980) politics of artifacts, Lessig's (1999) code as law, and Callon's (1998) performativity of economic instruments. Each mechanism is independently auditable against existing documentation.
To governance theory
The behavioral governance analysis extends Ostrom's (1990) commons governance principles to knowledge infrastructure, identifies AI-mediated knowledge as a co-production problem in Jasanoff's (2004) sense, and applies Williamson's (1985) transaction cost framework to information asymmetry in knowledge systems. The four open governance problems provide a research agenda for the intersection of AI systems and knowledge governance.
To measurement
Five metrics (declaration coverage, context efficiency ratio, schema fidelity, evidence completeness, temporal coherence) that define what should be measured in knowledge systems serving heterogeneous consumers. Evaluated through Lakatos's (1970) progressiveness criteria: the metrics are predictive instruments whose correlation with downstream reasoner success rates provides the empirical test of the research programme's validity.
This paper reflects active research. The design principles, DNCE framework, and measurement framework will be revised as implementation patterns stabilize and empirical validation data becomes available. It connects directly to the legibility thesis that unifies this work with protocol disclosure and structural analysis.