entviz¶

Version: 15

Part A: Framing¶

Part A is non-normative. It orients the reader; every obligation lives in Producing an entviz and Conformance and verification.

Introduction and goal¶

Entviz is a simple way to visualize values with high entropy — cryptographic keys and signatures, UUIDs, blockchain payment addresses, post-quantum keys, genomes, and so forth — so a human can compare them visually. The goal is to allow an untrained adult with reasonably good vision to easily decide whether two chunks of entropy are the same or different.

example entviz

Figure 1. A representative entviz, showing every channel at once.

Compare entmotif, which turns entropy into music. The excellent randomart algorithm used with SSH keys is also related; it has a similar goal to entviz, but accepts different constraints and uses a different approach.

Notation and requirements language¶

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119 and RFC 8174 when, and only when, they appear in all capitals. Lowercase uses of these words carry their ordinary English meaning and impose no requirement.

This document interleaves normative text (the requirements an implementation must satisfy) with non-normative text (explanation of why a requirement exists). Non-normative passages are introduced by Note or Rationale, or set off as block quotes. An implementation cannot become non-conformant by disregarding a note; such notes are retained deliberately for readability, but they never add or relax an obligation. Figures are illustrative and non-normative except where a normative passage cites a specific value they depict.

The Conformance section defines what it means to produce a correct entviz: the abstract render model an implementation must compute, the equivalence relation a checker uses to compare two renderings, the required SVG profile and paint order, and the error conditions every implementation MUST enforce. Read it together with Producing an entviz: that part says how to compute an entviz; Conformance says what a checker is entitled to verify.

Requirements and non-requirements¶

Requirements¶

Work in environments that can draw bitmapped or vector graphics.
Losslessly represent all bits of entropy up to 512 bits. For larger inputs, losslessly represent the head and tail in the text channel and bind the entire input through the fingerprint (see Large-input handling).
Make it easy to read the entropy value out loud without the reader losing track of where they are.
Support efficient partial comparisons (spot-checking).
Guarantee that input entropy with even minor differences produces obvious visual differences, even when the input lacks an avalanche effect of its own.
Use 16 million colors (256 × 256 × 256). However, guarantee that entropy with even minor differences continues to have obvious visual differences in 256-color environments and in 256 shades of gray.
Be usable by people with red-green, blue-yellow, and complete color blindness.
Be trivial to implement correctly, with no significant dependencies.

Non-requirements¶

Make it easy to remember all the details in a visualization. (Remembering a few arbitrarily chosen features of an entviz should be easy, but remembering all its details is unrealistic. The more appropriate goal is easy comparison to a saved copy.)
Work in pure text environments. (Few pure text environments exist; even linux shells can save a file for viewing in a browser. Use randomart or invent a variation on this algorithm instead.)

Concepts and vocabulary¶

A diagram produced by this algorithm is called an entviz. Entvizes can be categorized according to the dimensions of the grid into which they render: a "3x4 entviz", a "5x9 entviz", etc. Dimensions are given in Width x Height order. The maximum expressive capacity of an entviz of dimensions NxM is equal to 24 * N * M bits, although slightly less information may be communicated, depending on how the entropy is serialized to text.

The input being visualized is the entropy. The entropy is serialized to text and chopped into tokens, each of which represents 24 bits of entropy (or as close as possible on even character boundaries). The number of tokens is the token count.

Most of the entviz is drawn from the fingerprint rather than from the entropy directly. Specifically, the text of each cell and the background color of each cell's nucleus are derived from the entropy, preserving losslessness for inputs of 512 bits or less. Everything else — the surround-box pattern in each cell, the median and quartile calculations, blank cell placement, the entviz background color, the blank-cell fills, the color bar, and the ellipse overlay — is derived from the fingerprint. (A cell's surround edge color is normally derived from its nucleus, hence indirectly from the entropy; v10 makes the edge color of three cells — the top-left cell and the 1st/2nd quartile cells — fingerprint-derived instead. See Casual avalanche.)

The channels at a glance¶

This section is non-normative. It names each of the six channels and the job it does, as a first-read mental model; all mechanism is specified in Producing an entviz.

Each entviz conveys its entropy through six channels. Some are lossless carriers of the input; others are fingerprint-driven and exist to make differences pop at a glance. No single channel is meant to be the sole comparison method — they reinforce one another.

1. Text. Each entviz conveys its entropy fully and independently, in a first visual channel, as text. If the text is read aloud, taking into account case-sensitivity, all information is transferred. Text is tokenized into cells organized into a grid, read left-to-right and top-to-bottom. For inputs of 512 bits or less this channel is fully lossless; for larger inputs it shows the head and tail plus a fingerprint readout in the middle (see Large-input handling). The text channel does not, by itself, provide a visual avalanche effect: two inputs differing by a single character show nearly identical text. Its role is verbatim fidelity, not difference amplification.

Note: when reading entviz text aloud, the convention is to precede each capital letter with the one-syllable prefix "cap", to read the hyphen character - as "dash", and to read the underscore character _ as "under". This minimizes the number of syllables while eliminating all ambiguity.

text channel

Figure 2. The text channel: the normalized input, tokenized into grid cells.

2. Surround. A second channel rings each cell's nucleus with a pattern of 24 small boxes, each toggled by one fingerprint bit, so the surround avalanches for careful cell-by-cell comparison.

surround channel

Figure 3. The surround channel: 24 fingerprint-driven boxes ringing each nucleus.

3. Nucleus color. A third channel colors the background behind each cell's text from the entropy itself, so it stays lossless for inputs of 512 bits or less. Fine color gradations may be imperceptible, and disappear below 16 million colors, so the nucleus color is a partially redundant hint: never misleading, but not a primary comparison method.

nucleus color channel

Figure 4. The nucleus-color channel: each cell's 24-bit token read as an RGB background.

4. Blank cells and quartile marks. Zero or more cells may be blank, and four cells carry small quartile marks; both are placed from the fingerprint. They are easily checked by eye and act as a sort of visual CRC, surfacing differences that might otherwise hide in the middle of long strings or at the ends of tokens.

visual CRC

Figure 5. The visual CRC: the blank-cell map's plus (max) and dot (min) markers and quartile marks.

5. Color bar. A color bar along the left edge, derived from the fingerprint, provides a redundant channel for rapid gestalt comparison: two entvizes with different 2-bit-pattern histograms differ visibly in the bar before any cell-by-cell comparison begins.

6. Ellipse overlay. A partially transparent ellipse overlay, also fingerprint-derived, darkens or lightens the surround boxes and grid background beneath it (never the nuclei or text), contributing a large, organic shape to the overall gestalt so a quick, high-level glance is more informative.

Avalanche — the guarantee that even a one-character change produces an obvious visual difference — is provided by the fingerprint-driven channels (surround, blank/quartile placement, color bar, ellipse, and the v10 color levers), not by the text or nucleus-color channels.

Part B: Producing an entviz¶

This part is normative and is written in pipeline order: each section produces an intermediate result the next one consumes.

Input normalization and characterization¶

Normalize the input.

Remove all whitespace.
Detect the entropy type, if possible, and split the input into prefix, core, and suffix, with all three pieces of data normalized. Normalization MUST eliminate case differences, putting the entropy in canonical case, with canonical punctuation. It MUST identify prefixes that are not true entropy (e.g., the "0x" prefix on an Ethereum address, the "AAAA" at the front of an SSH key, etc.). It MUST identify suffixes that are checksums or derivations of the true entropy. A suffix is reserved for material bound to the entropy — a checksum or derivation that cannot vary while the entropy is fixed (e.g. an LEI's MOD 97-10 check digits, a base58check checksum). Material that is merely a free annotation — attached to the value but able to vary while the value stays fixed, and often unbounded in length (an SSH public-key comment such as user@host, or SWHID qualifiers such as ;origin=…;lines=…) — is not a suffix: the parser still recognizes the input via its core, but the annotation is dropped (not entered into the core, not surfaced as a suffix). Dropping free annotations keeps non-value, attacker-mutable, potentially-unbounded text out of the visualization (see the label-strip step), and is consistent with the out-of-band-only treatment of user captions. The reference implementation in python has an entropy module with a parse(txt) method that can be used as an oracle, and it has unit tests that can provide a test vector. Presentation vs. identity vs. annotation (the swap test). Every part of the input is classified by whether it carries identity bits — information that distinguishes this value from another value of the same shape. The fingerprint (and, where possible, the cells) MUST bind exactly the identity bits and MUST NOT bind the rest. There are three classes:
Presentation — describes how the value is written, not what it is: 0x (hex notation), a multibase base selector, SSH/PEM serialization framing, and case itself. Presentation is normalized away: it is shown in the label (as the encoding/type name) but enters neither the cells nor the fingerprint. This is the same principle as case-normalization, generalized to radix and serialization.
Identity — part of what the value is; a type/role discriminator whose choice changes the denoted object. It MUST bind the fingerprint. The canonical case is the CESR derivation code: B, C, D, E may each precede the same 43 base64url characters to mean a non-transferable key, an encryption key, a transferable key, or a Blake3-256 digest — four different, security-relevant objects (B vs D is non-transferable vs transferable).
Annotation — freely attached and independently variable (an SSH key comment, SWHID ;origin=… qualifiers). It is dropped entirely (see the free-annotation rule above).

The swap test (presentation vs. identity). Hold the body fixed and ask: is there another legal prefix that could occupy this position and make the string denote a different value — with the body unchanged? * No, or the body would have to re-encode to stay legal → presentation. Either the prefix is constant for the format (0x — there is no 0y for the same 20 bytes), or swapping it forces the body into a different alphabet (multibase f↔b re-encodes hex↔base32). The body-must-re-encode case is the tell of an encoding selector. → strip; bind nothing. * Yes, body byte-for-byte unchanged, meaning changes → identity. CESR B↔D↔E; a SWHID cnt↔rev over the same hash. → bind it.

How identity material is bound — two mechanisms by alphabet. * In the core (preferred, when the discriminator shares the body's alphabet and is contiguous): it is not split off — it stays in the core string, so it is rendered in the cells and bound by the fingerprint (which hashes the core text). This is correct precisely because we hash text, not decoded bytes. Used by the CESR derivation code (base64url, leading) and the LEI LOU issuer code (base36, leading); the decoded type goes in the label. Keeping the code in the cells matters for short inputs, which have no fingerprint-middle cells — the cells are then the only read-aloud carrier of the distinction. * Prefix-fold (when the discriminator is a different alphabet from the body or is otherwise not cleanly part of the cell stream): it is kept as the prefix (shown decoded in the label, not in the cells), and the fingerprint hash input becomes prefix ‖ core text. Used by the SWHID/gitoid object-type and hash-algorithm (letters ahead of a hex body). Folding the whole self-framing prefix is sufficient: its constant parts (swh:1:, gitoid:) contribute uniformly and the discriminator (cnt, blob) distinguishes; without it a SWHID and a gitoid over the same git hash would collide in every fingerprint channel.

A note on the multiformats family, since multibase and multicodec are easily conflated: multibase (the leading b/f/z/m/u base selector) is presentation — strip it. multicodec (the content codec / hash-fn code) is identity — but in a CID it is encoded inside the base32 body, so it is already part of the core text and already bound; no special handling is needed.

(This is the prefix-side analogue of the bound vs. free rule for suffixes above: both ask whether a piece of the input carries identity bits. See this.i:s3mpr3fx.)

If no specific-format parser matches, the implementation MUST attempt alphabet detection by disproof: iterate through the known alphabets from most-restrictive to least and return the first one whose character set contains every character of the input. The order MUST be: hex → base32 → bech32 → base58 → base64 → base64url. Hex/base32/bech32 detection is case-insensitive; base58/base64/base64url are case-sensitive (they treat upper and lower case as distinct characters). A successful disproof match treats the input itself as the normalized core under the detected alphabet — "no re-encoding" meaning the core is not re-serialized into a different alphabet or radix (unlike the UTF-8→base64url fallback below). It does not waive case normalization: for the case-insensitive detections (hex, base32, bech32) the core is still case-canonicalized by the normalization step's per-alphabet rule — base32 → UPPER (RFC 4648), hex and bech32 → lower — exactly as the specific-format parsers do. Without this, the same value would fingerprint differently depending on the case it was pasted in, and a bare base32 fragment would diverge from the same value parsed by a specific base32 parser (which uppercases). Known limitation: bech32's alphabet excludes 1 (it's the bech32 separator character), so a bare bech32 fragment that happens to contain a 1 will fail bech32 disproof and resolve to the next matching alphabet (base58, base64, base64url, or UTF-8 fallback). Real bech32 addresses with their bc1/tb1/ltc1/addr1/bitcoincash: prefixes are handled correctly by the specific-format parsers ahead of disproof; this caveat applies only to bare fragments pasted without their prefix.
If even disproof finds no fit (e.g., the input contains a space, punctuation, or other unencodable characters), fall back to treating the input as an arbitrary bag of bits: encode the input string to UTF-8 bytes, then re-render those bytes as a URL-safe base64 string (no padding). The resulting base64 string is treated as the normalized core; the type is base64. UTF-8 is the canonical byte encoding for the fallback path; implementations MUST NOT use other encodings (Latin-1, UTF-16, etc.) because that would change the fingerprint of identical-looking inputs.

Case normalization is intentional and load-bearing. For every case-insensitive alphabet, normalization canonicalizes case before the input is fingerprinted. Most alphabets (hex, UUID, bech32, crockford32, and EOS's base alphabet) canonicalize to lower case; base32 canonicalizes to upper case, which is its RFC 4648 convention (see the base32 alphabet note below). The direction does not matter — what matters is that it is consistent per alphabet. As a result, two inputs differing only in case for a case-insensitive alphabet produce identical entvizes. This is required: without it, a benign tooling difference (one system emits uppercase hex, another lowercase) would render as an entropy difference. Cell text shows the normalized form, not the input form. (See this.i:c4s3norm.)

Ethereum (EIP-55) case validation. Ethereum is the one alphabet where case carries semantic information: EIP-55 encodes a checksum in the address's hex-digit case pattern. An all-lowercase or all-uppercase Ethereum address is conventionally understood as "checksum not asserted" and is accepted unchanged. A mixed-case Ethereum address whose case pattern matches the EIP-55-derived canonical case is accepted. A mixed-case Ethereum address whose case pattern fails the EIP-55 checksum MUST be rejected at parse time with an error identifying the first mismatched-case digit; it must not be silently re-normalized to the canonical case (which would let a substituted-but-corrupted address render identically to the legitimate one). (See this.i:3ip55rj1.)

Checksum verification (v14). A parser MAY surface a bound checksum as a suffix (shown in the bottom label strip) only if it has VERIFIED that checksum. When an input structurally matches a checksummed scheme — the right leading character(s), length, and reserved bytes — but its bound checksum fails, the implementation MUST reject the input with an error (as for EIP-55), rather than either rendering it with a bad checksum on display or silently falling through to a bare-encoding label. Showing a checksum that has not been checked would let a substituted-but-corrupted value render as if valid; verifying it closes that gap. The schemes and their checks:

base58check (Bitcoin legacy, Litecoin legacy): base58-decode the whole address and verify that the trailing 4 bytes equal the first 4 bytes of SHA-256(SHA-256(payload)). On mismatch, reject.
bech32 / bech32m (Bitcoin segwit bc1…/tb1…, Litecoin ltc1…, Cardano Shelley addr1…/stake1…, and the generic <hrp>1<data> Cosmos-family form): verify the BIP-173 (bech32, constant 1) or BIP-350 (bech32m, constant 0x2bc830a3) polymod over hrp_expand(hrp) ‖ data on every bech32 path — including the specific bc1/ltc1 parsers, which historically skipped it. On mismatch, reject.
CashAddr (Bitcoin Cash, bitcoincash:/bchtest: or a bare q…/p… body): verify Bitcoin Cash's own 40-bit BCH checksum — a distinct code from the bech32 polymod — over [c & 0x1f for c in prefix] ‖ [0] ‖ [base32_charset_index(c) for c in payload], where payload includes its 8 trailing checksum characters and the default prefix for a bare body is bitcoincash. On mismatch, reject.
LEI (GLEIF, ISO 17442): a 20-character base36 string with the reserved 00 at positions 4–5 is an unambiguous LEI, so a failing ISO/IEC 7064 MOD 97-10 check on the 2 trailing check digits MUST reject — it MUST NOT fall through to a generic base36 encoding.
Ethereum EIP-55: already covered above; unchanged.

Accepted trade-off. A base58, bech32, or CashAddr blob that structurally matches a checksummed address but has a bad checksum is rejected, not rendered as a bare-encoding value. This is the intended "no entviz from an invalid checksum" behavior: an entviz of a corrupted address would be actively misleading. Cardano Byron is the one recognized scheme whose integrity check is not verified here: its check is a CRC-32 embedded inside the CBOR-decoded payload — there is no trailing base58 checksum field — so verifying it would require a full CBOR decode (out of scope), and there is nothing to peel off as a suffix. Accordingly a Byron address surfaces no suffix at all (suffix = null); the whole body is its core. It is thus recognized-but-unverified, but it never displays an unverified checksum, which is what the v14 rule requires. (Shelley, the modern bech32 addr1…/stake1… form, is fully checksum-verified above.) See this.i:v14lbl and reviews/v14-label-redesign.md.

UUID dash handling. A UUID may be supplied in canonical 8-4-4-4-12 form (with dashes) or as 32 contiguous hex characters (without dashes). Both forms are accepted and produce identical entvizes, because dashes are a display convention per RFC 4122, not part of the value. This is a permanent intentional invariant. (See this.i:uu1ddash.)

Decentralized Identifiers (DIDs). A DID has the form did:<method>:<method-specific-id>, optionally followed by a DID URL tail — a path (from /), a query (from ?), and/or a fragment (from #). Per the W3C DID Core ABNF the method-specific-id MAY itself contain : as a segment separator (method-specific-id = *( *idchar ":" ) 1*idchar; idchar = ALPHA / DIGIT / "." / "-" / "_" / pct-encoded), so an implementation MUST treat : as an ordinary body character and MUST end the method-specific-id only at the first /, ?, or # (or end of input). Recognition requires a non-empty method and a non-empty method-specific-id; input that does not match this shape falls through to alphabet-detection-by-disproof / the UTF-8 fallback like any other unrecognized value.

The DID URL tail is a free annotation and MUST be dropped. Everything from the first /, ?, or # onward (a service path, a ?versionTime=… or ?-ion-initial-state=… query, a verification-method #fragment) is dereferencing context that varies independently of the identifier. It is not entered into the core and not surfaced as a suffix — the treatment already given SWHID ;… qualifiers and SSH key comments (see the free-annotation rule above).
The method name is identity; it binds by prefix-fold. By the swap test, the same method-specific-id under a different method denotes a different DID (resolution, key binding, and the denoted controller all change), so the method is identity, not presentation. The did:<method>: literal is kept as the prefix (shown in the label, not in the cells) and the fingerprint hash input becomes prefix ‖ core — the prefix-fold mechanism used by SWHID/gitoid (see How identity material is bound and the fingerprint step). The constant did: contributes uniformly; the method discriminates. Without this, two DIDs sharing a body but differing in method would collide in every fingerprint-driven channel.
The method-specific-id is the core, kept verbatim. Every character — internal :/. separators, a leading multibase selector (did:key's z), a network or chain identifier (did:ethr:0x89:…, did:cheqd:mainnet:…), and any self-certifying hash — stays in the core, rendered in the cells and bound (via prefix ‖ core) by the fingerprint. This is sound precisely because the fingerprint hashes text, not decoded bytes (see Why the fingerprint hashes text, not decoded bytes): method, encoding selector, network, and key/hash material are all bound, and the text channel stays a faithful read-aloud of the DID as written. The core MUST NOT be percent-decoded (decoding would insert a parser ahead of the hash — the malleability surface the text-hashing rule exists to avoid).
Case is preserved. DIDs are case-sensitive (DID Core) and most method-specific-ids are case-sensitive encodings (base58btc, base64url), so the core is kept exactly as written; the per-alphabet case-canonicalization that applies to bare encodings (the Case normalization rule) does not apply inside a DID. The consequence is fail-safe: two DIDs differing only in case render differently — a false negative (the safe direction), never a false match.
Tokenization alphabet = base64url. The method-specific-id is tokenized as base64url (4-character, 24-bit tokens). This uniform default is deliberate: base64url's alphabet is a superset of the base58/base64url bodies the great majority of methods use, and every non-hex alphabet in this spec already tokenizes on the same 4-character boundary — so choosing base58 vs bech32 vs base64url for a DID body would change only the nucleus-color hint channel, never the token boundaries, the fingerprint, or the read-aloud text. A character outside the base64url alphabet (a domain's ., a : separator, a % escape) contributes a zero quant to its cell's nucleus color and is otherwise carried verbatim. Tokenization never case-folds.
Label. A DID's top strip is did:<method> (the self-describing-prefix PRIMARY slot — see the label-strip step); the method is shown verbatim and no per-method sub-label is decoded.

Note (non-normative): no per-method special-casing in v11. Because the generic handling above already binds a DID's full identity and is encoding-agnostic at the token boundary, v11 visualizes every well-formed DID — recognized method or not — by this single path; there is deliberately no per-method table. Two refinements were considered and deferred as additive, non-breaking options, each touching only a named method: (a) reusing the Ethereum EIP-55 reject for did:ethr/did:erc725 (declined for now because it conflicts with the clean case-preservation rule and is fail-safe without it); and (b) dropping the long-form initial-state of did:ion/did:prism as a free annotation (declined for now because the ?-ion-initial-state= form is already dropped by the DID-URL rule, while the :-suffix form needs per-method structure not yet validated — e.g. the did:ion:test:Ei… network ordering). See this.i:d1dm3th0.

Uniform Resource Names (URNs). A URN (RFC 8141) has the form urn:<NID>:<NSS> — a namespace identifier (NID) and a namespace-specific string (NSS) — optionally followed by r-/q-/f-components (an r-component beginning ?+, a q-component beginning ?=, and/or an f-component beginning #). A URN is the same shape as a DID — urn: ↔ did:, NID ↔ method, NSS ↔ method-specific-id, the components ↔ the DID URL — and is handled by the same generic path, with the two differences noted below. (DIDs were designed in the URN tradition; the parallel is deliberate.) Recognition requires a non-empty NID and a non-empty NSS; input that does not match this shape falls through to alphabet-detection-by-disproof / the UTF-8 fallback.

The rules shared with Decentralized Identifiers above: the NID is identity (urn:isbn:… ≠ urn:issn:…) and binds by prefix-fold (the prefix is urn:<nid>:; the fingerprint hash input is prefix ‖ core); the NSS is the core, kept verbatim (rendered in the cells, bound via prefix ‖ core, and not percent-decoded); the core is tokenized as base64url; and the URN renders with no type and urn:<nid>:... as its self-describing prefix label. The r-/q-/f-components are a free annotation and MUST be dropped — RFC 8141 states explicitly that they are not part of URN equivalence, so they carry no identity (a resolution hint or fragment, exactly like the DID URL).

The two differences from a DID:

Body terminator. In a URN, / is a legal NSS character and is part of the identity (urn:example:a123,z456/foo), and the components begin only at ? (which introduces both ?+ and ?=) or #. The NSS therefore ends at the first ? or # (or end of input) — not at /, unlike a DID's method-specific-id. (The NSS also admits a wider punctuation set than an idchar; under base64url tokenization any out-of-alphabet character contributes a zero quant to its nucleus color and is otherwise carried verbatim, exactly as for a DID.)
Case. RFC 8141 makes the urn scheme and the NID case-insensitive while the NSS is case-sensitive. The implementation therefore lowercases the urn:<nid>: prefix (so URN:ISBN:0451450523 and urn:isbn:0451450523 produce identical entvizes — the correct equivalence) and preserves the NSS exactly (so it stays a faithful, case-sensitive read-aloud of the body). This is the one place the URN rule diverges from the DID rule, which preserves all case.

Like DIDs, URNs are not per-namespace special-cased in v11: urn:uuid:…, urn:oid:…, urn:isbn:…, and the rest all ride this single generic path (the NSS is bound verbatim regardless). See this.i:d1dm3th0.

Entropy characterization¶

The entropy characterization re-expresses the parser's recognition of an input along independent axes. It replaces the single opaque type-label as the structured source consumers (labels, UI pills, developer APIs) read, and defines the derived entropy_type field. An implementation MUST emit the characterization onto the root <svg> as data-* attributes — data-encoding, data-scheme, data-role, data-size-basis, data-entropy-type (each a string; the empty string for a null scheme/role), data-size-bits (the integer as a decimal string), and data-qualifiers / data-parts (compact JSON) — and the conformance model extractor recovers the eight fields from those attributes, so each implementation is compared against its own characterization rather than one the checker recomputes. These attributes are reporting-only metadata that adds no ink (the closed profile explicitly permits additional data-* attributes), so emitting them changes no rendered pixel, no fingerprint input, and no label string; the golden raster is unaffected. The characterization is not part of the equivalence relation (two conformant renderings of the same value have the same characterization, but the characterization is metadata about the input, not about the drawing). It also appears as additional top-level fields in the golden model.json, alongside the render model, for every render vector.

A characterization comprises exactly the following fields, identical in shape for every input:

encoding — the declared alphabet name of the core (hex, base58, bech32, base32, base64, base64url, base36, crockford32, decimal). This is the alphabet that drives tokenization.
scheme — the recognizer or namespace that fired (for example cesr, btc, bch, eth, xlm, xrp, ltc, ada, eos, cid, did, urn, gitoid, swhid, ssh, lei, snowflake, uuid, ulid, bech32), or null when no recognizer beyond bare-encoding detection matched (the disproof fallback and the UTF-8 fallback both yield null).
role — the semantic role of the bits, drawn from the CLOSED enum {key, signature, digest, address, identifier}, or null when undetermined. role MUST be asserted only from the generic recognizer — see Principle: role from the generic recognizer below.
qualifiers — an object of independently-varying facets recovered by the recognizer: for example {"network": "mainnet"}, {"variant": "legacy"}, {"algorithm": "ed25519"}, {"version": 1, "codec": "dag-pb", "hash": "sha2-256"}, {"method": "ethr"}, {"nid": "isbn"}. An empty object {} when the recognizer determines none.
size_basis — either "decoded" or "utf8"; see Resolution A below. It is scheme-driven and MUST NOT be inferred from the alphabet or from the content's appearance.
size_bits — the value size in bits, always a whole multiple of 8, computed from the core only (never the input string, never a folded identity prefix); see Resolution A. size_bits is reporting-only and is NOT the >512-bit truncation basis — see the warning below.
parts — the ordered list of {"text", "bind"} objects, in reading order, that the normalization step cut the input into. bind is one of none, fold, or core; see Principle: bind at part granularity below. This replaces the earlier prefix + prefix_semantic overload.
entropy_type — a derived convenience field equal to scheme when scheme is non-null, otherwise encoding.

Resolution A — `size_bits` and `size_basis`¶

The characterization carries a single size measure, size_bits, always a whole multiple of 8, computed from the core — never the input string, never a folded identity prefix. It is defined by one of two branches selected by the explicit size_basis field:

Encoding cores (size_basis = "decoded") — the core is a serialization of binary under a declared alphabet. size_bits = (decoded byte length of the core) × 8. For the power-of-2-density alphabets (hex = 4, base32/bech32/crockford32 = 5, base64/base64url = 6 bits per character) this equals floor(core_char_count × bits_per_char / 8) × 8; base64 = padding is stripped before counting and any sub-byte final-group bits are zero and drop out. For the non-power-of-2 alphabets (base58, base36, decimal) an implementation MUST decode the core to its integer value and take its minimal byte length; it MUST NOT use the tokenizer's bits_per_char (a token-packing convention — base58 = 6 — that overstates true density, base58 ≈ 5.86).
Text cores (size_basis = "utf8") — the core is inherently text with no underlying binary: a DID method-specific-id, a URN namespace-specific string, or the UTF-8 fallback. The text is the value, so size_bits = (UTF-8 byte length of the core text) × 8. For the UTF-8 fallback this equals the original input's byte length.

size_basis is driven by scheme: did, urn, and the UTF-8 fallback are "utf8"; every other recognized scheme (and the disproof fallback) is "decoded". It MUST NOT be inferred from encoding (a DID method-specific-id and a real base64url value both declare base64url) nor from the content's appearance (a did:jwk method-specific-id is base64url-encoded JSON but is a text core, because entviz never decodes a DID method-specific-id).

size_bits measures the serialized value entviz renders, not the underlying cryptographic material. A CESR E (Blake3-256) primitive reports size_bits = 264 — 33 core bytes including the derivation-code alignment byte — not the 256 bits of the digest it carries. The semantic-material size, when known, is conveyed by qualifiers (for example algorithm), never by size_bits.

Warning — size_bits is reporting-only; it is NOT the truncation basis. The large-input (head/middle/tail) trigger MUST keep using the existing tokenization byte length (floor(len(core) × bits_per_char / 8)), unchanged. The two measures coincide for encoding cores but diverge for text cores (a 65–86-character text core would truncate under size_bits but not under the tokenization basis); re-pointing the trigger at size_bits would move the truncation boundary and break the golden corpus. Keep them distinct.

Resolution B — substring approximation and folded-prefix exclusion¶

Where an encoding core is a substring of a larger checksummed value (a base58check body whose version byte and checksum are split into separate none-bound parts), the decoded substring is not independently byte-aligned, so size_bits is approximate. This is accepted: size_bits feeds only the label and the coarse >512-bit threshold, neither of which needs bit-exact precision.

A folded identity prefix (a part with bind = "fold": a SWHID/gitoid scheme, did:<method>:, or urn:<nid>:) is not counted in size_bits: it binds the fingerprint but is not part of the rendered core.

Principle: role from the generic recognizer¶

role is asserted only where the format self-declares it through the generic recognizer that entviz already runs — never by decoding a namespace or method entviz does not otherwise interpret. SSH's generic recognizer decodes an algorithm field, so role = "key"; a CESR derivation code decodes to a primitive, so role is key/digest/signature accordingly; a blockchain address recognizer gives role = "address"; a gitoid/SWHID gives role = "digest". But a did:key is role = "identifier" (not key), a did:pkh carrying an Ethereum address is role = "identifier" (not address), and a urn:isbn is role = "identifier" (not a book) — entviz does no per-method/per-namespace decoding, so asserting a narrower role would require logic the spec excludes, and an implementation that special-cased it would diverge. When the recognizer determines no role (arbitrary text, a bare encoding), role is null. Entviz does not guess.

Note (non-normative): a recognized primitive outside the role enum is role = null, not a default. The CESR recognizer identifies some primitives that are not keying material, digests, or signatures — most notably a Dater (a CESR-encoded datetime, code 1AAG), and by extension any future counter/number/temporal primitive. Such a primitive is recognized only so it can be labeled correctly (a datetime label beats the bare-encoding raw fallback); it is not a comparison target the way a key or signature is — a datetime is low-entropy and directly human-readable. It therefore carries role = null, and an implementation MUST NOT let it fall through to a key (or any other) default. This is the CESR counterpart of the general rule above: the recognizer fired, but it determined no enum role. (Indexed signatures — the CESR Indexer table, code A/B/C/… at their fixed sizes — are ordinary signatures and take role = "signature" like any other …sig primitive.)

Principle: bind at part granularity¶

bind is a property of a part at the recognizer's granularity, not of a character's abstract role. The z multibase selector is bind = "none" as a standalone bare value but bind = "core" inside a did:key method-specific-id — no contradiction, they are different parts of different inputs. Likewise CIDv0's Qm multihash header is none as constant-format framing. The three bind values are: core (in the hashed core text, rendered in cells — including in-core discriminators such as a CESR derivation code or an LEI LOU code); fold (identity-bearing but a different alphabet/framing from the body — kept as a prefix, hashed as prefix ‖ core, shown in the label, not in cells — a SWHID/gitoid object-type, did:<method>:, urn:<nid>:); and none (carries no identity bits and binds nothing — presentation framing such as 0x, a multibase selector, or a base58check checksum, shown when it aids recognition, or dropped entirely when it is a free annotation such as an SSH comment or a DID URL tail).

The fingerprint¶

The fingerprint is the SHA-512 hash of the canonical normalized text of the entropy. Implementations MUST hash the UTF-8 bytes of the normalized core string (the characters as written in the core's declared alphabet, after case/punctuation normalization) — not the value's decoded raw bytes. (See Why the fingerprint hashes text, not decoded bytes below.) Because the fingerprint is produced by a cryptographic hash, it exhibits a strong avalanche effect: a single-bit change anywhere in the entropy changes roughly half the bits of the fingerprint. This is what lets entviz amplify differences even when the entropy itself is chosen rather than generated (for example, a UUID, a raw hex string, or a base64url blob), and what lets entviz handle inputs of any size. The fingerprint is tokenized exactly as the entropy is — into 24-bit chunks of base64url text. A token of the fingerprint is called an ftok. Because SHA-512 is always 512 bits (64 bytes), the fingerprint always yields exactly 22 ftoks: 21 full ftoks of 24 bits each, plus one partial ftok formed from the trailing byte and extended to 24 bits as described below.

Compute the fingerprint as the SHA-512 hash of the UTF-8 bytes of the canonical normalized core text (the normalized core string itself; implementations MUST NOT decode the core to its underlying raw bytes before hashing — see Why the fingerprint hashes text, not decoded bytes). When the parser marks the prefix as identity-bearing but not in the core (the prefix-fold case below — e.g. a SWHID/gitoid object-type), the hash input is prefix ‖ core text; otherwise it is the core text alone. Serialize the 64-byte fingerprint to base64url text and split it into ftoks using exactly the same tokenization rule applied to the entropy: each ftok represents 3 bytes (24 bits) of the fingerprint. This yields 21 full ftoks plus one partial ftok formed from the trailing byte; extend the partial ftok to 24 bits by repeating its low-order bits, exactly as for a partial token. The fingerprint therefore always provides 22 ftoks. Assign each ftok an ftok index between 0 and 21, inclusive. The fingerprint is never displayed as text.

Why the fingerprint hashes text, not decoded bytes¶

The fingerprint hashes the canonical normalized text of the entropy — the characters of the core string after case/punctuation normalization, as UTF-8 — and deliberately does not decode the value to its underlying raw bytes first. A consequence is that encoding-invariance is not a goal: the same 32 bytes written as hex and as base64 produce different entvizes, and the same CID in two multibases renders differently. This is intentional. The identity entviz protects is "this value in its canonical textual form," and the choice is justified on three grounds:

It fails safe. The text channel is verbatim by design — it always shows the characters the user actually holds — so it can never be made encoding-invariant without destroying that fidelity. If the fingerprint hashed decoded bytes, two encodings of the same value would share a gestalt while showing different cell text: the channels would disagree about identity. Text-hashing keeps every channel in agreement. Its worst case is a false negative (two encodings of one value look different, so a reader fails to notice they match) — a fail-safe outcome that prompts further checking, never the acceptance of a wrong value.
It removes a collision surface. Decoding before hashing inserts a parser ahead of the hash, and several text encodings are malleable: distinct base64/base58/bech32/base32 strings can decode to identical bytes (non-canonical padding, trailing bits, alternate alphabets). Under byte-hashing those become different text, identical gestalt — collisions an attacker can manufacture. Hashing the (normalized) text gives them no such lever. See threat-model.md.
It is well-defined and reproducible across implementations. Hashing UTF-8 text requires only case/punctuation normalization, which is simple and identical across languages; byte-decoding would require every certified implementation to decode every alphabet byte-for-byte identically (base58 leading-zero handling, bech32 5-bit unpacking, base64 trailing-bit policy, decimal width). Any divergence would make the same input fingerprint differently across implementations — a conformance and security failure. Some inputs (the arbitrary-text fallback) have no "raw bytes" other than their UTF-8 at all.

Note that decoding is used elsewhere — to compute a token's 24-bit quant from its characters, and to measure the core's byte length for the >512-bit truncation threshold — but never as the input to the fingerprint hash. The identity-bearing parts of a typed value (a CESR derivation code, a SWHID object-type) are bound by being present in the hashed text (in the core, or folded in as prefix ‖ core), not by decoding. (See this.i:h4shtext.)

Tokenizing the entropy¶

Split the entropy string into tokens. Each token represents 3 bytes (24 bits) of binary entropy, or as close to that amount as possible while respecting whole-character boundaries of the underlying encoding. The token length (chars per token) is determined by the alphabet the parser declared for the input — not by inspecting the content of the core or by string-matching the type name. (Content inspection is unsound: a base32 value, for instance, can use only characters from the hex alphabet and would be indistinguishable from hex on inspection. Each parser knows which alphabet its core uses and must declare it.) The alphabets in this spec are:

hex (4 bits per char): token length = 6 characters (= 24 bits). Used by raw hex inputs, hex multihash, UUID, Ethereum addresses, and git-hash prefix schemes (SWHID swh:1:<type>:<40-hex> and gitoid gitoid:<obj>:<algo>:<hex>, whose scheme is split off as a non-entropy prefix exactly like Ethereum's 0x).
base58 (6 bits per char in this spec's tokenization; note that base58's true information density is ~5.86 bits/char, but this spec treats base58 chars as 6-bit values for tokenization purposes, matching the reference implementation): token length = 4 characters (= 24 bits). Used by Bitcoin legacy, Ripple, Litecoin legacy, Cardano Byron, and IPFS CID v0.
base36 (6 bits per char for token alignment, matching the same convention base58 uses; true information density is ~5.17 bits/char): token length = 4 characters (= 24 bits). The alphabet is 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ. Used by GLEIF LEIs (ISO 17442; 20 chars total = 5 tokens; structure is 4-char LOU + 00 reserved + 12-char entity body + 2-char MOD 97-10 checksum).
base64 and base64url (6 bits per char): token length = 4 characters (= 24 bits). Used by CESR, SSH keys, EOS addresses, and the unknown-input fallback (the input is re-encoded as base64url before tokenization); base64url is also the uniform tokenization alphabet for DID method-specific-ids and URN namespace-specific strings (see Decentralized Identifiers and Uniform Resource Names in the normalization step).
bech32 (5 bits per char per BIP-173): token length = 4 characters (= 20 bits, then extended to 24 by the bit-extension rule). The alphabet is qpzry9x8gf2tvdw0s3jn54khce6mua7l, which intentionally excludes 1, b, i, o to reduce ambiguity (and 1 doubles as the bech32 separator). 4 chars per token is chosen over the alternate 5-chars-per-token (= 25 bits) because the quant is defined as a 24-bit value; a 25-bit token would overshoot the budget. Used by Bitcoin SegWit (bc1... / tb1...), Litecoin (ltc1...), Cardano Shelley (addr1... / stake1...), Bitcoin Cash CashAddr (which is commonly called "base32" but actually uses the bech32 character set), and Cosmos-SDK chains (cosmos1..., osmo1..., juno1..., …) via a generic parser that validates the BIP-173/BIP-350 checksum and names the chain from its human-readable prefix.
base32 (5 bits per char per RFC 4648): same 4-chars-per-token tokenization as bech32, but with a different character set: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567. Excludes 0/1/8/9 (and is conventionally case-insensitive with most uses being all-uppercase). Used by Stellar (G... accounts and M... muxed accounts) and IPFS CID v1 (b...).
crockford32 (5 bits per char): same 4-chars-per-token tokenization as bech32 and base32 (= 20 bits per token, extended to 24 by the bit-extension rule). The canonical alphabet is 0123456789ABCDEFGHJKMNPQRSTVWXYZ — excludes I, L, O, U for visual disambiguation. The spec also accepts I, L (→ 1) and O (→ 0) as case-insensitive input aliases; U is not an alias and remains forbidden. Used by ULIDs (26 chars total).
decimal (4 bits per char for token alignment; true information density is ~3.32 bits/char): token length = 6 characters (= 24 bits). The alphabet is 0123456789. Used by Twitter/Discord/Mastodon "snowflake" IDs (17–20 decimal digits encoding a 64-bit integer with a 42-bit timestamp / 10-bit machine / 12-bit sequence layout). Like base36 and base58, this is a non-power-of-2 alphabet whose bits_per_char field controls token packing rather than true entropy density; the slight overshoot from 3.32 to 4 means a few low-order quant bits are zero-padding instead of entropy. Decimal is not in the disproof-fallback set: a pure-digit string that doesn't match a specific decimal-format parser (snowflake) falls through to hex (whose alphabet contains all decimal digits) and is visualized as hex. Snowflake detection MUST be deterministic and MUST NOT consult the wall clock (doing so would violate the determinism requirement above — the same input could classify as snowflake at one time and hex at another, changing the output): classify a 17–20-digit decimal as a snowflake iff its value fits in a signed 64-bit integer (the sign bit is clear, i.e. the value is < 2⁶³). With the sign bit clear, the 41-bit timestamp field is structurally bounded to a date in [2015, ~2084], so plausibility is a property of the bit pattern rather than of the current date; a value with the sign bit set (an implied timestamp past ~2084, or a ≥ 64-bit overflow) is not a snowflake and is visualized as hex. (A digit string that is structurally a snowflake but is in fact some other 17–20-digit number renders identically and deterministically either way; only the type label and tokenization differ, never the comparison.)

The general rule is: token length = floor(24 / bits_per_char). For bits_per_char ∈ {4, 6} this divides evenly (6 chars and 4 chars respectively); for bits_per_char = 5 it gives 4 chars (= 20 bits) which then extend to 24 via the rule below. Call the number of tokens the token count. Assign to each token a token index between 0 and token count - 1, inclusive. If the entropy is greater than 512 bits, do not tokenize the whole input; instead apply the large-input handling rule below to select a head group (the first 8 tokens), a middle group (4 tokens taken from a second, domain-separated fingerprint), and a tail group (the last 8 tokens). In all cases, token count will be at most 22.

split string into tokens

Figure 6. Splitting the normalized input into 24-bit tokens.

Also, if a token represents less than 24 bits of entropy, extend the bits of the token by repeating low-order bits until a full 24 bits is used. Call the 24-bit value associated with the token its quant.

Specifically, given an integer value v with actual_bits bits of information (where 0 < actual_bits < 24), the extension proceeds by repeated doubling of the current value, taking each pad chunk from the low-order bits of the current (already extended) value:

quant = v
while actual_bits < 24:
    shift = min(actual_bits, 24 - actual_bits)
    pad = quant & ((1 << shift) - 1)        # low-order `shift` bits of quant
    quant = (quant << shift) | pad
    actual_bits += shift

Worked examples:

8-bit value 0xAB (binary 10101011): iteration 1 (shift=8) → 0xABAB; iteration 2 (shift=8) → 0xABABAB. Final quant: 0xABABAB.
4-bit value 0x5 (binary 0101): iteration 1 (shift=4) → 0x55; iteration 2 (shift=8) → 0x5555; iteration 3 (shift=8) → 0x555555. Final quant: 0x555555.
12-bit value 0xABC: iteration 1 (shift=12) → 0xABCABC. Final quant: 0xABCABC (one iteration suffices when actual_bits doubles cleanly to 24).

The shift size at each step is min(actual_bits, 24 - actual_bits), so the algorithm terminates in at most a few iterations regardless of the starting size.

Large-input handling¶

When the normalized core is longer than the budget for a 22-cell entviz (in concrete byte terms: when the core's underlying byte length, computed by decoding the core under its declared alphabet, exceeds 64 bytes), tokenize the input in three groups:

Head group (H = 8 tokens). Tokenize the first H · token_len = 8 · floor(24 / bits_per_char) characters of the core exactly as you would tokenize a short input — i.e. 8 whole tokens. Because token_len = floor(24 / bits_per_char), the head covers 192 bits for 4-bit (hex → 8×6 chars) and 6-bit (base64/base64url → 8×4 chars) alphabets, and 160 bits for the 5-bit alphabets (bech32/base32/crockford32 → 8×4 chars = 32 chars × 5 bits), where the token floors to 4 characters and the low 4 bits of each notional 24-bit slot are simply not consumed by the head. This is intentional: the head is a recognition anchor sized in whole tokens, not a fixed bit count. These are token indices 0..7.
Middle group (M = 4 tokens). Four tokens rendering a second, domain-separated fingerprint of the whole input — second = SHA-512(DOMAIN_TAG ‖ core) where DOMAIN_TAG = "entviz/fingerprint-middle/v6\0" (the trailing NUL byte included). Implementations MUST use this exact byte string as DOMAIN_TAG and MUST NOT change it to track the spec-document version — changing it alters the middle cells of every >512-bit entviz and breaks comparison against any previously rendered copy. second is computed for every input (one extra SHA-512 over the whole input), not only for large ones: on inputs of any size it drives the two color-bar markers (see the color-bar step), and on >512-bit inputs it additionally drives these four middle cells. The DOMAIN_TAG and its construction-version v6 are unchanged in v9; only new consumers (the markers) and the middle-cell encoding (Crockford base32, above) are new. Token i ∈ [0..3] renders the 3-byte group second[3i .. 3i+2] as 5 lowercase Crockford base32 characters (a 24-bit big-endian value; see below). These are token indices 8..11. Crockford base32 is used regardless of the input's alphabet, so each cell always carries a full 24 bits and the rendering is injective (32⁵ = 2²⁵ ≥ 2²⁴). Because they carry no input entropy in their background, their nucleus background is set to the entviz background color (so the nucleus reads as neutral/hollow, visually distinct from the entropy-colored head/tail), and each is framed by a 1-px border flush with the nucleus edge (the stroke's outer edge coincides with the nucleus boundary) — colored gold (#e7be00) on a white-background entviz, white (#ffffff) otherwise (the same contrast rule as the blank-cell map fill) — to mark the four cells as the fingerprint group. Their surround stays driven by the primary fingerprint and therefore still avalanches. Note (non-normative). The v6 in DOMAIN_TAG is the version of this fingerprint-middle construction (introduced in v6, unchanged since) — not the spec-document version. It stays v6 to acknowledge the spec version that introduced it, even though the spec has since moved beyond that version; the normative requirement to keep it constant is stated above. Change it only on a breaking change to this derivation.
Tail group (T = 8 tokens). Tokenize the last T · token_len = 8 · floor(24 / bits_per_char) characters of the core in the same way — 8 whole tokens, covering 192 bits for 4-/6-bit alphabets and 160 bits for 5-bit alphabets, exactly as for the head. These are token indices 12..19.

These 20 tokens carry contiguous token indices 0..19 in head → middle → tail order, preserving reading order. Cell placement uses the same blank-shift rule as short inputs (see the blank-cell step below): choose a grid with a few spare cells for the 20 tokens (at target_ar = 1.0 this is a 4×6 grid → 4 blank cells), then insert blanks at the median ftok's position and the ASCII-sort endpoints. The blanks therefore vary with the fingerprint and carry the same CRC-like signal they do for short inputs. There are no fixed separator blanks: head/middle/tail are a logical token ordering, not fixed cell positions. (v5 instead placed two fixed separator blanks at cells 8 and 13 and bypassed the shift; v6 drops that special case so a large input's blank layout discriminates inputs exactly as a short input's does. Because the fingerprint cells are individually marked — neutral bg + gold/white frame — and the +hash marker already signals a non-linear read, the explicit separators are no longer needed.)

Accepted tradeoff (adversarial-2026-06-02 F5). Because the shift is fingerprint-driven, a blank can land inside the head (token indices 0–7) or tail (12–19) run rather than only at the head/middle/tail boundaries, so the head or tail is not always a visually contiguous block. This is accepted: reading order is preserved (correctness is unaffected — a user reassembles the head by reading past the blank), and the per-input CRC-like blank-position signal that the shift restores for large inputs is judged worth more than head/tail contiguity. The head/tail cells remain individually identifiable (entropy-colored, unframed) versus the framed neutral fingerprint cells, so the recognition cost is a mild ergonomic one, not an ambiguity.

Choice of H, T, M: the allocation is fixed at H=8, T=8, M=4. Rationale: it keeps the head and tail — the real-entropy anchors a user recognizes and can verify against a known value — visibly dominant; and it gives the fingerprint readout 4 cells, enough to be a legible cross-check. (Recognition value drops off after the first/last cell or two, so the head/tail need not be longer; and the middle's avalanche guarantee — see below — is satisfied by even a single fingerprint cell, so the middle need not be larger.)

Head/tail are anchors, not a representative sample (scale caveat). The recognition rationale above is strongest for inputs only modestly over 512 bits, where the head and tail together still cover a meaningful share of the input and a reader may genuinely recognize a known prefix (a key, an address). As the input grows, that rationale weakens monotonically: for a truly massive input — a multi-gigabyte genome, say — the 384 displayed head/tail bits (192 each for 4-/6-bit alphabets; 160 each for 5-bit alphabets) are a vanishing fraction of the data and are unlikely to be visually recognizable at all, and the literal ends of large files are frequently headers, padding, or boilerplate rather than distinguishing bytes. The allocation deliberately does not adapt to size: a size-varying split would trade away the "trivial to implement correctly" property and the fixed 20-token / 4×6-grid invariant for little gain. The reason the fixed split remains sound at any size is that head and tail are never claimed to be representative of the input — they are a verification convenience (spot-check the ends against a known-good copy), not a summary of the content — and the binding of the input's bulk does not rest on them. Everything between the head and tail is bound at full SHA-512 avalanche into every fingerprint-driven channel (the surround pattern on all 20 cells, the color bar, the ellipse, the blank-cell positions, the quartile marks, and the background color) and, in the text channel itself, into the 4 middle cells — a 96-bit injective readout of the domain-separated second digest that avalanches on any input change even in a text-only / read-aloud comparison. For a large input, therefore, the fingerprint-driven channels carry essentially all of the comparison signal; the head/tail text should be read as anchors, not as a sample of the content, and the larger the input the more this is so. Counterintuitively, the input's middle — not its ends — is the best-protected part of the picture, because it is the part every high-bandwidth channel depends on. (See this.i:v6htscal.)

Why the middle is a fingerprint, not the body (v6). v5 filled the middle with body slices sampled at fingerprint-derived offsets. That made the middle text differ between two inputs only probabilistically: for a low-entropy or structured body, two different inputs could render identical middle cells (the sampled windows land on equal bytes), so a screen-reader / read-aloud comparison — which has no access to the gestalt channels — could miss a real difference. v6 instead fills the middle from a fingerprint, so it avalanches by construction: any one-bit change to the input flips ~half the digest, so the middle text differs on any input change. The head and tail remain real input entropy (recognition + verification). Two refinements over the first v6 draft (adversarial review F1/F2) make this guarantee literally true for every alphabet and independent of the gestalt:

Crockford base32, not the input's alphabet (F1). The first v6 draft rendered the middle in the input's own alphabet, which on 5-bit alphabets (bech32/base32/crockford32) displayed only the top 20 of each cell's 24 bits — dropping a nibble per cell, so the rendering was not injective and two inputs differing only in those dropped bits showed identical middle text; and on non-power-of-2 alphabets reachable by an oversized pasted address (a 200-char base58/base36 string does take this path) the mod-fallback aliased group values, losing far more. Rendering 24 bits as 5 Crockford base32 characters is injective for every input (32⁵ = 2²⁵ ≥ 2²⁴), so "the middle avalanches on any input change" holds universally. The middle cells are already marked as a hash readout (neutral bg + framed + the +hash label), so a fixed digest alphabet rather than the input alphabet is appropriate — it signals "this is a digest, not your data." The Crockford alphabet (and, on a hex input, the size difference — 6-char hex head/tail flanking 5-char Crockford middles) is the candid "this is a digest, not your data" cue: v6 used hex, whose cue was weakest precisely on a hex input (all glyphs hex; only the neutral bg + gold/white frame distinguished the middle), whereas Crockford differs from a hex input by both alphabet and rendered size. (See this.i:cr0ckmid.)
A separate, domain-separated digest (F2). The first v6 draft read the displayed middle from the primary fingerprint (digest bytes 24–35) — the same digest whose bytes drive the color bar and the middle cells' own surround — so matching the displayed middle also matched those gestalt channels for free; the "match the middle and independently match the gestalt" framing was overstated. v6 derives the middle from SHA-512(DOMAIN_TAG ‖ core), a digest that domain separation guarantees is uncorrelated with the primary fingerprint. The displayed middle is therefore independent evidence of the primary-fingerprint gestalt.

With both refinements, an attacker who matches the head, the tail, and the 4 displayed fingerprint tokens must produce a 96-bit partial preimage of the second digest (4 cells × 24 injective bits, ≈2⁹⁶) and — independently — match the primary-fingerprint gestalt channels. The 96-bit figure is now exact and uniform across alphabets (it was 80 bits, non-injective, for 5-bit alphabets in the first draft). The losslessness promise is unaffected: the text channel is only promised lossless for ≤512-bit inputs, and that path does not use this rule at all.

Middle cell text rendering. Each middle token reads its 3 bytes from the second digest, second = SHA-512(DOMAIN_TAG ‖ core) with DOMAIN_TAG = "entviz/fingerprint-middle/v6\0", as a 24-bit big-endian value second[3i] · 2¹⁶ + second[3i+1] · 2⁸ + second[3i+2], and renders it as exactly 5 lowercase Crockford base32 characters — high-order zero-padded, over the Crockford alphabet 0123456789abcdefghjkmnpqrstvwxyz (which excludes i, l, o, u for visual disambiguation) — independent of the input's alphabet (the middle is a digest readout, not lossless input, so it need not share the input's character set). The encoding is injective because 32⁵ = 2²⁵ ≥ 2²⁴ (the leading character never exceeds symbol value 15, so it carries the high 4 bits of the 24-bit value losslessly).

Why 5 Crockford characters (optimality). 5 characters is the floor for "injective on 24 bits, single letter-case, homoglyph-clean." 4 characters would require a ≥ 64-symbol (6-bit) alphabet, which no single-case homoglyph-clean alphabet reaches (digits + one letter case is 36 symbols at most); reaching 64 forces mixed case or punctuation (-/_), which re-imports the read-aloud "cap" syllable cost and a homoglyph surface — so base64url is a wash on syllables and worse on safety/signal. base58 cannot reach 24 bits in 4 characters either, since 58⁴ = 11,316,496 < 2²⁴, and a 4-char base58 rendering would re-open exactly the F1 mod-fallback aliasing this construction was built to kill. A 32-symbol disambiguated alphabet at 5 characters is therefore the provable optimum, and Crockford's exclusion of i/l/o/u makes it strictly safer than hex (it kills the 0/o, 1/l, 1/i confusions). (See this.i:cr0ckmid.)

Because these cells are 5 characters, the cell-text rendered-size rule (round(reference × max(0.75, min(1.0, 4/token_chars))), applied per cell on the middle's own character count, not the entviz's input alphabet) yields 4/5 = 0.80× automatically (10 pt at the 12 pt reference) — slightly larger and roomier than v6's clamped-to-floor 0.75× for 6-char hex. On a hex input the head/tail are 6-character cells rendered at 0.75×, so the middle cells differ from them by both alphabet and size; on a 5-/6-bit-alphabet input the head/tail render at full size and the middle cells are correctly smaller (otherwise the 5 glyphs overflow the nucleus). The rendering is injective on the 12 displayed digest bytes for every input.

Used ftoks follow the same one-to-one rule as for short inputs: the used ftok at token index i drives the token with that token index (per the used-ftoks step below), regardless of which cell that token lands in after the blank shift. The 20 large-input tokens carry token indices 0..19 and so use ftoks 0..19; ftoks 20 and 21 are unused. The fingerprint is still SHA-512 over the entire normalized entropy.

A T1+T5+T6 attacker (see threat-model.md) who could previously collide only the head and tail must now additionally reproduce the 4 displayed fingerprint tokens — i.e. 96 specific bits of the second, domain-separated SHA-512 digest (an injective partial preimage, ≈2⁹⁶, uniform across all input alphabets) — on top of matching the head, tail, and the primary-fingerprint gestalt channels. Because the second digest is domain-separated from the primary, those two are independent requirements. This closes the head-and-tail-only collision pattern that adversarial review finding F5 demonstrated against v4. (v5 closed the same gap by forcing matching body bytes at fingerprint-selected offsets; v6 forces matching fingerprint bytes instead, which is comparable in cost and additionally guarantees the middle text avalanches, injectively, for read-aloud comparison.)

Grid and geometry¶

The complete entropy is visualized as a rectangular grid consisting of a certain number of cells. Call this number of cells the cell count. Each token is rendered into one cell in the grid, and if the rectangle of the grid has more cells than token count, one or more cells will be empty.

Grids of a single row or a single column are invalid: the minimum grid is 2 columns by 2 rows. Each cell touches its neighbors directly and has an aspect ratio of 3:2 (= cell_width : cell_height = 3.75·font_size_px : 2.5·font_size_px). Given a target aspect ratio for the entviz (or, if none is given, using 1:1 as the target), the implementation MUST choose the grid layout that produces an overall rectangle with an aspect ratio closest to the target, without being less than the target when the ratios are written as fractions, and with at least 2 columns and 2 rows.

Using more entropy than the example we've been building, just to show how this works in more complicated situations: 256 bits of entropy is 44 base-64 characters or 11 tokens. 11 tokens can be rendered as a grid with 6 columns and 2 rows (rounding token count to 12; aspect ratio (6·3):(2·2) = 18:4 = 9:2), 4 columns and 3 rows (12:6 = 2:1), 3 columns and 4 rows (9:8), or 2 columns and 6 rows (6:12 = 1:2). Given a target aspect ratio of 1:1, the grid layout with an aspect ratio closest to 1:1 but not less than 1:1 is the one with 3 columns and 4 rows.

The following pseudocode is non-normative; it illustrates the rule above and mirrors the reference choose_grid. A cell is 3:2, so a cols × rows grid has aspect ratio (cols · 3) / (rows · 2):
```
# tightest[rows] = the fewest cols that hold token_count in >= 2 rows
tightest = {}
for cols in 2 .. token_count:
    rows = ceil(token_count / cols)
    if rows < 2: continue                   # a single row is invalid
    if rows not in tightest or cols < tightest[rows]:
        tightest[rows] = cols               # keep only the tight layout (fewest blanks)

if tightest is empty:                       # token_count <= 2: no natural 2x2+ layout
    return grid(cols=2, rows=2)             # force 2x2; the spare cells become blanks

candidates = [(cols, rows, (cols*3)/(rows*2)) for (rows, cols) in tightest]
at_or_above = [g for g in candidates if g.ar >= target_ar]
if at_or_above:                             # closest to target without dropping below it
    return argmin(at_or_above, key = g.ar - target_ar)
return argmax(candidates, key = g.ar)       # target unreachable: fall back to the widest
```
The "tightest cols per row count" dedup is what makes the choice unambiguous: without it a target above every achievable ratio would pick a wasteful layout (e.g. 10×2 for 11 tokens — 9 blanks) instead of the tight 6×2 (1 blank). The final argmax fallback is the one case where the result is below the target — chosen only when no 2×2+ layout reaches it.

Figure 7. Choosing the grid whose aspect ratio is closest to the target.
Moving from left to right and top to bottom — which is how ASCII text should read if it wraps — number the cells from 0 to N, and call the number associated with each cell its cell index. Assign a cell index to each token. Unless changed, the cell index of a token will equal its token index.

Figure 8. Cell indexing, read left-to-right and top-to-bottom.
Choose a fixed-width font such as Courier, and an appropriate font size for reading. In our example, we will use 12 point, but the algorithm will work at any reasonable font size. The size of the font determines the scale of the entviz.

Font-family fallback chain and homoglyph risk. Implementations SHOULD render every text element (cell text, label strips, color-bar letters) with the font-family fallback chain "JetBrains Mono", "Menlo", "Consolas", "DejaVu Sans Mono", "Liberation Mono", "Roboto Mono", "Noto Sans Mono", monospace. The chain routes each major platform to a good preinstalled monospace — Menlo on macOS and iOS, Consolas on Windows, DejaVu Sans Mono / Liberation Mono on Linux, and Roboto Mono / Noto Sans Mono on Android and ChromeOS — so two viewers on different platforms see glyph metrics that are close (though not identical). JetBrains Mono leads the chain: it is not preinstalled anywhere, but it is widely installed by developers and has the strongest homoglyph disambiguation, so viewers who have it get the best rendering and everyone else falls through to their platform's native monospace. The final monospace fallback ensures something readable is always chosen. (Truly identical glyphs across all platforms would require embedding the font in the SVG; that is intentionally out of scope — the font-independent gestalt channels carry visual comparison, and the text channel only needs to be individually readable and unambiguous.) Implementations MUST NOT use a bare monospace declaration without a fallback chain: the user's OS-default monospace varies enough (Menlo on macOS, Consolas on Windows, DejaVu/Liberation/Ubuntu on Linux) that the same entviz renders with materially different glyph widths and homoglyph behavior across viewers. The homoglyph risk is real and security-relevant: characters like 0/O, 1/I/l/|, 5/S, and -/_ are visually-distinguishable in some monospace fonts and visually-confusable in others, and an entviz is only as trustworthy as the user's ability to distinguish characters within it.
Convert the point size of the font into pixels and call this value font_size_px. Use the formula: pixels = (points · DPI) / 72. Most devices use 96 DPI, although other values are possible. At 96 DPI, a 12-point font = 16 pixels. This is the em-size of the font; the actual glyph bounding box (ascender top to descender bottom) is typically slightly larger, which the cell geometry below accounts for.

The chosen point size is called the reference font size. Throughout this spec, all geometry — nucleus dimensions, cell dimensions, grid dimensions, box dimensions, GM, bounding rect, color bar width — is derived from font_size_px. The reference is independent of the size actually applied to any specific piece of rendered text; some text elements are drawn at a smaller rendered font size (see the cell rendering algorithm below). The rendered font size never affects geometry.
Compute the geometry, all anchored on font_size_px:
- nucleus width = 3·font_size_px (so the nucleus is wide enough to hold 4 monospace glyphs at full reference size with horizontal margin)
- nucleus height = 1.25·font_size_px (the 25% vertical extra accommodates the glyph descenders of typical monospace fonts, whose bounding box extends below the em-box by ~20–25%)
- box width = nucleus_width / 8 = 0.375·font_size_px. Derived from the horizontal tiling: 10 top-row boxes span nucleus_width + 2·box_width, so 10·box_width = nucleus_width + 2·box_width, i.e. 8·box_width = nucleus_width.
- box height = nucleus_height / 2 = 0.625·font_size_px. Derived from the vertical tiling: 2 side-column boxes stack to nucleus_height.
- cell height = nucleus_height + 2·box_height = 4·box_height = 2.5·font_size_px
- cell width = nucleus_width + 2·box_width = 10·box_width = 3.75·font_size_px. Cell aspect is 3:2.
- grid width = cell_width · cols
- grid height = cell_height · rows
- GM (grid margin) = box_height / 2
- bar width (color bar width) = 2·box_height = 1.25·font_size_px. The color bar is twice as wide as v5's box_height so its per-band color letters render legibly.
At 96 DPI with a 12-point font: font_size_px = 16, nucleus_width = 48, nucleus_height = 20, box_width = 6, box_height = 10, cell_width = 60, cell_height = 40, GM = 5, bar_width = 20. Surround boxes are 6×10 (no longer square); the 3:5 width:height ratio of a box is not a fundamental constant — both dimensions are derived independently from their tiling constraints. If a future revision changes nucleus_height or nucleus_width independently, the box dimensions follow.

Figure 9. Cell and 24-box surround geometry, all derived from font_size_px.
Allocate the grid rect, a rectangle of dimensions grid width x grid height that contains only the cells of the grid. We will assume that the top left corner of the grid rect is at position (0, 0) on the canvas for the purpose of the cell calculations, but its actual position is determined by the bounding rect below.
Allocate the bounding rect, the outermost rectangle of the entviz (the SVG canvas). Inside it, the frame rect — inset by a quiet margin of MARGIN = 1 user unit on every side — holds the gray frame, the color bar at its left, the grid rect, and the label strips (see the label-strip step below). The frame-rect (frame-to-frame) dimensions are:
- frame_width = 1 + bar_width + 1 + GM + grid_width + GM + 1
- frame_height = 1 + GM + top_label_height + grid_height + bottom_label_height + GM + 1
and the bounding rect (the canvas) adds the quiet margin on all four sides:
- width = frame_width + 2·MARGIN
- height = frame_height + 2·MARGIN
where top_label_height = nucleus_height (always present) and bottom_label_height = nucleus_height when the parsed result has a suffix or a user note is supplied (see the label-strip step), else 0 (so the frame_height reduces to 1 + GM + nucleus_height + grid_height + GM + 1 when no bottom strip is needed — the GM below the grid is then just the bottom margin). Read the width left to right: a MARGIN-wide transparent quiet margin; then a 1-pixel gray left border; then the color bar (width = bar width = 2·box_height = 4·GM); then a 1-pixel gray interior separator between the color bar and the grid area; then a GM margin; then the grid rect; then a GM margin; then a 1-pixel gray right border; then a MARGIN-wide transparent quiet margin. Read the height top to bottom: a MARGIN-wide transparent quiet margin; then a 1-pixel gray top border; then a GM margin; then the top label strip (which abuts the grid — no GM between them); then the grid rect; then (if present) the bottom label strip (also abutting the grid); then a GM margin; then a 1-pixel gray bottom border; then a MARGIN-wide transparent quiet margin. The GM sits only on the outer (border) side of each label strip, so the label text is the same distance from the grid as nucleus text is from a nucleus edge.

Fill only the frame rect with white — a rectangle at (MARGIN, MARGIN) of size frame_width × frame_height, whose edges coincide with the frame's outer edges. The MARGIN-wide ring outside the frame is left unpainted, so it renders transparent and no fill or ink of any kind reaches the canvas edge. Draw a 1-pixel #808080 line along all four edges of the frame rect (i.e. inset by MARGIN from the canvas edge), and a 1-pixel #808080 line down the column between the color bar and the grid area (forming the color bar's right edge). Each border line is centered on a half-pixel coordinate one MARGIN inside the canvas edge — x = MARGIN + 0.5 for the left border, x = width − MARGIN − 0.5 for the right border, y = MARGIN + 0.5 / height − MARGIN − 0.5 for the top / bottom borders, and x = MARGIN + 1 + bar_width + 0.5 for the interior separator — and rendered with shape-rendering="crispEdges" so a 1-px stroke covers exactly one pixel column or row without antialiasing halos; the four outer lines share the same span endpoints (MARGIN to width − MARGIN horizontally, MARGIN to height − MARGIN vertically) so the corner pixels are painted by both adjacent borders. Soft gray rather than pure black avoids visual competition with the black edge color in the palette. The color bar is the inset rectangle bounded on its left by the frame's left gray border and on its right by the interior separator; its drawing region runs from y = MARGIN + 1 (just below the top gray border) to y = height − MARGIN − 1 (just above the bottom gray border). Position the grid rect with its top-left corner at (MARGIN + 1 + bar_width + 1 + GM, MARGIN + 1 + GM + top_label_height) within the bounding rect.

Frame invariant (v12). The gray frame MUST render as a complete, solid, closed rectangle whose outer edge sits exactly MARGIN user units from the canvas edge on every side, and it MUST NOT touch or cross the canvas boundary at any render scale. The frame is the load-bearing anchor for raster-based comparison/localization (a matcher finds the solid #808080 rectangle in a screenshot or photo, then derives scale and geometry from it), so it must survive being displayed at any size. Through v11 the four lines were centered on the canvas boundary itself, so the outer half of each stroke lay on the edge; when the SVG was scaled to a fractional pixel size (e.g. rendered at a non-default font size in a browser), the default overflow: hidden shaved that outer half and the frame rendered thin or missing along the bottom/right edges — degrading or defeating frame detection. MARGIN = 1 user unit is the smallest integer inset that is provably un-clippable at any scale ≥ 1, because the fractional overflow clip removes strictly less than one device pixel (i.e. less than one user unit) from each edge.

Use the grid rect as a clipping region for the ellipse overlay (see below). The color bar and gray border lines are drawn outside the grid rect and need no clipping. Draw all clipped content first; draw the gray border lines last so the borders are never overwritten.

Fingerprint-derived structure¶

Define the used ftoks as the first token count ftoks of the fingerprint, taken in ftok index order. The used ftoks map one-to-one to tokens: the used ftok at index i corresponds to the token with token index i. (Because token count is at most 22 and the fingerprint provides 22 ftoks, there are always enough.) Any ftoks beyond token count are not used. From here on, all fingerprint-based calculations operate on the used ftoks. The 24-bit value of an ftok is its quant, defined exactly as for a token.
Sort the used ftoks in ASCII order — case-sensitive bytewise (lexicographic) comparison of the ftok's base64url text. Since base64url characters are all in the ASCII range, this is equivalent to UTF-8 bytewise comparison. Shorter strings sort before longer strings that share their full content as a prefix (standard lexicographic ordering; partial ftoks therefore sort below full ftoks that begin with the same chars). Use a secondary sort by ftok index, in case the same ftok appears in more than one place. Identify the ftok at the median position of the ASCII-sorted list — the element at 0-based index ⌊(token count − 1) / 2⌋. (For an even token count this is the first ftok of the middle pair.) Call this the median ftok.
Also sort the used ftoks by the ASCII order of their mirror image (with a secondary sort on the ftok index, in case the same ftok appears in more than one place). For example, if an ftok is "a4W6", its sort key would be "6W4a". If the number of used ftoks is not evenly divisible by 4, act as if 4 - (token count mod 4) blank items existed at the bottom of the list. These padding items are placeholders with no value and no sort key; they occupy only the bottom slots and are never selected as a quartile's first ftok — if a quartile's first slot falls in the padding region, that quartile simply has no ftok (its quartile mark and any fingerprint-edge recoloring are skipped). Now divide the sorted list into 4 sections and call each section a quartile. Identify the first ftok in each quartile and call it the first quartile ftok, the second quartile ftok, and so on.
If token count is less than cell count, the grid will have blank cells. We want to use blank cells to create visual gaps in a consistent way that is more meaningful than simply putting all the blanks at the beginning or end, because this will aid comparison. Each used ftok corresponds to a token (and therefore to a cell); use that correspondence to locate the cells named below. Insert a blank cell at the cell index of the token corresponding to the median ftok by incrementing the cell index of all tokens whose token index >= that token's token index. This essentially shifts these tokens to the right or down in the grid. If token count + 1 is still less than cell count, insert a second blank cell before the cell of the last ftok in the ASCII-sorted list, again shifting cells that render after. If token count + 2 is still less than cell count, insert a third blank cell before the cell of the first ftok in the ASCII-sorted list, again shifting cells that render after. Do not perform more than 3 shifts. The ASCII-sorted list is computed once (in the median-ftok step above), on the original ftok identities, and all three potential shifts consult that same list; a shift changes only tokens' cell indices, never their token indices — the stable identity that keys the ftok-to-token correspondence. (Inputs greater than 512 bits use this same rule: their 20 head/middle/tail tokens are placed in a grid with spare cells — a 4×6 grid at target_ar = 1.0 — and blanks are inserted by the same median/ASCII-endpoint shifts. There are no fixed separator blanks; see the large-input handling subsection.)

Palette and entviz background color¶

Let the array of possible edge colors be [white #ffffff, gold #e7be00, red #ff3f2f, blue #2f3fbf, black #000000]. The first four entries (indices 0-3) are the background candidates; black at index 4 is always an edge color and is never selected as the entviz background. This is intentional: black is too visually heavy to serve as a background.

entviz palette swatch

Figure 10. The entviz palette, spaced by CIELAB lightness (L*).

Palette rationale. The five colors are spaced primarily along lightness (CIELAB L*), not hue, because lightness is the one channel that survives every color-vision deficiency, monochrome rendering, and CSS color filtering — the channels (hue, chroma) that read as vivid to normal vision are exactly the ones that collapse under CVD. Their L* values are white 100, gold ≈78, red ≈57, blue ≈34, black 0. Gold sits at the maximin point between its neighbors: the white→gold and gold→red lightness gaps are equalized at ΔL* ≈ 21, so neither is the weak link. (Gold was darkened from v5's #ffd966 at L*≈88, where the white/gold gap was only ΔL* ≈ 12 — white and gold were near-indistinguishable on a grayscale/achromat rendering and the weakest pair in the whole palette.) Gold/red carries an additional hue cue (yellow vs red) on top of its lightness gap, so it tolerates the smaller gap; white/gold has no such backup, which is why the budget is spent equalizing it. Honesty caveat: the design target is ΔL* ≥ 20 between every palette pair under each simulated dichromacy, and three pairs dip below it. The worst by far is protanopia red/blue, which collapses to ΔL* ≈ 7 regardless of palette choice — red darkens under protan simulation and no lightness assignment can prevent it. Two milder exceptions also fall just under the floor: deuteranopia gold/red at ΔL* ≈ 17 and tritanopia red/blue at ΔL* ≈ 16 — both far more separable than the protan case, but short of the full ≥ 20 target. Each sub-floor pair stays distinguishable via a retained opponent axis (the blue-yellow axis for the red/blue pairs) and, decisively, the color-bar letters (r/g/b). The palette is robust, not CVD-proof; the letters (see the color bar below) are the guaranteed fallback. These three exceptions are pinned in tests/test_v6_palette_lightness.py (CVD_EXCEPTIONS), so any palette change that introduces a new sub-floor pair fails the suite. The figure below shows the palette under normal vision plus the three dichromacies and achromatopsia (lightness-only), with the closest pair flagged per row; see reviews/palette-optimization-findings.md for the full derivation and rejected alternatives.

entviz palette under color-vision deficiency

Figure 11. The palette under color-vision deficiency (normal vision, three dichromacies, and achromatopsia).

The implementation MUST select the 2 low-order bits of the quant of the median ftok and use this 2-bit number as an index into the background-candidates portion of the array (indices 0-3) to select the entviz background color. For example, if the 2-bit number == 1, the background color is gold. The implementation MUST then remove the selected color from the full possible edge colors array to form a new array of the 4 remaining colors, the edge palette. Black is therefore always present in the edge palette regardless of which background was chosen.

Rationale (non-normative). The entviz background color carries only 2 bits of entropy (4 possible values), so a grinding attacker can match a target's background color in an expected ~4 candidate inputs. This is intentional and acceptable: the background is a hint channel — its job is to make two unrelated entvizes look unrelated at a glance, not to provide independent collision resistance. The serious collision resistance lives in the surround pattern, the color bar histogram, the ellipse overlay, the blank-cell positions, and the quartile marks. A would-be attacker who matches the background color must still independently match each of those higher-bandwidth channels.

Rendering one cell¶

Inside the grid rect, render each token T into its appropriate cell in the grid, using its corresponding used ftok and the edge palette, according to the cell rendering algorithm below.

Casual avalanche (v10) — a non-normative note on two substeps below. Two of the substeps below exist to serve casual, at-a-glance comparison: the fingerprint-edge cells under Edge color, and the fingerprint Blank fill. The reasoning:

Comparison happens in two modes. A careful reader works cell by cell and has the whole fingerprint to draw on; almost any change betrays itself. A casual reader glances, and a glance reads neither text nor surround pattern — it takes in the color gestalt (the background and the broad field of cell colors). entviz promises that the glance suffices for most differences, so the glance is the mode that matters and the one an adversary attacks.

The two modes have different bandwidths, and a channel rich in one can be empty in the other. The surround pattern carries 24 fingerprint bits per cell and avalanches for careful comparison, yet it is casually near-silent — about half its boxes can toggle on a one-character change and a glance still reads the same texture. What a glance does read — color — is the channel that moves least on a small change: text and nucleus color are entropy-derived (so they barely shift when one character does, and five of six cells are untouched), the surround color echoes the nucleus, and the only color that could move the whole picture is the entviz background, which carries just 2 bits and therefore stays unchanged once in four — freezing the entire palette with it.

Measured over 100,000 one-character-neighbor pairs (CIEDE2000 color distance; see experiments/casual-avalanche/), about a quarter (≈24%) of the background-unchanged cases are casually color-identical at baseline, concentrated in dense full-grid inputs (a random UUID and its one-character neighbor: ~61% within that quarter). v10 closes this by moving fingerprint signal into color as a few discordant color singletons — cells colored against the grain of their neighbors, which the eye finds pre-attentively. The effect requires the singletons to stay rare (many recolored cells become noise and nothing pops), so the rule is deliberately partial; this also scales down gracefully to small grids. Two changes implement it: fingerprint-sourced surround edge color on three cells (below), and a hybrid fingerprint blank fill (in the blank-cell step). The locked design takes the background-unchanged quarter to ~0.33% color-miss. These levers add casual salience only — they are not a collision-resistance claim; the careful-comparison channels (surround pattern, color bar, ellipse, blank positions, quartile marks) are unchanged.

A cell is rendered from a token T and the used ftok F that corresponds to it. The token supplies the cell's text and nucleus background color; the ftok supplies the surround pattern.

For a given token T, identify the origin point within the grid rect with coordinates x, y with the following formulas: x = (T.cell index mod column count) * cell width; y = int(T.cell index / column count) * cell height.
Convert the quant for T into an RGB value the same way CSS does it — red in the low-order byte, and so forth — and call this RGB value the nucleus background color. The foreground color is white (#ffffff) or black (#000000), picked by the Oklab perceptual lightness L of the bg: if L < 0.6, use white; otherwise use black. L is computed via the Oklab transform (Björn Ottosson, 2020) — sRGB → linear-light → LMS → cube-root → L. See the reference implementation (src/entviz/colors.py::oklab_lightness) for the exact coefficients.

WCAG relative luminance Y over-weights green (0.7152·G), so saturated dark greens like #55841c land at Y = 0.185 — just past the WCAG-AA equal-contrast crossover at Y ≈ 0.179 and thus pair with black, even though the eye reads them as dark and expects white text. Oklab places the same color at L = 0.559, much closer to the perceptual midpoint. The threshold sits at 0.6 rather than the rigorous Oklab midpoint of 0.5 because small dark glyphs on mid-gray fields read less crisply than small light glyphs of the same lightness gap — the +0.1 bias flips dark-green-class colors (L ≈ 0.54–0.59) to white where they read better.

The naive Y < 0.5 rule used in v3 was the original wrong approach: it mis-paired medium-luminance backgrounds (e.g., light beige #c3b2a1 at Y ≈ 0.47) with white, producing WCAG ratios of 2-3:1 that fail AA. The WCAG Y ≈ 0.179 crossover always yields the higher-contrast pairing in luminance terms, but as noted above, perceptual lightness is a better predictor of how small glyphs actually read.
Determine this cell's edge color as the entry of the edge palette (the 4 non-bg colors) with the minimum weighted RGB distance to the nucleus background color. The distance metric is:
```
d(c1, c2) = sqrt( 2·(r1−r2)² + 4·(g1−g2)² + 3·(b1−b2)² )
```
Green is weighted highest because cone-peak sensitivity in the human visual system is in the green range; blue is weighted lowest. This formula is a cheap stand-in for CIELAB ΔE; implementations MAY substitute true CIELAB ΔE if they prefer, with the understanding that the choice of palette entry per cell may differ on borderline cases.

Fingerprint-edge cells (v10). Three cells override the nearest-palette rule above and take their edge color directly from the fingerprint, so a one-character change is visible to a casual glance (see Casual avalanche): the cell at grid position 0 (top-left, the first-fixation point in left-to-right reading) and the cells corresponding to the 1st and 2nd quartile ftoks (the same cells that carry those quartile marks, and which move with the fingerprint). For each such cell the implementation MUST set the edge color to edge_palette[q & 0b11], where q is that cell's own used-ftok quant and the 2 low-order bits index the 4-entry edge palette (in its established order). The nucleus background color is unchanged (still entropy-derived and lossless). Every other filled cell keeps the nearest-weighted_rgb_distance rule above. If grid position 0 or a quartile cell is blank, or a quartile ftok is null (a small input whose mirror-sort quartile falls in padding), there is no surround to recolor and the override is simply skipped for that cell. If the top-left cell is also a quartile cell, the override applies once — both derivations name the same cell and ftok, so the result is identical. (The background uses the median ftok's low 2 bits; a per-cell ftok's low 2 bits are an independent draw under avalanche, so each fingerprint-edge cell's hue changes with probability ¾ on any input change. The set is kept to ≤ 3 cells so the discordant hues remain pre-attentive singletons rather than confetti.)
Surround layout. Inside the cell, divide the region around the nucleus into 24 surround boxes. Every box is box_width × box_height (= 0.75·box_height × box_height = 6 × 8 at 12pt). The 24 boxes are arranged:
- Top row (10 boxes): each box at y = nucleus.top − box_height. Box i (for i in 0..9) starts at x = nucleus.left − box_width + i·box_width. The row spans x = nucleus.left − box_width to x = nucleus.right + box_width (= nucleus_width + 2·box_width = 10·box_width exactly).
- Right column (2 boxes): each box at x = nucleus.right. Box at index 10 starts at y = nucleus.top; box at index 11 starts at y = nucleus.top + box_height. The two boxes together span the full nucleus height (2·box_height = nucleus_height).
- Bottom row (10 boxes): each box at y = nucleus.bottom. Box at index 12 starts at x = nucleus.left − box_width + 9·box_width, and successive indices step left by box_width, so box 21 is at the same x as the leftmost top-row box.
- Left column (2 boxes): each box at x = nucleus.left − box_width. Box at index 22 starts at y = nucleus.top + box_height; box at index 23 starts at y = nucleus.top.
Box indices 0..23 are numbered clockwise from the top-left of the top row. There are no corner rects — the top and bottom rows extend past the nucleus's left and right edges to cover what would otherwise be corner regions, and the surround tiles the cell's full perimeter flush with cell boundaries.
Surround fill. For each i in 0..23: if bit i of the ftok quant (LSB = bit 0) is 1, fill box i with the cell's edge color. If the bit is 0, draw nothing for box i.
Draw a nucleus rect. Dimensions are nucleus width x nucleus height. Top left corner is at x + box_width, y + box_height. Fill color = nucleus background color. The nucleus is drawn after the surround boxes (and after the ellipse overlay), so the overlay never tints the nucleus.
Determine the cell text rendered font size from the full-token character count of the alphabet the cell's text is written in — the input alphabet for the entropy cells (head, tail, and every cell of a ≤512-bit input), and Crockford base32 for the fingerprint middle cells. The size difference between a hex input's 6-character entropy cells (0.75×) and its 5-character Crockford middle cells (0.80×) therefore tracks the alphabet, not the cell's position. A short final token is written in the same alphabet as its full-token siblings, so it renders at the same size — a 2- or 4-character trailing hex token still renders at the hex 0.75×, never enlarged to reference size — keeping a group of cells visually consistent. (The Crockford base32 middle cells are 5 characters even when the head/tail are 4-character tokens of a 5-/6-bit alphabet, so the middle cells MUST be sized down independently or their 5 glyphs overflow the nucleus.)
- The 4-character-token alphabets (base64, base58, bech32, base32) in the head/tail: rendered font size = the reference font size.
- The 5-character Crockford base32 middle cells (any large input, regardless of the input alphabet): rendered font size = round(0.80 × reference_font_size) (= round(reference × 4/5); rounded to the nearest whole point, ties toward even — 10 pt at the 12 pt reference). This follows directly from the generalized rule below; the 5-char middle cells leave ~9.6 px of horizontal slack inside the nucleus.
- The 6-character-token alphabets (hex, decimal) in the head/tail: rendered font size = round(0.75 × reference_font_size) (rounded to the nearest whole point, with ties broken toward even). The 75% factor leaves ~4.8 px of horizontal slack inside the nucleus even on monospace fonts with the widest char-width ratios. A short final hex/decimal token (fewer than 6 characters) is sized here too, with its siblings, not by its own shorter length.
- Generalized rule (keyed on the alphabet's full-token character count token_chars), in case future spec revisions introduce additional alphabets:
```
rendered_font_size_pt = round(reference_font_size_pt × max(0.75, min(1.0, 4 / token_chars)))
```
  This collapses to the cases above for current token types: 4-char → reference, 5-char → 80% of reference, 6-char → 75% of reference (the 0.75 floor clamps 6-char and beyond). The 0.75 floor ensures readability remains acceptable even if a future token type would technically permit further shrinking. Because the size is keyed on each cell's own alphabet (and a short final token takes its alphabet's full-token width), a single entviz MAY mix rendered sizes (full-size 4-char head/tail and 0.80× 5-char Crockford middle on the same large input).
Geometry (grid, nucleus, cell positions) does not change with the rendered font size — only the size of the glyphs drawn inside the nucleus does. Using the foreground color, write the text of the token on top of the nucleus rect at the rendered font size, centering it vertically and horizontally.
Draw a quartile mark on each quartile ftok's corresponding cell. The mark is a small right triangle in one corner of the nucleus rect: both legs are nucleus_height / 2 long, the right-angle vertex sits at the matching nucleus corner, and the legs run along the two nucleus edges meeting there. The clockwise corner assignment is: 1st = top-left, 2nd = top-right, 3rd = bottom-right, 4th = bottom-left. Quartile identity is carried by triangle orientation alone — there is no per-quartile color palette.

The triangle is filled in the cell text foreground color (#ffffff or #000000, picked by luminance contrast against the nucleus background — the same rule that picks the text color). The mark therefore reads as a small same-color flag in the nucleus corner without obscuring the cell text or requiring any compositing modes. Drawn after the nucleus rect and after the cell text.
Blank cells carry no token. For a blank cell, draw no nucleus, no text, and no surround boxes; the grid_rect's background color shows through. Every blank cell instead carries a rounded-corner rectangle coincident with the cell's nucleus rect (the same centered region a nucleus would occupy, nucleus_width × nucleus_height), with fill per Blank fill (v10) below (v9 and earlier used fill = none), stroke = #000000, stroke-width = 1, and corner radius nucleus_height / 2 (= 10 at 12pt — the maximum meaningful radius for the nucleus's short axis, giving a fully-rounded "pill" end). It is drawn after the ellipse overlay so it sits on top of any overlay tint.

Blank fill (v10). The pill outline is always retained (it preserves the "gap" reading and keeps a blank distinct from a nucleus); v10 fills its interior from the fingerprint so a one-character change recolors the blanks (see Casual avalanche). Let the filled blanks be those that get a fingerprint color: every non-map blank, plus the map blank iff it is the only blank (the hybrid rule under Map rendering below). Enumerate the filled blanks in cell index order; the j-th (0-based) is filled with edge_palette[digest[32 + j] & 0b11] — the 2 low-order bits of digest byte 32 + j index the 4-entry edge palette. (Byte range 32..59 is disjoint from the ellipse's digest bytes 60..63. The edge palette excludes the entviz background, so a blank fill always contrasts with the grid background showing behind it.)

The first blank cell in reading order (the blank with the lowest cell index) additionally becomes a blank-cell map: a miniature scale model of the grid that shows where the two extreme fingerprint cells sit. No other blank cell carries the map — the rest are just the outlined rectangle — so there is exactly one map per entviz.

Define: * minftok cell: among the used ftoks, the one with the smallest 24-bit quant; tie-break = highest cell index of the corresponding cell. * maxftok cell: among the used ftoks, the one with the largest 24-bit quant; tie-break = highest cell index.

Map rendering. Fill the first blank cell's rounded rect by the hybrid rule (v10): if the map blank is the only blank in the entviz, fill it from the fingerprint exactly as a filled blank (edge_palette[digest[32 + j] & 0b11], with j = 0 since it is the only filled blank) — this is the case for the common small inputs (LEI, small hex, 18-char base36) whose sole blank is the map blank, and it is where the casual-avalanche color is most needed; otherwise (two or more blanks) keep the v9 fill — #ffffff when the entviz background is not white, or the palette gold #e7be00 when the entviz background is white — so the map blank reads as a findable white/gold anchor while its sibling blanks carry the fingerprint color. Then subdivide the rect's interior into a logical grid of cols × rows sub-cells mirroring the entviz's own grid dimensions (the subdivision is logical; no grid lines are drawn). The sub-cell at logical position (row, column) corresponds to the full-grid cell at that same position. Draw two markers, distinguished by shape so the max/min identity survives even total color blindness (under which the red and blue collapse to near-equal grays):
- a red plus centered in the sub-cell whose (row, column) is that of the maxftok cell. The plus is a crossed pair of strokes through the center — a horizontal arm and a vertical arm, each extending ±1.2 · marker_radius from center — with stroke-width = max(1, 0.55 · marker_radius), fill = none, and stroke-linecap = butt. Its stroke is #d62828 on the v9 white/gold anchor fill; when the map blank is fingerprint-filled (sole-blank case), the stroke is instead the luminance-contrast color against that fill — #ffffff if the fill's Oklab L < 0.6, else #000000 (the same rule that picks cell-text foreground).
- a blue dot (a filled circle) centered in the sub-cell whose (row, column) is that of the minftok cell. Its fill is #1d4ed8 on the v9 white/gold anchor fill, and the same luminance-contrast color as the plus when the map blank is fingerprint-filled. Because max/min identity is carried by shape (plus vs dot), recoloring the markers costs only the redundant hue cue (see the v8 rationale below).
Each marker carries its cell's position as a literal "row,col" string in data-blank-map-min (the blue dot) / data-blank-map-max (the red plus) — see the SVG profile — so a checker recovers the position from the named attribute rather than reverse-engineering it from pixel geometry.

Each marker is centered in its sub-cell at (sub_cell_width = nucleus_width / cols, sub_cell_height = nucleus_height / rows) spacing, but its size is governed by a fixed marker_radius = nucleus_height / 8 + font_size_px / 16 (= 3.5 at 12pt; the font_size_px / 16 term is exactly 1 px at the 12pt/96dpi nominal size and scales with the entviz), independent of grid dimensions — so markers are the same size on every entviz rather than shrinking on denser grids. On a dense grid a marker MAY overflow its sub-cell; that is acceptable (it still reads as marking that cell's position). When the maxftok and minftok cells coincide (possible only when a single used ftok makes the smallest and largest quant the same cell), both markers are drawn at that one center; because they differ in shape, the red plus painted over the blue dot remains legible (no special-case marker is needed).

Rationale (non-normative). Through v7 both markers were dots distinguished only by hue (red = max, blue = min). The blank-cell map is the channel a habituated reader checks first, yet under achromatopsia the two dots became near-equal grays (ΔL* ≈ 8), so a reader could confirm that two cells are marked but not which is the max and which the min. v8 (PSY-F1) gives the max marker a distinct shape (a plus) so the distinction no longer rests on color; the colors are retained as a redundant cue.

This map replaces v5's white-disc-with-clock-hands marker. Where the clock hands indicated the maxftok/minftok cells by direction (an angle, ambiguous about which cell along the ray), the map indicates them by position in a scale model of the grid — it names the exact cell. The map also uses no mix-blend-mode, so it renders identically in browsers and in non-browser SVG rasterizers, closing adversarial finding F-A6 (the v5 long hand was invisible outside browsers) for this channel.

Blank cells include both the up-to-3 algorithm-inserted blanks (median, ASCII-last, ASCII-first) and any trailing unfilled cells; large inputs use these same blanks (there are no special separator blanks). The lowest-indexed blank in reading order carries the map.

Whole-entviz overlays¶

Draw the color bar in the inset rectangle described in the bounding-rect section above (left border at x = 1, right border at x = 1 + bar_width, drawing height = bounding_rect.height − 2). Build a 4-element histogram by counting how many of the 256 disjoint 2-bit slices of the SHA-512 digest (64 bytes × 4 slices/byte = 256 slices) equal each of the four 2-bit patterns (00, 01, 10, 11). Map binary value i to edge palette[i]. For each palette color whose count is greater than zero, compute count^4. Divide the color bar's drawing height into horizontal bands, one per nonzero color, with each band's height proportional to that color's count^4 value as a share of the sum of all four count^4 values. The fourth-power skew amplifies the dominance of the most-frequent pattern so the bar reads as a clear pecking order rather than four near-equal stripes (which is what a raw-count distribution from a uniformly-random digest typically produces).

Band order (decoupled from height — v9). The bands' vertical order — which band sits on top, next, and so on — is set by the order in which each 2-bit pattern first appears while scanning the 256 disjoint 2-bit slices of the SHA-512 digest in order (slice 0 = the high 2 bits of byte 0, …, slice 255 = the low 2 bits of byte 63); the pattern whose first occurrence is earliest is placed at the top, the next-earliest below it, and so on. Only patterns with nonzero count are banded, and each banded pattern has a distinct first-occurrence slice index, so a genuine tie cannot arise; the pattern-value order 00 < 01 < 10 < 11 is nonetheless specified as the deterministic tie-break to remove any ambiguity. This order is independent of the band heights: heights are still each color's count^4 share (so the thickness pecking order is unchanged), but the vertical sequence is now first-appearance order, not descending-count order. Decoupling the two adds a few reliable discrete bits (read the band letters top to bottom) at zero extra glance and zero CVD cost, and does not hurt the match task — descending-vs-first-appearance order only matters for reading a trend in one bar in isolation, never for "do these two bars match." (Through v8 the order was descending count, which equals argsort(heights) and therefore carried no information beyond the heights themselves; see this.i:b4rm4rks and this.i:d1scr3t3.) Fill each band with its color. Total count is always 256 regardless of grid size, so band proportions stay comparable across small and large inputs.

Color-bar letters. In each band, centered both horizontally (within the color bar's inset width) and vertically (within that band's height), draw a single lowercase letter identifying the band's color:

Color Hex Letter

white #ffffff w

gold #e7be00 g

red #ff3f2f r

blue #2f3fbf b

black #000000 k

Lowercase is chosen so the glyph's visual height sits near x-height rather than full cap-height; this keeps the letter from dominating the narrow color-bar band even on tall slices. The machine-readable data-color-bar-band attribute on each band group remains the uppercase form (W|G|R|B|K) so it stays a stable identifier even if the rendered case is restyled.

The letter's fill color is chosen by the same Oklab perceptual lightness rule applied to cell text against its nucleus background (see Cell Rendering Algorithm): compute Oklab L of the band's fill color; if L < 0.6 use white (#ffffff) for the letter, otherwise use black (#000000). For the five palette colors this resolves to: black letter on white, gold, and red bands; white letter on blue and black bands. (Red #ff3f2f has Oklab L ≈ 0.657, just above the 0.6 threshold, so it pairs with black text — matching the cell-text behavior already established in v4 for cells whose nucleus background is red.)

The letter's font family is the same monospace family as cell text. The letter's font size equals the cell text rendered size for this entviz (the same size the token text uses — see the cell rendering algorithm), giving uniform type across the whole visualization. The letter is not scaled down to fit a short band (the bar is bar_width wide and the cell text always fits horizontally). It is rendered with text-anchor="middle" and is bottom-anchored within its band: the baseline is placed a descender's height (≈ 0.22 × font_size) above the band's bottom edge, so the glyph's bottom never bleeds below the band. On a band too short to contain the full glyph height, the top of the glyph MAY bleed above the band — that is acceptable. Bands and their letters are emitted top-to-bottom in the decoupled band order above (the topmost band first, regardless of its height), so where a short band's letter bleeds upward over the band above it, the lower letter paints on top — deterministic layering.

Why letters. The color bar is the primary gestalt-comparison channel and is the channel most relied on under habituated comparison. v4 communicated band identity by hue alone, which left CVD users (adversarial review finding F6) and users on monochrome or color-filtered displays without a primary discriminator. The letters provide a verbal label that survives color blindness, monochrome rendering, and CSS color filtering, and they enable simple spot-check verification ("the top band is G") without trusting hue discrimination.

Color-bar markers (v9). Two small markers ride the bar, adding a discrete (fixed-slot) comparison channel that is present on every input — including the exactly-filled grid where the blank-cell map (the other best discrete channel) vanishes entirely. They are derived from the second, domain-separated digest second = SHA-512(DOMAIN_TAG ‖ core) (the same digest the large-input middle cells use; DOMAIN_TAG = "entviz/fingerprint-middle/v6\0"), now computed for every input — see the large-input handling subsection where second is defined. The marker bytes (second[12], second[13]) are disjoint from the middle cells' second[0..11].
- Slots. The color bar's drawing height is divided into K = clamp(floor(bar_height / 12px), 4, 16) equal fixed slots, where bar_height is the bar's drawing height (bounding_rect.height − 2). The slots are a fixed grid independent of the bands (their count, heights, and order). Slots are numbered 0..K−1 top to bottom.
- Left marker (left gutter): a small filled circle in the bar's left gutter, at slot second[12] mod K.
- Right marker (right gutter): a small filled circle in the bar's right gutter, at slot second[13] mod K.
- Circles, not distinct shapes. Both markers are circles; identity is carried by side (left = second[12], right = second[13]), not by shape. An earlier draft used a square (left) and an equilateral triangle (right), but shape discrimination proved unreliable at the bar's scale: on a dark band (blue/red/black) the black halo of the two-tone marker disappears, leaving only a small white inner glyph too small to read as a shape. Because the markers already sit in fixed left/right gutters, position alone distinguishes them, so the shape distinction was redundant as well as fragile — circles drop it.
- Gutters, not extra width. bar_width is unchanged: the two markers live in thin (~4 px) gutters inside the existing bar (one left, one right), so the band fills and letters are not displaced and there is no geometry blast radius. Because one marker is always on the left and the other always on the right, the two can never overlap regardless of which slots they land in — so there is no coincident-slot special case, and the side (left/right) is what tells them apart.
- Opaque two-tone paint (NOT blend modes). Each marker is drawn opaque: a white fill with a ~0.75 px black outline (a halo). Implementations MUST NOT draw these markers with mix-blend-mode, a difference/invert filter, or any other compositing-against-the-backdrop trick. The reasons are the same ones that keep the blank-cell map free of blend modes (closing adversarial finding F-A6): a blend/invert marker is invisible outside browsers and breaks the non-browser Tier-B reference rasterizer, whereas pure white-with-black-halo is bit-identical across rasterizers and reads on any backdrop. The halo is also what keeps a marker visible where it straddles two bands (a band boundary cutting through the slot): a single statically-computed inverse color cannot satisfy two backdrops at once, but the white core shows on the dark side of the cut and the black halo on the light side. The contrast is pure lightness, so it is CVD- and grayscale-safe.
- Paint order. The markers are painted after the bands and their letters, and before the final gray border — i.e. within the color-bar layer (paint-order step 1; see the SVG profile's paint-order list).
Honest framing: the markers are a tripwire-tier contributor (2 markers × ~3 bits ≈ 6 independent hard bits, drawn from a digest domain-separated from the primary fingerprint so they add cleanly to a near-collision grind cost), not the security backbone — the cell-text reads remain that. Their decisive property is that they are always present. (See this.i:b4rm4rks and this.i:d1scr3t3.)
Draw the ellipse overlay. v4 always draws an overlay (no input-size skip rule): the anchor enumeration is chosen hybrid based on grid size. Derive the overlay's parameters from fingerprint bytes (the 64 bytes of the raw SHA-512 digest, numbered 0 to 63):
- anchor (hybrid): count the grid's interior corners — cell-corner points strictly inside the grid_rect, of which there are (N − 1) × (M − 1) for an N-col × M-row grid.
- If interior count ≥ 6 (i.e., grid is 3x4 / 4x3 or larger), enumerate the interior corners in row-major order; this produces a centered ellipse mostly visible inside the grid.
- If interior count < 6 (i.e., 2x2, 2x3, 2x4, 2x5, 2x6, 3x3), enumerate the external corners — every cell-corner on the grid_rect's outer boundary, which numbers 2(N + M) per grid. Enumerated in row-major order: top edge left-to-right (N+1 points), then each interior row's leftmost and rightmost corners (2 each), then bottom edge left-to-right (N+1 points). External anchors produce a quarter-ellipse-in-a-corner or half-ellipse-along-an-edge silhouette as most of the ellipse is clipped outside the grid.
Use fingerprint byte 60, taken modulo the number of anchor points in the chosen list, to select the anchor. The anchor is the center of the ellipse, not a point on its boundary. * rx (horizontal semi-axis): compute rx_step = digest[61] mod 16. Then rx = r_min + (rx_step / 15) × (r_max − r_min), where r_min = 0.22·d_far and r_max = 0.58·d_far. d_far is the distance from the chosen anchor to the farthest of the grid rect's four outer corners. Both bounds scale with the grid (through d_far) so the overlay covers a noticeable but partial share of the grid on every grid size: the lower bound keeps the visible silhouette from shrinking to an imperceptible sliver (the failure mode on large grids, where a fixed-size minimum would be lost), and the upper bound keeps it from swamping the grid and destroying the covered-vs-uncovered contrast that is the overlay's entire purpose (the failure mode on small / near-square grids). The fractions 0.22/0.58 were chosen empirically so coverage stays in roughly the 8–70% range across every grid entviz produces (median ≈ 32%); see reviews/ellipse-audit-2026-06-02.md. Because 0.58·d_far > 0.22·d_far for all d_far, the radius range is always valid (it never goes degenerate). This replaces v5's [nucleus_height, d_far − cell_width] bounds, which let small grids be swamped (>80% coverage) and large grids show invisible slivers. * ry (vertical semi-axis): compute ry_step = digest[62] mod 16. Then ry = r_min + (ry_step / 15) × (r_max − r_min), with the same r_min and r_max as rx. rx and ry are drawn independently, so the ellipse ranges from a near-circle to a strongly elongated shape. * rotation: compute rotation_step = digest[63] mod 16. Then rotation = (rotation_step / 15) × 180°. Rotates the ellipse around the anchor. * fill, edge, and opacity: chosen per entviz background color, since the four background candidates each need different treatment to produce a perceptible silhouette. The overlay is drawn as a single ellipse with both a low-opacity interior fill and a higher-opacity 2-px stroke (edge) in the same color. The subtler fill keeps the cells beneath the ellipse legible; the crisper edge keeps the silhouette readable. SVG fill-opacity and stroke-opacity are independent, so both apply to one element.
```
| bg color | hex | fill/stroke color | fill opacity | edge opacity (2-px) |
|---|---|---|---|---|
| white | `#ffffff` | `#000000` (darken)  | 20% | 30% |
| gold  | `#e7be00` | `#000000` (darken)  | 20% | 30% |
| red   | `#ff3f2f` | `#000000` (darken)  | 25% | 35% |
| blue  | `#2f3fbf` | `#ffffff` (lighten) | 35% | 45% |

The **stroke width** is `cell_height / 20` (= 2 px at the 12 pt / 96 dpi nominal size) and scales with the entviz. The stroke is centered on the ellipse path; because the overlay is clipped to the *grid rect*, the edge is visible only where the ellipse curve lies inside the grid (there is no stroke along the straight grid-cut where the ellipse is clipped), so it reads as a silhouette rim rather than a box around the grid.

Saturated bgs need higher opacity to read against the surround boxes; white is least demanding because its darkened overlay is high luminance contrast against the bg already. Blue darkens to near-black, so it's lightened instead. Red lightens into a chalky pink that loses its character, so it stays darkened. No entropy bytes are consumed for fill, edge, or opacity. The split (fill = edge − 10 percentage points) is a v6 refinement: earlier revisions used a single solid fill at the edge opacity, which obscured the underlying cells more than necessary; moving most of the contrast into a thin edge preserves the silhouette while letting the cells show through.
```
16 discrete steps per parameter is intentional: it's near the just-noticeable-difference threshold for both pixel-level radius changes and degree-level rotations, so adjacent steps produce overlays that are visibly distinct from each other.

Clip the overlay to the grid rect, not the bounding rect. The overlay must never appear outside the cells of the grid (it must not leak into the margins or color bar). The clipping is what makes external-anchored ellipses (small grids) visible as quarter/half silhouettes — the portion of the ellipse outside grid_rect is clipped away.

Draw the overlay above the surround-box layer but below the nucleus layer, so that nucleus background colors and text are never affected by it.

SVG implementation notes.

Clip-path id uniqueness. The clipPath element used to confine the overlay must have an id that is unique within the enclosing HTML document, not merely unique within its own SVG. When multiple entvizes are embedded in one HTML page (e.g. a gallery), the browser resolves every url(#…) reference to the first matching id document-wide; if two entvizes both use id="grid-clip", every entviz after the first is silently clipped to the first entviz's grid rectangle. Salt the id with something stable but per-entviz — the reference implementation uses grid-clip-{first_16_hex_of_fingerprint}-{cols}x{rows}. The 16-hex (64-bit) salt gives a birthday-bound headroom of ~4 billion distinct entvizes on a single page before id collision becomes likely. A narrower 8-hex (32-bit) salt would collide around ~65k entvizes per page; large-gallery generators that exceed even the 16-hex bound should additionally rewrite ids at embed time.

Responsive embedding. The root <svg>'s required viewBox (see the SVG profile) is what lets consumer pages scale the entviz proportionally (width="100%" etc.); without it, browsers fall back to fixed-pixel sizing and the entviz becomes brittle when embedded in a responsive layout.

Clip-path with rotated content. When emitting the overlay as an <ellipse> carrying a transform="rotate(…)", the clip-path attribute must live on a non-rotated parent <g> element, not on the ellipse itself. If both attributes go on the same element, SVG resolves the clipPath in the element's post-transform coordinate system — i.e., the clip rectangle rotates along with the ellipse. The two-element structure keeps the clip axis-aligned in screen space while the ellipse rotates within it.

Label strips¶

Draw the label strips. These are thin monospace text bands above (always) and below (only when needed) the grid rect. They identify the entropy type, surface any non-entropy prefix/suffix that the parser stripped from the input, and signal large-input truncation. Strips are drawn after cell content but before the final gray border lines, so they sit on top of any underlying fill but never obscure the bounding rect's gray rim.

Geometry. Both strips have height nucleus_height and abut the grid_rect (no GM between strip and grid). The top strip occupies the band between y = 1 + GM (just below the top gray border, after one GM margin) and y = 1 + GM + nucleus_height, and the grid_rect begins immediately at y = 1 + GM + nucleus_height. The bottom strip — present when the parsed result has a suffix or a user note is supplied (see User note below) — begins immediately at the grid_rect's bottom edge and is followed by one GM margin before the bottom border. Text is centered within each nucleus_height-tall strip (dominant-baseline="central" at the strip's vertical center), so — because the strip abuts the grid — the label text sits the same distance from the grid as nucleus text sits from a nucleus edge. Strip widths match grid_width and are aligned horizontally with the grid_rect.

Text style. Monospace, fill = #666666, font size = the hex-equivalent rendered size = round(font_size_pt × 0.75) px at 96 dpi (= 12 px at the 12 pt reference). The strip font size is fixed at this value regardless of whether the cell text in this entviz uses full-size (4-char tokens) or shrunk (6-char hex) glyphs — the strip's job is to label the visualization, not match its body type. Top-strip text is left-aligned to grid_rect.left; bottom-strip text is right-aligned to grid_rect.right, so the ellipses on the two strips point inward toward the grid.

Top label content (v15 — a projection of the characterization). The top strip is a pure projection of the entropy characterization fields through one grammar — implementations MUST derive it from the same encoding/scheme/role/qualifiers/size_basis/size_bits/parts fields they emit as data-* attributes, NOT by fusing a per-parser type string. The grammar is:

[+hash ]PRIMARY[, MOD]…[, SIZE][, PREFIX]

Slots are joined by ", " (comma-space); there is no trailing :. The optional leading +hash marker is the large-input truncation marker (below). The optional trailing PREFIX slot (v15) echoes a stripped front prefix and is the only slot that may itself end in ... (its own elision — see Stripped-prefix slot below). The slots:

PRIMARY (always present):
- A self-describing prefix scheme (did/urn/gitoid/swhid) renders its self-framing prefix, with no body echo and no ...: did:key, urn:isbn, gitoid:blob:sha256, swh:1:rev.
- Any other scheme renders its short display name: ETH, BTC, LTC, BCH, ADA, XRP, XLM, EOS, UUID, ULID, LEI, snowflake, SSH, CESR, CIDv0, CIDv1, bech32, multihash.
- When scheme is null: text if size_basis == utf8 (the UTF-8 fallback), otherwise the encoding name (hex, base32, base58, bech32, b64, b64url, crockford32, decimal; base64/base64url are shortened to b64/b64url).
MOD (zero or more, comma-joined; silent default, loud departure):
- CESR → the primitive from qualifiers.algorithm with a redundant trailing pubkey stripped (Ed25519 nt pubkey → Ed25519 nt, Ed25519 pubkey → Ed25519, Blake3-256 unchanged) — the role is implied by the primitive.
- SSH → qualifiers.algorithm (ed25519, rsa, dss), with the ECDSA curve shortened to its common short name — ecdsa-nistp256/-nistp384/-nistp521 render ecdsa-p256/-p384/-p521 (there is no rival non-NIST p256, so the standards-body word is redundant; the data-qualifiers.algorithm field keeps the faithful SSH id).
- CID → the content codec (qualifiers.codec) always (it varies: dag-pb/raw/dag-cbor); the hash (qualifiers.hash) only on departure from sha2-256. A CIDv0 is dag-pb/sha2-256 by definition and emits no MOD.
- multihash → the hash only on departure from sha2-256.
- Blockchain schemes → the network (qualifiers.network) only on departure — testnet shown, mainnet silent. The legacy/segwit variant is dropped entirely.
- Everything else (UUID/ULID/ETH/XRP/XLM/EOS/LEI/snowflake, bare encodings) → no MOD.
SIZE (zero or one):
- When scheme is null → shown, with the unit following size_basis: decoded → "<size_bits>-bit", utf8 → "<size_bits/8>-byte" (e.g. hex, 256-bit; text, 56-byte).
- When scheme is ssh or multihash → "<size_bits>-bit" (key/hash size genuinely varies).
- All other schemes → omitted (the size is fixed or pinned by the scheme, or a bit/byte size is a category error for a structured identifier such as a DID/URN/UUID).
PREFIX (zero or one; v15) — the literal front prefix that was stripped from the visualized core, so the reader can reconcile the value they pasted against the cells (whose first character is otherwise silently different — the prefix was peeled off the front before visualization). It is the last element of the first parts entry when that entry's bind is none (a presentation sigil: 0x, 1, bc1, ltc1, bitcoincash:, addr1, the Cosmos-family cosmos1/osmo1, Stellar G/M, the CID multibase b/Qm, the SSH structural header, …). The prefix is always shown when present, in addition to the type name (ETH, 0x, ADA, addr1, bech32, cosmos1): whether the prefix and the type appear redundant is judged by a naive human reader — addr1 does not obviously imply Cardano to someone who does not already know the mapping — so the two never collapse. A folded identity prefix (bind = fold: did:/urn:/gitoid:/swhid:) is not repeated here — it is already the PRIMARY slot — and a bind = core leading part (e.g. a CESR derivation code, which sits in the first cell) is not a stripped prefix. This is the top-strip counterpart of the bottom strip's ...<suffix>, which already reconciles the end of the value.

Stripped-prefix slot — truncation. The prefix is the only elastic label element: PRIMARY/MOD/SIZE are never truncated. It is fit to the character budget the grid leaves on the label line: line_chars = floor(grid_width / (label_font_px × 0.6)), where 0.6 em is a fixed spec constant (NOT the renderer's real monospace metric — so every implementation computes the same integer budget and the Tier-A label string is reproducible regardless of the substituted font). The budget available to the prefix is line_chars minus the marker (if truncated), the joined PRIMARY/MOD/SIZE core, and the ", " that introduces the prefix slot. If the prefix does not fit, it is cut to <head> + "...", where <head> is the leading characters and the elision marker is the ASCII string ... (matching the bottom strip; no Unicode ellipsis). The head length is floored at 4 characters, so a long prefix on a tight line — in practice only SSH's ~24–52-character structural header — still shows a few leading characters rather than collapsing to a bare ... (the prefix is never fully hidden; showing the first few characters tells the reader a substantial prefix was elided). A prefix that fits is shown verbatim (the common case: every non-SSH prefix is ≤ 12 characters). Because only the prefix truncates, a long PRIMARY/MOD/SIZE (e.g. a large SSH key's SSH, rsa, 3096-bit) may still overrun the grid width exactly as it did before v15.

Worked examples: CESR, Ed25519 nt; CESR, Blake3-256; hex, 256-bit; text, 56-byte; did:key; urn:isbn; CIDv1, dag-pb, b; SSH, ed25519, 264-bit, AAAA...; ETH, 0x; BTC, 1; bech32, cosmos1; ADA, addr1; UUID; LEI; snowflake; +hash b64, 712-bit. See this.i:v15pfxlbl, this.i:v14lbl, this.i:lbldedup, and reviews/v14-label-redesign.md.

Bottom label content. "...<suffix>", present when the parsed result has a (bound) suffix. The suffix is only ever an entropy-bound checksum or derivation (see the normalization step's bound-vs-free distinction and Checksum verification below); free annotations are dropped, never shown. Because a bound checksum is now shown, it MUST have been verified (see Checksum verification), so the bottom strip never displays a checksum that does not check out. Examples: ...vfNa (Bitcoin legacy 4-char base58check checksum), ...12 (LEI 2-char MOD 97-10 check). When a user note is also supplied, it follows the suffix after a single space (see below). See this.i:sufxbind.

User note (optional, out-of-band caption). Implementations MAY accept an optional user note — a short, human-supplied caption (e.g. git) for an input whose meaning entviz cannot detect. The note is out-of-band: it is supplied through a dedicated channel (a --note CLI flag / a note render parameter), never as part of the input string. This is deliberate and load-bearing:

The note never enters the entropy core and never affects the fingerprint — two renders of the same value that differ only in their note are identical in every comparison channel (the note is outside the comparison surface). A clean comparison simply omits it.
Because the note is set by whoever runs the renderer (not by the input), an adversary who supplies one of the two compared values cannot inject it; an adversary who instead controls rendering could already draw anything (threat-model tier T2), so the note grants no new capability. A mismatched note only makes two entvizes look more different (the safe direction), so it cannot forge a false match. See threat-model.md and this.i:usrn0te1.

The note MUST be sanitized to printable ASCII: every character in the inclusive range U+0020–U+007E (space through tilde), maximum 10 characters. Spaces and printable punctuation are allowed; case is preserved (it is a caption, not entropy). A note that violates these constraints MUST be rejected with an error — it MUST NOT be silently truncated or otherwise mangled. Restricting to ASCII (rather than any Unicode "printable" set) closes the entire Unicode-spoofing surface by construction — no control characters, bidi overrides, zero-width or combining characters, homoglyphs/confusables, or private-use codepoints can appear — and keeps the rule trivially identical across implementations (no Unicode version or category tables to pin). Injection is handled separately by XML-escaping the note in both its <text> node and the data-user-note attribute, so it holds regardless of which printable characters the note contains. The 10-character cap keeps the note from overflowing even the smallest (1×1) entviz's bottom strip.

The note renders in the bottom strip, after any suffix, separated by one space and wrapped in parens: ...<suffix> (<note>), or just (<note>) when there is no suffix. It is drawn in gray #808080 — quieter than the #666666 of the rest of the label, the same value as the bounding-rect border, so it reads as chrome rather than data — using the same monospace family and label font size (no italics, no separate size). The note's <text> element MUST carry a data-user-note="<note>" attribute so that downstream tooling can distinguish the user caption from algorithm-derived suffix content regardless of styling. The note is rendered in the top strip nowhere — it is kept off the trusted type-label channel.

Large-input truncation marker. When the input exceeds 512 bits and the text channel is reduced to head + fingerprint-middle + tail (see the large-input handling subsection above), the top label is prefixed with a bold, dark-red marker +hash (v15; renamed from v14's fingerprint of). The full top label thus reads +hash PRIMARY[, MOD]…[, SIZE][, PREFIX]. Examples: +hash hex, 1024-bit, +hash b64, 712-bit, +hash ADA, addr1. The wording is deliberate: the cells still show the value's own head and tail bytes, with a hash folded into the middle only for the part too large to display — so it is the value, augmented with a hash (the leading + is additive), not "a hash". This avoids implying the whole picture is a digest. For a bare encoding the SIZE slot conveys the value's size directly; the marker signals that the text cells are a head/tail + fingerprint readout rather than a linear scan.

The marker MUST be rendered with these visual attributes, distinct from the rest of the top label:

Font weight: bold (font-weight="bold").
Fill color: #a00000 (a dark, desaturated red chosen for contrast against the white bounding rect background and reasonable visibility under deuteranopia/protanopia; its Oklab L is ~0.43, which gives it clear separation from the #666 of the rest of the label). Implementations MAY substitute a different dark red provided that (a) it satisfies WCAG AA contrast against white, (b) its Oklab L lies in [0.35, 0.55], and (c) it remains clearly hue-distinct from the rest of the label under common CVD simulations.
Font size: same as the rest of the label (the hex-equivalent rendered size).
Anchor: rendered as the first segment of the top label, immediately followed by a single space and then the rest of the label in the standard #666666 non-bold style. The +hash segment and the rest of the label SHOULD be rendered as a bold dark-red <tspan> followed by its tail (the rest of the label, in #666666) within a single <text> element, so the two flow with exactly one separating space; emitting them as two absolutely-positioned <text> elements risks a font-metric-dependent gap and is discouraged.

The marker communicates what the middle cells are — namely, that the cells display the head and tail of the input plus a fingerprint readout in the middle, NOT a linear scan. A reading user encountering +hash should:

Understand that two inputs sharing all 20 of their filled text cells (8 head + 4 middle + 8 tail) are not necessarily byte-identical (only the head and tail are guaranteed to match the input bytes; the middle cells are a fingerprint readout).
Compare the fingerprint-driven channels (surround pattern, blank-cell positions, color bar, ellipse overlay, quartile marks) as carefully as — or more carefully than — the text channel. This matters more the larger the input: for inputs far over 512 bits the head and tail are a vanishing fraction of the data and serve verification, not representation (see the head/tail scale caveat in the large-input handling subsection), so the fingerprint-driven channels — which bind the entire input including everything between the head and tail — carry essentially all of the comparison signal.
Treat the value size from the label's SIZE slot (e.g. hex, 1024-bit), together with the data-input-bytes metadata, as a useful corroborating fact: two inputs of meaningfully different byte length cannot possibly be the same input, regardless of how their cells appear to match.

The head-byte-only or tail-byte-only collision attacks that were sufficient against v4 are no longer sufficient: an attacker must also reproduce the 4 displayed fingerprint tokens — ~96 specific bits of the input's SHA-512 output (a ≈2⁹⁶ partial preimage). See threat-model.md and adversarial review finding F5 for context.

Part C: Conformance and verification¶

This part is normative. Everything it references is now already defined above.

Conformance¶

This section is normative. It defines what a conformant entviz implementation is, and exactly which properties of its output a checker is entitled to verify. Producing an entviz and its Rendering one cell section define how to compute the output; this section defines what counts as correct.

Conformant implementation¶

A conformant implementation is a program that accepts the render inputs below and, for every input it does not reject, produces an SVG document that is conformant-equivalent (defined under Equivalence relation) to the SVG the reference algorithm produces for the same inputs.

The render inputs are:

entropy — the input string to visualize (REQUIRED).
target aspect ratio — a positive rational W:H (OPTIONAL; the default, and the value used throughout the worked examples, is 1:1).
reference font size — an integer point size (OPTIONAL; default 12). An implementation MUST support at least the range [6, 30] points; behavior outside an implementation's supported range is governed by Error conditions.
user note — an OPTIONAL out-of-band caption, supplied through a dedicated parameter and never as part of the entropy string (see the label-strip step).

An implementation MUST be deterministic: identical render inputs MUST yield conformant-equivalent output on every invocation, on every platform. No part of the output may depend on wall-clock time, locale, environment, or a random source. (The clip-path id salt is derived from the fingerprint, not from randomness; see the SVG profile.)

The three conformance tiers¶

Correctness is verified at three tiers. A checker compares an implementation's output, for each input in the published conformance corpus, against that input's golden artifacts.

Tier A — render model (semantic correctness). The checker recovers the abstract render model from the implementation's SVG (via the SVG profile's required attributes) and compares it field-for-field to the golden render model. Tier A proves that the algorithm computed the right values. It is fast and it localizes failures (e.g. "edge color of cell 7 is wrong"), which a pixel comparison cannot.
Tier B — canonical raster (visual correctness). The implementation's SVG is rasterized by the single reference rasterizer pinned by the corpus, and the result is compared, pixel-by-pixel, to the golden raster. Tier B is the authority on what a human actually sees: it proves layering, color, position, size, and occlusion — properties Tier A cannot prove, because two SVGs with identical render models can still paint them in the wrong order or place. See Canonical rasterization for the text-region exclusion and tolerance.
Tier C — browser smoke (deployment sanity). A small subset of the corpus is rendered in a headless browser and screenshot-compared with loose tolerance. Browsers are the real deployment target but are not bit-reproducible, so Tier C is a non-blocking sanity check, never the authority.

Conformance levels. An implementation is Core-conformant if it passes Tier A for the entire corpus; Visual-conformant if it additionally passes Tier B; and Fully conformant if it additionally passes Tier C. A claim of conformance MUST state the level and the corpus (spec) version it was certified against.

The render model (Tier A)¶

The render model is the abstract structure the algorithm computes prior to SVG serialization. For a given set of render inputs it comprises exactly the following, and two render models are equal iff all of these fields are equal:

version: spec version and library version stamps.
input metadata: the normalized byte length of the input; the truncated flag (true iff the large-input path was taken); the user note, if any.
grid: cols, rows (hence cell count).
entviz background color (one of the four background candidates).
cells, keyed by cell index; each cell is either blank or filled.
A filled cell records: token text; nucleus background color (RGB); foreground color (#000000/#ffffff); edge color (a palette entry); the 24 surround bits (which boxes are filled); the rendered-font-size class (full, 0.80× for a Crockford middle cell, or 0.75× for a 4-bit-alphabet cell); whether it is a fingerprint (middle) cell; and the quartile orientation, if the cell carries a quartile mark.
A blank cell records its presence, and — for the single map-bearing blank — the (row, col) of the minftok and maxftok dots.
color bar: the ordered list of bands, each with its color, rank, and letter (the rank/order is the decoupled first-appearance order — see the color-bar step); the number of marker slots K; and the two marker slot indices (the left-gutter marker at second[12] mod K, the right-gutter marker at second[13] mod K).
ellipse: absent, or present with anchor (x, y), rx, ry, rotation, and the fill/stroke color and opacities implied by the background.
labels: the top-label text (including the truncation marker, type, and stripped prefix) and the bottom-label text (suffix and/or note), with the marker segment distinguished.

An implementation MUST expose enough machine-readable structure in its SVG — at minimum the attributes enumerated in the SVG profile — for a checker to recover every field of the render model unambiguously. The authoritative serialization of each field, for each corpus input, is the golden render model published with the corpus (generated by the reference implementation).

Equivalence relation¶

Two SVG documents are conformant-equivalent iff (a) their render models are equal (Tier A) and (b) their canonical rasters match within tolerance (Tier B). Visual-conformance requires both; Core-conformance requires only (a).

The following are purely serialization-level differences. They MUST NOT be treated as non-conformance, and the equivalence relation ignores them:

the XML prolog, DOCTYPE, character-encoding declaration, and namespace declarations;
attribute ordering, element indentation, and insignificant whitespace;
numeric formatting: coordinate and length values are compared by value, not by string. Two values are equivalent when they agree to within an absolute tolerance of 0.01 px (for the ellipse rotation, 0.01°). So 60, 60.0, and 60.00 are equivalent, and so are two implementations' coordinates that differ only in the last computed bit or in how they rounded an unspecified-precision value. A conformance checker MUST compare these fields numerically within this tolerance and MUST NOT require byte-identical numerals. (The tolerance is deliberately small enough that any Tier-A-equivalent coordinate difference is also within the Tier-B raster tolerance — Tier A ⊆ Tier B — so a coordinate difference can never be accepted by the model comparison yet produce a raster the pixel comparison rejects. It is still ~10× the largest legitimate rounding difference, ≈0.001 px, so it never causes a spurious failure.);
the concrete value of any salted id (e.g. the clip-path id), provided ids remain unique within the document as required;
element grouping (<g> nesting) that changes neither paint order nor geometry;
the presence of additional, advisory metadata beyond the REQUIRED attribute set, provided it does not alter rendering.

Numeric serialization. Every number an implementation writes into the SVG (coordinates, lengths, the ellipse parameters, transform arguments) MUST be a finite decimal in plain notation: implementations MUST NOT use exponential/scientific notation (e.g. 1e-7), because not every SVG consumer accepts it in every attribute context and it defeats by-eye diffing. Implementations SHOULD emit the compact form — at most 3 fractional digits, no trailing zeros after the decimal point, integer-valued numbers with no decimal point, and -0 written as 0 — to keep documents small. The rounding used to reach 3 digits is deliberately not constrained (round-half-up, round-half-to-even, etc. are all acceptable): the 0.01 px equivalence tolerance above absorbs the ≤ 0.001 px disagreement different rounding modes produce, so an implementation MAY use its language's native fixed-precision formatter (printf("%.3f"), Number.toFixed(3), format!("{:.3}"), strconv.FormatFloat, String.format("%.3f"), std::format) — trimming trailing zeros — without a hand-written rounder. This is what keeps the serialization portable across languages whose formatters round ties differently (Java and JS round half up; Python, Go, Rust, and C/C++ round half to even).

Conversely, any difference in the render model beyond the numeric tolerance above, or any difference visible in the canonical raster outside text-glyph regions beyond tolerance, is non-conformance.

Canonical rasterization (Tier B)¶

The corpus pins exactly one reference rasterizer (name, version, and rendering DPI) and ships, per input, a golden raster produced by it. An implementation's SVG, rasterized by that same rasterizer at that DPI, MUST match the golden raster.

Text-glyph regions are excluded from the pixel comparison. This spec does not require glyph-shape equality across platforms (see the font-family fallback chain in the algorithm): the visible glyph depends on which monospace font the environment provides. Text is therefore proven through the render model (Tier A: content, position via cell geometry, rendered size class, and fill color), not through pixel equality. The excluded regions are the bounding boxes of the <text> elements (cell text, label strips, and color-bar letters). Every other channel — surround boxes, nucleus fills, quartile triangles, the color-bar band fills, the ellipse overlay, blank-cell outlines and fills and the blank-cell map markers, and the gray borders — MUST match the golden raster within the corpus's stated per-channel tolerance (a small allowance for anti-aliasing only). (v10: blank-cell pill fills and the fingerprint-edge cells' surround colors are fingerprint-driven, and the map markers may be recolored to luminance contrast in the sole-blank case; all are part of the matched render model.)

Because Tier B fixes one rasterizer for all implementations, a pixel difference means the SVG differs, not the renderer — which is what makes cross-language certification meaningful.

SVG profile¶

A conformant SVG MUST satisfy the following. These requirements are what make Tiers A and B checkable; they constrain structure and paint order, not serialization style (see the equivalence relation).

The root <svg> MUST carry width, height, and viewBox="0 0 <bounding_width> <bounding_height>", so consumers can scale the entviz responsively.
The SVG MUST carry the spec-version and library-version stamps (data-entviz-version, data-entviz-lib) and the grid dimensions (data-cols, data-rows).
Each grid cell MUST be identifiable by data-cell-index and locatable by data-cell-col/data-cell-row. Blank cells MUST be flagged (data-cell-blank), the map-bearing blank distinguished (data-cell-blank-map) with the marker positions carried as a literal "row,col" string in data-blank-map-min (the blue minftok dot) and data-blank-map-max (the red maxftok plus) — a checker MUST be able to recover each (row, col) from the attribute value directly, without measuring pixel geometry — fingerprint/middle cells flagged (data-cell-fingerprint), and quartile-bearing cells marked (data-cell-quartile).
Each filled cell MUST declare its surround channel on the cell group: data-surround-bits carries the 24-bit pattern as a hexadecimal integer (e.g. data-surround-bits="0xad2981", bit i = box i), and data-edge-color carries the cell's edge color whenever at least one box is set (it is absent when the pattern is 0x0, where the edge color is undefined). A checker MUST recover the surround bits and edge color from these attributes, not by measuring box geometry. The boxes themselves MAY be rendered as a single <path> (one subpath per set box) or as individual rects; either way they MUST sit in the surround layer (painted before the cell nuclei and the ellipse overlay, so the overlay composites over them), and their pixels are verified by Tier B while their values are verified by Tier A from the attributes above.
Each color-bar band MUST carry its stable uppercase identifier (data-color-bar-band ∈ {W,G,R,B,K}), its data-color-bar-rank (the band's position in the decoupled first-appearance order — top band rank 0; this is no longer argsort(heights) — see the color-bar step), and data-color-bar-letter.
The color bar MUST expose its two markers and slot count: data-bar-marker-left and data-bar-marker-right each carrying the integer slot index of that gutter's marker, and data-bar-slots carrying the integer K, so a Tier-A checker recovers the markers without measuring pixel geometry.
When an ellipse overlay is present, its parameters MUST be exposed (data-ellipse-anchor-x, data-ellipse-anchor-y, data-ellipse-rx, data-ellipse-ry, data-ellipse-rotation-deg).
Truncated (large-input) renders MUST set data-truncated; a user note MUST be exposed on its <text> element as data-user-note. Logical channels MAY be grouped and tagged with data-channel.
Text font carriage (serialization-style, not normative structure). font-family is an inherited SVG presentation property: the monospace fallback chain MAY be set once on an ancestor of the <text> elements (e.g. the root <svg>) and inherited, rather than repeated on every <text>. Likewise, a <text>'s rendered size MAY be carried as a font-size presentation attribute on the element rather than inside a style="...; font-size:Npx;" declaration. A conformant checker MUST recover the rendered size class from either form, and MUST NOT require a per-<text> font-family. These are equivalent serializations of the same render model and do not affect the equivalence relation; hoisting the chain to the root materially shrinks dense SVGs.
The root <svg> SHOULD carry data-input-bytes, the byte length of the raw input as serialized for fingerprinting, as informational metadata. It is not part of the equivalence relation: two conformant renderings of the same value MAY differ in data-input-bytes (e.g. a dashed vs. undashed UUID, whose presentation differs but whose normalized core — and therefore whole render model — is identical), so a checker recovering it MUST exclude it when comparing renderings for equivalence.
The clip-path id confining the ellipse overlay MUST be unique within the enclosing HTML document; the reference implementation salts it with the fingerprint and grid dimensions (see the ellipse step). Its concrete value is not significant (equivalence relation).

Closed profile (no overlaid content). A conformant entviz is a closed document: it contains only the elements that make up the channels above, and nothing else is rendered. There is no slot for a brand or logo (as in the center of a QR code), a copyright or watermark, a caption beyond the user note, or any other text or graphic overlaid on the diagram. Concretely, the only element types that may appear are <svg>, <defs>/<clipPath> (and the clip <rect>), <g>, and the channel primitives <rect>, <path>, <text>/<tspan>, <polygon>, <circle>, <ellipse>, and the border <line>s — each only in the channel it belongs to, with no extra instances (one background, one ellipse overlay, no extra children inside a cell or color-bar group). Purely non-rendering advisory metadata — <title>, <desc>, <metadata>, XML comments, and additional data-* attributes — is permitted, since it adds no ink. Any other element, or any rendering element outside its channel, is non-conformance: a checker MUST reject it (the reference checker's validate_closed_profile enforces this at Tier A, so it is caught without a golden raster). This is what guarantees that an entviz a viewer trusts as a fingerprint cannot have been quietly dressed up with extra marks.

Paint order (normative layering). Implementations MUST paint in the following back-to-front order, because the visual result — what occludes what — depends on it and is verified by Tier B:

the frame-rect white fill (inset by MARGIN; the quiet ring outside it stays transparent) and the color bar — its band fills, then the band letters, then the two color-bar markers (circles, in the left and right gutters) on top;
per cell, the surround boxes;
the ellipse overlay, clipped to the grid rect;
per cell, the nucleus rect, then the cell text, then the quartile mark (so text and nuclei are never tinted by the overlay);
the blank-cell outlines and the blank-cell map (so the map sits on top of any overlay tint);
the label strips;
the gray border lines last, so nothing overwrites them.

Error conditions¶

A conformant implementation MUST reject the following inputs with an error and MUST NOT emit an SVG for them:

a mixed-case Ethereum (EIP-55) address whose case pattern fails the EIP-55 checksum — the error MUST identify the first mismatched-case digit (see the normalization step). All-lowercase and all-uppercase Ethereum addresses are accepted ("checksum not asserted").
an input that structurally matches a checksummed scheme but whose bound checksum fails to verify (v14) — a base58check address (Bitcoin/Litecoin legacy), a bech32/bech32m address (Bitcoin segwit bc1, Litecoin ltc1, or the generic <hrp>1… Cosmos-family form), a Bitcoin Cash CashAddr whose 40-bit BCH checksum fails, or a GLEIF LEI whose ISO/IEC 7064 MOD 97-10 check digits do not match. See Checksum verification in the normalization step.
a user note that violates sanitization — anything other than 1–10 printable-ASCII characters ([\x20-\x7E]{1,10}, i.e. U+0020 space through U+007E tilde). The note MUST NOT be silently truncated or otherwise mangled (see the label-strip step).
render parameters outside the implementation's supported range — e.g. a reference font size outside [6, 30] or an aspect-ratio component outside the implementation's accepted bounds. (The reference implementation rejects font sizes outside [6, 30] and aspect-ratio components outside [1, 100].)

A well-formed input that selects a degenerate grid (fewer than 2 columns or 2 rows) cannot arise from the grid-selection rule, which guarantees at least a 2×2 grid; an implementation MUST NOT emit a single-row or single-column grid regardless.

Rejection MUST be reported through the implementation's normal error channel (an exception, a non-zero exit, etc.); the specific mechanism is implementation-defined, but the set of rejected inputs is normative.

Part D: Appendices¶

Design rationale and further ideas¶

Non-normative. The why behind these decisions is recorded in this.i (design-rationale nodes such as h4shtext, s3mpr3fx, v14lbl, v15pfxlbl, cr0ckmid, b4rm4rks, d1scr3t3, usrn0te1), in the per-version history in spec-change-log.md, and in the threat model in threat-model.md.

Ideas recorded for possible future revisions (some, such as the 2-bit histogram, have since been implemented as the color bar):

display a 2-bit pattern histogram for the fingerprint as a redundant comparison aid
allow toggling off each channel, each color, CRC
spotcheck by reading a row or column or by having a column / row slider
render with a legend for rows and columns

Color	Hex	Letter
white	`#ffffff`	`w`
gold	`#e7be00`	`g`
red	`#ff3f2f`	`r`
blue	`#2f3fbf`	`b`
black	`#000000`	`k`