Commit

Merge pull request #71 from SmithSamuelM/revised-format
Changed version string definition to refer to CESR specification
SmithSamuelM authored Jan 30, 2024
2 parents 26774a7 + e813b10 commit 014669f
Showing 1 changed file with 3 additions and 7 deletions.
10 changes: 3 additions & 7 deletions spec/spec.md
@@ -257,17 +257,13 @@ The primary field labels are compact in that they use only one or two characters

### Version string field

The version string field, `v`, MUST be the first field in any top-level field map of any ACDC. It provides a regular expression target for determining a serialized field map's serialization format and size (character count) constituting an ACDC field map (message body). A stream parser may use the version string to extract and deserialize (deterministically) any serialized stream of ACDC field maps in a set of ACDC field maps. Each field map in a stream may use a different serialization type.
The version string field, `v`, shall be the first field in any top-level ACDC field map encoded in JSON, CBOR, or MGPK as a message body [[spec: RFC4627]] [[ref: CBOR]] [[ref: RFC8949]] [[ref: MGPK]]. It provides a regular expression target for determining a serialized field map's serialization format and size (character count) constituting an ACDC message body. A stream parser may use the version string to extract and deserialize (deterministically) any serialized stream of ACDC message bodies. Each ACDC message body in a stream may use a different serialization type. The format for the version string field value is defined in the CESR specification [[ref: CESR]].

The format of the version string is `ACDCVVVKKKKBBBB_`. It is 16 characters in length and is divided into five parts: protocol, version, serialization kind, serialization length, and terminator. The first four characters, `ACDC`, indicate the protocol. The CESR encoding standard supports multiple protocols, `ACDC` being one of them. The next three characters, `VVV`, provide in Base64 notation the major and minor version numbers of the version of the ACDC protocol specification. The first `V` character provides the major version number, and the final two `VV` characters provide the minor version number. For example, `CAA` indicates major version 2 and minor version 00, or `2.00` in dotted-decimal notation. Likewise, `CAQ` indicates major version 2 and minor version decimal 16, or `2.16` in dotted-decimal notation. The version part supports up to 64 major versions with 4096 minor versions per major version. The next four characters, `KKKK`, indicate the serialization kind in uppercase. The four supported serialization kinds are `JSON`, `CBOR`, `MGPK`, and `CESR` for the JSON, CBOR, MessagePack, and CESR serialization standards, respectively [[spec: RFC4627]] [[ref: CBOR]] [[ref: RFC8949]] [[ref: MGPK]] [[ref: CESR]]. The next four characters, `BBBB`, provide in Base64 notation the total length of the serialization, inclusive of the version string and any prefixed characters or bytes. This length is the total number of characters in the serialization of the ACDC field map. The length of a given ACDC field map serialization is thereby constrained to fewer than *2<sup>24</sup> = 16,777,216* characters. The final character `_` is the version string terminator. This enables later versions of ACDC to change the total version string size and thereby enable versioned changes to the composition of the fields in the version string while preserving deterministic regular expression extractability of the version string.
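
To make the 2.XX layout concrete, the following Python sketch encodes and decodes a version string. It assumes the URL-safe Base64 alphabet of RFC 4648 (section 5) with the most significant digit first, which is how the `CAA`/`CAQ` examples above read; `parse_v2`, `make_v2`, and the helper names are hypothetical and not taken from any CESR library.

```python
import re

# Assumption: URL-safe Base64 alphabet (RFC 4648 section 5), most significant digit first.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
KINDS = ("JSON", "CBOR", "MGPK", "CESR")

# ACDC + V (major) + VV (minor) + KKKK (kind) + BBBB (size) + "_" terminator = 16 chars.
V2_RE = re.compile(r"ACDC([A-Za-z0-9_-])([A-Za-z0-9_-]{2})(JSON|CBOR|MGPK|CESR)([A-Za-z0-9_-]{4})_")


def b64_to_int(chars: str) -> int:
    """Decode fixed-width Base64 digits into a non-negative integer."""
    value = 0
    for ch in chars:
        value = value * 64 + B64.index(ch)
    return value


def int_to_b64(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` Base64 digits ('A' = 0)."""
    if not 0 <= value < 64 ** width:
        raise ValueError(f"{value} does not fit in {width} Base64 digits")
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 64)
        digits.append(B64[digit])
    return "".join(reversed(digits))


def parse_v2(vs: str) -> dict:
    """Split an ACDC 2.XX version string into its numeric and textual parts."""
    m = V2_RE.fullmatch(vs)
    if m is None:
        raise ValueError(f"not an ACDC 2.XX version string: {vs!r}")
    major, minor, kind, size = m.groups()
    return {"major": b64_to_int(major), "minor": b64_to_int(minor),
            "kind": kind, "size": b64_to_int(size)}


def make_v2(major: int, minor: int, kind: str, size: int) -> str:
    """Build an ACDC 2.XX version string; size counts the whole serialization."""
    if kind not in KINDS:
        raise ValueError(f"unsupported serialization kind: {kind}")
    return ("ACDC" + int_to_b64(major, 1) + int_to_b64(minor, 2)
            + kind + int_to_b64(size, 4) + "_")
```

For example, `parse_v2("ACDCCAQJSONAAGA_")` yields major version 2, minor version 16, kind `JSON`, and a size of 384, since `AAGA` is 6 × 64. The fixed widths in the regular expression are what make the extraction deterministic: the terminator `_` is itself a Base64 digit, so it is recognized by position rather than by value.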

Although a given ACDC field map serialization kind may have characters or bytes, such as field map delimiters or framing codes, that appear before (i.e., prefix) the version string field in a serialization, the set of possible prefixes for each of the supported serialization kinds is sufficiently constrained by the allowed serialization protocols to guarantee that a regular expression can unambiguously determine the start of any ordered field map serialization that includes the version string as the first field value. Given the serialization length encoded in the version string, a parser may then determine the end of the serialization and extract the full ACDC field map serialization from the stream without first deserializing it. This enables performant stream parsing and off-loading of ACDC streams that include any or all of the supported serialization types.
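
The extraction step described above can be sketched in Python as follows. This is a minimal illustration, not a reference parser: it assumes compact JSON bodies concatenated in a byte stream, assumes at most `max_prefix` bytes (a hypothetical bound) precede the version string, and only handles the 2.XX format. `extract` and `make_json_body` are hypothetical names; the latter exists only to produce test input whose size field is correct.

```python
import json
import re
from typing import Iterator, Tuple

# Assumption: URL-safe Base64 alphabet (RFC 4648 section 5), most significant digit first.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
VS_RE = re.compile(rb"ACDC([A-Za-z0-9_-]{3})(JSON|CBOR|MGPK|CESR)([A-Za-z0-9_-]{4})_")
VS_LEN = 16  # length of an ACDC 2.XX version string


def b64_to_int(chars: bytes) -> int:
    value = 0
    for byte in chars:
        value = value * 64 + B64.index(chr(byte))
    return value


def int_to_b64(value: int, width: int) -> str:
    if not 0 <= value < 64 ** width:
        raise ValueError(f"{value} does not fit in {width} Base64 digits")
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 64)
        digits.append(B64[digit])
    return "".join(reversed(digits))


def extract(stream: bytes, max_prefix: int = 12) -> Iterator[Tuple[str, bytes]]:
    """Yield (kind, raw) for each serialized field map without deserializing it."""
    offset = 0
    while offset < len(stream):
        # Search only a short window: the version string must follow a small,
        # kind-specific prefix (for compact JSON that prefix is '{"v":"').
        m = VS_RE.search(stream, offset, offset + max_prefix + VS_LEN)
        if m is None:
            raise ValueError(f"no version string near offset {offset}")
        size = b64_to_int(m.group(3))
        # The size counts every byte from the start of the serialization,
        # so it must at least cover the prefix and the version string.
        if size < m.end() - offset or offset + size > len(stream):
            raise ValueError(f"invalid or truncated serialization at offset {offset}")
        yield m.group(2).decode(), stream[offset:offset + size]
        offset += size


def make_json_body(fields: dict) -> bytes:
    """Hypothetical helper: build a compact JSON body whose size field is correct."""
    placeholder = "ACDCCAAJSONAAAA_"  # same width as the final version string
    raw = json.dumps({"v": placeholder, **fields}, separators=(",", ":")).encode()
    version = "ACDCCAAJSON" + int_to_b64(len(raw), 4) + "_"
    return raw.replace(placeholder.encode(), version.encode(), 1)
```

Calling `extract(make_json_body({"d": "abc"}) + make_json_body({"n": 1}))` returns the two bodies as separate byte slices, each tagged `JSON`, without ever invoking a JSON decoder. The placeholder trick in the helper works only because the version string has a fixed width, so filling in the real size never changes the serialization length.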
The value of the protocol field, `PPPP`, in the version string shall be `ACDC` for the ACDC protocol. The version field, `VVV`, shall encode the current version of the ACDC protocol [[ref: CESR]].

##### Legacy version string field format

Compliant ACDC version 2.XX implementations shall support the old ACDC version 1.XX version string format to properly verify ACDCs created with 1.XX format events.

The format of the version string for ACDC 1.XX is `ACDCvvKKKKllllll_`. It is 17 characters in length and is divided into five parts: protocol, version, serialization kind, serialization length, and terminator. The first four characters, `ACDC`, indicate the protocol. The CESR encoding standard supports multiple protocols, `ACDC` being one of them. The next two characters, `vv`, provide the major and minor version numbers of the version of the ACDC protocol specification in lowercase hexadecimal notation. The first `v` provides the major version number, and the second `v` provides the minor version number. For example, `01` indicates major version 0 and minor version 1, or `0.1` in dotted-decimal notation. Likewise, `1c` indicates major version 1 and minor version decimal 12, or `1.12` in dotted-decimal notation. The next four characters, `KKKK`, indicate the serialization kind in uppercase. The four supported serialization kinds are `JSON`, `CBOR`, `MGPK`, and `CESR` for the JSON, CBOR, MessagePack, and CESR serialization standards, respectively [[spec: RFC4627]] [[ref: CBOR]] [[ref: RFC8949]] [[ref: MGPK]] [[ref: CESR]]. The next six characters, `llllll`, provide in lowercase hexadecimal notation the total length of the serialization, inclusive of the version string and any prefixed characters or bytes. This length is the total number of characters in the serialization of the ACDC field map. The length of a given ACDC field map serialization is thereby constrained to fewer than *2<sup>24</sup> = 16,777,216* characters. For example, when the length of the serialization is 384 decimal characters/bytes, the length part of the version string has the value `000180`. The final character `_` is the version string terminator.
This enables later versions of ACDC to change the total version string size and thereby enable versioned changes to the composition of the fields in the version string while preserving deterministic regular expression extractability of the version string.
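
Because 2.XX implementations still need to read 1.XX version strings, a corresponding Python sketch for the legacy format follows. It assumes lowercase hexadecimal digits exactly as described above and rejects uppercase ones; `parse_v1` and `make_v1` are hypothetical names.

```python
import re

# ACDC + v (major, hex) + v (minor, hex) + KKKK (kind) + llllll (size, hex) + "_" = 17 chars.
V1_RE = re.compile(r"ACDC([0-9a-f])([0-9a-f])(JSON|CBOR|MGPK|CESR)([0-9a-f]{6})_")


def parse_v1(vs: str) -> dict:
    """Split a legacy ACDC 1.XX version string into its parts."""
    m = V1_RE.fullmatch(vs)
    if m is None:
        raise ValueError(f"not an ACDC 1.XX version string: {vs!r}")
    major, minor, kind, size = m.groups()
    return {"major": int(major, 16), "minor": int(minor, 16),
            "kind": kind, "size": int(size, 16)}


def make_v1(major: int, minor: int, kind: str, size: int) -> str:
    """Build a legacy version string using lowercase hexadecimal fields."""
    if not (0 <= major < 16 and 0 <= minor < 16):
        raise ValueError("major and minor must each fit in one hex digit")
    if not 0 <= size < 16 ** 6:
        raise ValueError("size must fit in six hex digits")
    if kind not in ("JSON", "CBOR", "MGPK", "CESR"):
        raise ValueError(f"unsupported serialization kind: {kind}")
    return f"ACDC{major:x}{minor:x}{kind}{size:06x}_"
```

For example, `parse_v1("ACDC1cJSON000180_")` yields major version 1, minor version 12, kind `JSON`, and a size of 384, matching the worked values in the paragraph above.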
Compliant ACDC version 2.XX implementations shall support the old ACDC version 1.XX version string format to properly verify message bodies created with 1.XX format events. The old version 1.XX version string format is defined in the CESR specification [[ref: CESR]]. The value of the protocol field, `PPPP`, in the version string shall be `ACDC` for the ACDC protocol. The version field, `vv`, shall encode the old version of the ACDC protocol [[ref: CESR]].


### Self-addressing identifier (SAID) fields
