-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarification of "Safe" binary encoding #19
Comments
Yes, I think you got it right. "7 LSB" really was meant to mean "use the lowest 7 bits" (leave highest bit -- sign bit -- zero). And I did indeed drop the ball on clarification. But in practical terms the implementation is indeed the definition: if and when padding is on MSB (and not like spec claims, LSB), that is the Right encoding. |
Ok, that's great. We're also putting together a set of test vectors, and I accidentally wrote a right-padding encoder on my way to a left-padding one, so I should have all my bases covered in case testing reveals something strange. After we've got our encoder successfully sending data downstream (to Elastic, which I expect uses the reference implementation), I can send a PR if you'd like. |
@Zankoku-Okuno yes. You are right. I tried to improve description (wrt FasterXML/jackson-dataformats-binary#37) , but would appreciate help in better wording. |
Ah, now that I've "graduated" from the array encoder to writing the actual number encoder, I notice also that the spec doesn't state whether the byte arrays that encode are big- or little-endian; looking at the Java documentation, I'm assuming big-endian (which is also consistent with the rest of the spec), but it would be good to state this explicitly. Aha, and here's an interesting point that I had missed in the javadocs and had me going around in circles for a while:
If you remember anything else Java-specific that's made its way into the spec please leave a comment; I'll include it in my clarification PR. |
I am not quite sure what you mean by big- vs little-endian wrt byte arrays: endianness is a property of multi-byte values isn't it? Endian-ness should be consistently applied and I think, big-endian (common with IETF network protocols maybe due to Sun legacy). And yes, encoding of |
I'm running into some issues determining a precise meaning of the specification for encoding BigInteger and BigDecimal.
The specification states:
Which led me to believe that 7 LSB is a well-known encoding method that I simply hadn't yet heard of. Unfortunately, google searches for
"7 LSB" encoding
(and similar) only turned upI spotted the phrase "7/8 encoding" in the description of token type 0xE8 (which I think is the same as the "safe binary encoding" mentioned earlier), but that search term leads to
Digging deeper led to FasterXML/jackson-dataformats-binary#37, which has been open and afaict essentially untouched for five years now. At this point I have to assume that the implementation has won by default, and that's how we'll be proceeding with our own encoder unless integration testing says otherwise.
Anyway, I figured I'd show up to confirm y'all's notion of safe binary encoding (or 7 LSB, or 7/8 encoding) according to the behavior of the implementation:
Safe binary encoding is an 8-bit clean encoding of arbitrary byte arrays. The byte array is segmented into seven-bit groups from lowest-to-highest index and most-to-least significant bit within each byte. Each group is then padded with leading zero bits until the group is the size of a full octet. Note that this means that each full seven bit group is prepended with a single leading zero bit, but the last group (which may have fewer than seven bits) contains its data in the rightmost (least-significant) bits. (Note: the standard at time of writing specifies that a partial trailing group contains padding in its "rightmost, least-significant bits". I'm honestly not sure which is more "elegant" and don't really have an opinion other than "break the fewest things".)
As an example, the encoding of the array
{0xDE, 0xAD, 0xBE, 0xEF}
is performed as follows:The text was updated successfully, but these errors were encountered: