Pros:
Protobuf libraries were easily available for prototyping in Rust and Solidity at the time of creation
Via ordered fields in protobuf schemas, it is fairly easy to get a deterministic, content-addressable format
Low size overhead over contents
Cons:
Protobuf is comparatively complex for the simple features we need from it
As the protobuf encoding doesn't contain any information about the entity kind, the entity kind has to be known for the encoding to be correctly interpreted
Per-EntityKind CID multicodecs would require a lot of codecs to be registered/coordinated
Unwieldy to use in end-user applications
Proposal
As a replacement for the Protobuf format, I am proposing a CBOR-based format with an included prefix. The major upside is that CBOR is standardized and has robust implementations in many languages. The other major change would be the included prefix, which makes it possible to deserialize an entity without externally provided information about which entity kind the bytes represent.
Prefix as a varint (CBOR unsigned integer vs. multiformats unsigned varint?) that specifies the entity kind. It could also be a full multicodec per entity kind. Maybe it is even possible to have both: a varint that, together with a multicodec prefix reserved for a whole block of numbers, can act as a full multicodec.
Body as CBOR where empty fields are omitted.
Numbers are used as field keys instead of full field names like "annotations", so the encoding is more compact.
It could be a good idea to use a shared mapping of "field name"->"field key number", so that the only things necessary for properly deserializing are the entity kind and that mapping. Otherwise we would need a per-entity-kind mapping, which would be more complex to create. E.g. annotations would be 0 for Class while it is 1 for DataProperty.
Since CBOR encodes only the field key numbers 0-23 in 1 byte, and the numbers 24-255 in 2 bytes, the mapping should prioritize the most common field names. Most common might mean either the field names that appear in most entity kinds (e.g. annotations), or field names that appear in entity kinds that are used very often (e.g. all field names in DataProperty).
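To make the proposal more concrete, here is a minimal sketch of the prefix-plus-body layout in Python. It hand-rolls only the small CBOR subset needed (unsigned integers, byte strings, arrays, maps) so that it runs without dependencies; a real implementation would presumably use an existing CBOR library. The key numbers in `FIELD_KEYS` and `ENTITY_KINDS`, the field name `superclass_expression`, and the helper names are assumptions for illustration, not part of the proposal.

```python
import struct

def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR head: major type plus the shortest form of the argument."""
    if value < 24:
        return bytes([(major << 5) | value])          # argument fits in the initial byte
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])      # 1 extra byte
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", value)

def cbor_encode(obj) -> bytes:
    """Encode a small CBOR subset: unsigned ints, byte strings, lists, dicts."""
    if isinstance(obj, int) and not isinstance(obj, bool) and obj >= 0:
        return cbor_head(0, obj)
    if isinstance(obj, bytes):
        return cbor_head(2, len(obj)) + obj
    if isinstance(obj, list):
        return cbor_head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        items = sorted(obj.items())  # integer keys in ascending order, for determinism
        return cbor_head(5, len(items)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in items)
    raise TypeError(f"unsupported type: {type(obj)!r}")

def varint(n: int) -> bytes:
    """Unsigned LEB128 varint: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

# Hypothetical shared mapping and entity kind numbers (assumptions, not from the spec).
FIELD_KEYS = {"annotations": 0, "superclass_expression": 1}
ENTITY_KINDS = {"Class": 0, "DataProperty": 1}

def encode_entity(kind: str, fields: dict) -> bytes:
    """Varint entity-kind prefix, then a CBOR map with empty fields omitted."""
    body = {FIELD_KEYS[name]: value for name, value in fields.items() if value}
    return varint(ENTITY_KINDS[kind]) + cbor_encode(body)

encoded = encode_entity("Class", {
    "annotations": [b"\x01\x02"],
    "superclass_expression": [],   # empty, so it is left out of the body
})
print(encoded.hex())
print(len(cbor_encode(23)), len(cbor_encode(24)))  # key sizes at the 23/24 boundary
```

The last line illustrates why the mapping order matters: a key of 23 still fits in a single byte, while 24 already needs two.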
It might also be a good idea to always encode the entity kind as field 0 in the CBOR body, instead of having a prefix.
This is probably better supported than CBOR tags in most libraries, and all of the encoding/decoding can be done in a single step after decoding the CBOR, instead of having to parse the prefix first.
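A rough sketch of what that single-step decoding could look like, again with a hand-rolled decoder for a small CBOR subset so it runs without dependencies. The reserved key `KIND_KEY`, the number assignments in `ENTITY_KINDS` and `FIELD_NAMES`, and the function names are assumptions for illustration only.

```python
def cbor_decode(data: bytes, pos: int = 0):
    """Decode one item of a small CBOR subset; returns (value, next_position)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        arg = info                                   # argument stored in the initial byte
    elif info <= 27:
        size = 1 << (info - 24)                      # 1, 2, 4 or 8 following bytes
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("unsupported additional information")
    if major == 0:                                   # unsigned integer
        return arg, pos
    if major == 2:                                   # byte string
        return data[pos:pos + arg], pos + arg
    if major == 4:                                   # array
        items = []
        for _ in range(arg):
            item, pos = cbor_decode(data, pos)
            items.append(item)
        return items, pos
    if major == 5:                                   # map
        result = {}
        for _ in range(arg):
            key, pos = cbor_decode(data, pos)
            value, pos = cbor_decode(data, pos)
            result[key] = value
        return result, pos
    raise ValueError(f"unsupported major type {major}")

KIND_KEY = 0                                         # field 0 carries the entity kind
ENTITY_KINDS = {0: "Class", 1: "DataProperty"}       # hypothetical numbering
FIELD_NAMES = {1: "annotations"}                     # hypothetical; 0 is reserved

def decode_entity(data: bytes):
    """Decode the CBOR map once, then read the entity kind out of field 0."""
    body, end = cbor_decode(data)
    if end != len(data):
        raise ValueError("trailing bytes after CBOR body")
    kind = ENTITY_KINDS[body.pop(KIND_KEY)]
    fields = {FIELD_NAMES[key]: value for key, value in body.items()}
    return kind, fields

# CBOR map {0: 1, 1: [h'0102']}: kind 1 in field 0, one annotation in field 1.
kind, fields = decode_entity(bytes.fromhex("a200010181420102"))
print(kind, fields)
```

One trade-off of this variant is that field 0 is no longer available for a regular field, so the shared mapping would start at 1.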
Motivation
The Protobuf format is not that great. The pros and cons listed in this issue are taken from the overview on the current doc page (https://rlay-project.github.io/rlay-client/docs/rlay-ontology-serialization-formats#protobuf-based-format).