The CBOR Data Format

This chapter will give an introduction into the CBOR data format and what are the advantages or features CBOR offers.

What is CBOR?

"The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation." (http://www.cbor.io)

That said, CBOR is a programming language agnostic data format to serialize information in a very concise byte stream. In comparison to other data formats are available to store JSON-alike data (like BSON, JSONB), CBOR has the main purpose to store and represent those information in one of the smallest possible ways.

In simple words, CBOR can be used or described as a binary representation of JSON. Given this fact, it’ll bring in schema-less data handling, it delivers, however, additional advantages on top of that.

Why CBOR?

CBOR has a set of features, which are not always unique to CBOR, however unique in that combination.

This section will deliver a quick overview of the different elements that make CBOR such a great data format and how they can be used to create applications and data stores for current and future needs.

JSON-alike (document storage)

CBOR is designed to store JSON-alike information. However, it does not directly relate to JSON in terms of supported data types.

This means that CBOR is not limited to the data types available in the ECMAScript or JavaScript language (object, array, number, string, true, false or null - https://tools.ietf.org/html/rfc7159#section-3).

Concise Representation

Based on the background, that CBOR was designed to be useful for network protocols, the byte stream representation aims to be as concise as possible, without the need of any complicated compression algorithms or encoding, decoding schemes.

That means information stored in CBOR use a very small byte representation, often the smallest possible, and safe storage space as well as traffic. Therefore, right now, the IETF considers CBOR to be the foundation for multiple new network protocols.

Binary

Information in CBOR are stored in binary form. Given the fact, that most use cases for information include machines to understand the data, a binary data format has speed advantages over human-readable data formats like JSON or XML which have to be parsed each time the computer is supposed to understand the data stored.

In comparison to a lot of other binary data formats CBOR can easily transformed into a human-readable form for debugging or logging purposes, rendering the necessity to store in human-readable form invalid.

Streamable

CBOR is designed as a read-forward-only data format, giving the option to already start parsing before all of the data is available. On the other hand it also means, that CBOR byte streams can be created in a streaming fashion, bringing multi-chunk strings and indefinite arrays or maps to the game.

Those additional features offer to start generating before the number of items or the length of a string is known, rendering it very effective as the data transport format for reactive protocols.

Schema-less

A lot use cases for JSON, name the schema-less nature of JSON to begin with. Schema-less offers the possibility to change, add or remove elements over time based on needs. Especially in modern big-data applications that store information over years, where external data sources or system might change, this feature is very welcome.

It means parsers need to offer a way to manage this schema-less nature but it also means the data format needs to support it.

CBOR, by design, is schema-less, the same way as JSON is. That said, it’ll be usable in every situation where you consider JSON but offers additional advantages as named in the other bullets.

Type Safe

Given the CBOR specification, each and every item knows its own data type. Even though the overall document itself is schema-less, that means items have a fixed type. Over the data types defined by JSON specification (which is limited by the ECMAScript language), CBOR offers additional data types and the option to add tags to CBOR items to specify them even further (e.g. a string which actually represents a date).

This type safety brings the additional, already mentioned, possibility to, on-the-fly, generate a human-readable form of any CBOR encoded byte stream.

Highly Extensible

As mentioned before, CBOR has an extensible type system, which is built on the basis of the fixed set of basic types. In addition it brings extension types in the form of data item tags which prefix a specific item to specify it further as a specific representation. That way, for example, can a string be prefixed to be understood as a date or data-time item.

Items not known to a specific parser implementation are defined to be ignored or returned as un-parsed to give the option to the user to implement decoding in his own implementation.

Well defined

The CBOR specification is extremely well defined, stable and easy to implement in additional programming languages. This guarantees interoperability and is less error prone than most other binary data formats.

IETF Standard

CBOR is specified by the IETF (https://tools.ietf.org/html/rfc7049), making it a good choice to rely your data to store on. Being stable for years (first inception 2013) and a very discreet update in the making. This means data stored will be readable even in years time from today, being an important capability of a data format for long term storage.

CBOR - The Data Format