Wire Format abstraction library

The purpose of Chronicle Wire is to combine a number of concerns in a consistent manner:

  • Application configuration. (Using YAML)

  • Data serialization (YAML, binary YAML, JSON, Raw binary data, CSV)

  • Accessing off heap memory in a thread safe manner. (Bind to shared off heap memory)

  • High performance data exchange via binary formats. (Only include as much meta data as you need)

Design

Chronicle Wire uses Chronicle Bytes for bytes manipulation and Chronicle Core for low level JVM access.

Why are these concerns conflated?

Often you want to use these interchangeably.

  • Configuration includes aliased type information. This supports easy extension through adding new classes/versions and cross platform through type aliasing.

  • By supporting types, a configuration file can bootstrap itself. You control how the configuration file is decoded (see engine.yaml).

  • To send the configuration of a server to a client or vice versa.

  • To store the configuration of a data store in its header.

  • In configuration, to be able to create any object or component.

  • Save a configuration after you have changed it.

  • To be able to share data in memory between processes in a thread safe manner.

Features

Chronicle Wire separates describing what data you want to store and retrieve from how that data is rendered and parsed. Wire handles a variety of formatting options for a wide range of formats.

A key aim of Wire is to support schema changes. It should make reasonable attempts to handle:

  • optional fields,

  • fields in a different order,

  • fields the consumer doesn’t expect, optionally parsing them or ignoring them,

  • more or less data than expected (in field-less formats),

  • reading a different type to the one written,

  • updating fixed length fields, atomically where possible via a "bound" data structure.

It should also be as efficient as possible in the case where any or all of these are true:

  • fields are in the order expected.

  • fields are the type expected.

  • field names/numbers are not used.

  • self describing types are not needed.

  • random access of data values is supported.

Wire is designed to make it easy to convert from one wire format to another, e.g. you can use fixed width binary data in memory for performance and variable width or text over the network. Different TCP connections can use different formats.

Wire also supports hybrid wire formats. e.g. you can have one format embedded in another.

Support

This library requires Java 8. Support for C++ and C# is planned.

Text Formats

The text formats include:

  • YAML (a subset of mapping structures included)

  • JSON (super set to support serialization)

  • CSV (super set to support serialization)

  • XML (planned)

  • FIX (proposed)

Options include:

  • field names (e.g. JSON) or field numbers (e.g. FIX)

  • optional fields with default values that can be dropped.

  • zero copy access to fields. (planned)

  • thread safe operations in text. (planned)

To support wire format discovery, the first byte should be in the ASCII range, adding an ASCII whitespace if needed.

Binary Formats

The binary formats include:

  • binary YAML.

  • delta compressing Binary YAML. (Chronicle-Wire-Enterprise)

  • typed data without fields.

  • raw untyped fieldless data.

  • BSON (Binary JSON). (planned)

Options for Binary format:

  • field names or field numbers.

  • variable width.

  • optional fields with a default value can be dropped.

  • fixed width data with zero copy support.

  • thread safe operations.

Note: Wire supports debug/transparent combinations like self describing data with zero copy support.

To support wire format discovery, the first bytes should have the top bit set.
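The two discovery rules (text starts with an ASCII byte, binary starts with the top bit set) can be sketched as a one-byte check. This is a minimal illustration using only the JDK; the class and method names are hypothetical and are not part of the Chronicle Wire API.

```java
final class WireFormatSniffer {
    enum Kind { TEXT, BINARY }

    /** Classifies a message by its first byte: top bit set means binary, otherwise ASCII text. */
    static Kind sniff(byte firstByte) {
        return (firstByte & 0x80) != 0 ? Kind.BINARY : Kind.TEXT;
    }

    public static void main(String[] args) {
        // 'm' is the first byte of the text document "message: Hello World".
        System.out.println(sniff((byte) 'm'));
        // 0xC7 is the first byte of the BinaryWire hex dump shown in this README.
        System.out.println(sniff((byte) 0xC7));
    }
}
```

The check only applies to self describing formats; raw, fieldless data carries no such marker.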

Using Wire

Simple use case

First you need to have a buffer to write to. This can be a byte[], a ByteBuffer, off heap memory, or even an address and length you have obtained from some other library.

// Bytes which wraps a ByteBuffer which is resized as needed.
Bytes<ByteBuffer> bytes = Bytes.elasticByteBuffer();

Now you can choose which format you are using. As the wire formats are themselves unbuffered, you can use them with the same buffer, but in general using one wire format is easier.

Wire wire = new TextWire(bytes);
// or
WireType wireType = WireType.TEXT;
Wire wireB = wireType.apply(bytes);
// or
Bytes<ByteBuffer> bytes2 = Bytes.elasticByteBuffer();
Wire wire2 = new BinaryWire(bytes2);
// or
Bytes<ByteBuffer> bytes3 = Bytes.elasticByteBuffer();
Wire wire3 = new RawWire(bytes3);

So now you can write to the wire with a simple document.

wire.write(() -> "message").text("Hello World")
        .write(() -> "number").int64(1234567890L)
        .write(() -> "code").asEnum(TimeUnit.SECONDS)
        .write(() -> "price").float64(10.50);
System.out.println(bytes);

prints

message: Hello World
number: 1234567890
code: SECONDS
price: 10.5

The same calls on the BinaryWire:

// the same code as for text wire
wire2.write(() -> "message").text("Hello World")
        .write(() -> "number").int64(1234567890L)
        .write(() -> "code").asEnum(TimeUnit.SECONDS)
        .write(() -> "price").float64(10.50);
System.out.println(bytes2.toHexString());

prints

00000000 C7 6D 65 73 73 61 67 65  EB 48 65 6C 6C 6F 20 57 ·message ·Hello W
00000010 6F 72 6C 64 C6 6E 75 6D  62 65 72 A3 D2 02 96 49 orld·num ber····I
00000020 C4 63 6F 64 65 E7 53 45  43 4F 4E 44 53 C5 70 72 ·code·SE CONDS·pr
00000030 69 63 65 90 00 00 28 41                          ice···(A
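Because the binary output is self describing, the dump above can be turned back into text without knowing the schema. The following is a rough sketch using only java.nio, not the Chronicle Wire API. The type codes are assumptions inferred from this dump alone: 0xC0 plus a length for a field name, 0xE0 plus a length for short text, 0xA3 for a 4-byte unsigned integer and 0x90 for a 4-byte float, all little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class BinaryDumpWalker {
    // The 56 bytes printed by bytes2.toHexString() above.
    static final int[] DUMP = {
        0xC7, 0x6D, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0xEB, 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x57,
        0x6F, 0x72, 0x6C, 0x64, 0xC6, 0x6E, 0x75, 0x6D, 0x62, 0x65, 0x72, 0xA3, 0xD2, 0x02, 0x96, 0x49,
        0xC4, 0x63, 0x6F, 0x64, 0x65, 0xE7, 0x53, 0x45, 0x43, 0x4F, 0x4E, 0x44, 0x53, 0xC5, 0x70, 0x72,
        0x69, 0x63, 0x65, 0x90, 0x00, 0x00, 0x28, 0x41
    };

    static ByteBuffer buffer() {
        ByteBuffer bb = ByteBuffer.allocate(DUMP.length).order(ByteOrder.LITTLE_ENDIAN);
        for (int b : DUMP) bb.put((byte) b);
        bb.flip();
        return bb;
    }

    static String readChars(ByteBuffer in, int length) {
        byte[] chars = new byte[length];
        in.get(chars);
        return new String(chars, StandardCharsets.ISO_8859_1);
    }

    /** Renders the dump as "name: value" lines, handling only the codes seen in it. */
    static String toText(ByteBuffer in) {
        StringBuilder sb = new StringBuilder();
        while (in.hasRemaining()) {
            int code = in.get() & 0xFF;
            if (code >= 0xC0 && code <= 0xDF) {        // field name, length = code - 0xC0
                sb.append(readChars(in, code - 0xC0)).append(": ");
            } else if (code >= 0xE0) {                 // short text, length = code - 0xE0
                sb.append(readChars(in, code - 0xE0)).append('\n');
            } else if (code == 0xA3) {                 // 4-byte unsigned integer
                sb.append(in.getInt() & 0xFFFFFFFFL).append('\n');
            } else if (code == 0x90) {                 // 4-byte float
                sb.append(in.getFloat()).append('\n');
            } else {
                throw new IllegalArgumentException("Unhandled code 0x" + Integer.toHexString(code));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toText(buffer()));
    }
}
```

Note that in this dump the 64-bit integer and the double were each written in 4 bytes, since the values fit without loss.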

Using the RawWire strips away all the meta data to reduce the size of the message, and improve speed. The down side is that we cannot easily see what the message contains.

// the same code as for text wire
wire3.write(() -> "message").text("Hello World")
        .write(() -> "number").int64(1234567890L)
        .write(() -> "code").asEnum(TimeUnit.SECONDS)
        .write(() -> "price").float64(10.50);
System.out.println(bytes3.toHexString());

prints in RawWire

00000000 0B 48 65 6C 6C 6F 20 57  6F 72 6C 64 D2 02 96 49 ·Hello W orld···I
00000010 00 00 00 00 07 53 45 43  4F 4E 44 53 00 00 00 00 ·····SEC ONDS····
00000020 00 00 25 40                                      ··%@
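With no meta data, the reader has to know the layout in advance. As a sketch of what that means, the dump above can be read back with plain java.nio, assuming the layout it appears to show: a 1-byte length before each string, then an 8-byte little-endian long and an 8-byte little-endian double. The class below is hypothetical and does not use the Chronicle Wire API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class RawDumpReader {
    // The 36 bytes printed by bytes3.toHexString() above.
    static final int[] DUMP = {
        0x0B, 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0xD2, 0x02, 0x96, 0x49,
        0x00, 0x00, 0x00, 0x00, 0x07, 0x53, 0x45, 0x43, 0x4F, 0x4E, 0x44, 0x53, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x25, 0x40
    };

    static ByteBuffer buffer() {
        ByteBuffer bb = ByteBuffer.allocate(DUMP.length).order(ByteOrder.LITTLE_ENDIAN);
        for (int b : DUMP) bb.put((byte) b);
        bb.flip();
        return bb;
    }

    /** Reads a string stored as a 1-byte length followed by that many bytes. */
    static String readText(ByteBuffer in) {
        int length = in.get() & 0xFF;
        byte[] chars = new byte[length];
        in.get(chars);
        return new String(chars, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        ByteBuffer in = buffer();
        // Fields must be read in exactly the order and with the types they were written.
        String message = readText(in);
        long number = in.getLong();
        String code = readText(in);
        double price = in.getDouble();
        System.out.println(message + ", " + number + ", " + code + ", " + price);
    }
}
```

Reading the fields in a different order, or with different types, would silently produce garbage, which is the trade-off for the smaller message.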

For more examples see Examples Chapter1

Binding to a field value

While serialized data can be updated by replacing a whole record, this might not be the most efficient option, nor thread safe. Wire offers the ability to bind a reference to a fixed value of a field and perform atomic operations on that field such as volatile read/write and compare-and-swap.

   // field to cache the location and object used to reference a field.
   private LongValueReference counter = null;

   // find the field and bind an appropriate wrapper for the wire format.
   wire.read(COUNTER).int64(counter, x -> counter = x);

   // thread safe across processes on the same machine.
   long id = counter.getAndAdd(1);

Other types such as 32 bit integer values and an array of 64-bit integer values are supported.
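To illustrate what binding to a fixed width field makes possible, here is a minimal sketch of atomic operations on an 8-byte slot inside a buffer. It uses the JDK's VarHandle (Java 9 or later) rather than Chronicle's LongValueReference, and the class name is hypothetical. It assumes a direct buffer and an offset that is a multiple of 8, which the JDK requires for atomic access.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BoundLongField {
    // Views 8 bytes of a ByteBuffer as a long with atomic access modes.
    private static final VarHandle LONG =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    private final ByteBuffer buffer;
    private final int offset;

    /** Binds to the 8-byte field at the given byte offset (must be 8-byte aligned). */
    BoundLongField(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
    }

    long getVolatile() {
        return (long) LONG.getVolatile(buffer, offset);
    }

    long getAndAdd(long delta) {
        return (long) LONG.getAndAdd(buffer, offset, delta);
    }

    boolean compareAndSwap(long expected, long value) {
        return LONG.compareAndSet(buffer, offset, expected, value);
    }

    public static void main(String[] args) {
        // A direct buffer stands in for the off heap memory a wire would be bound to.
        BoundLongField counter = new BoundLongField(ByteBuffer.allocateDirect(16), 8);
        long id = counter.getAndAdd(1);
        System.out.println(id + " -> " + counter.getVolatile());
    }
}
```

The field is updated in place, so the rest of the record is never rewritten. Sharing it between processes would additionally need the buffer to be backed by shared memory such as a memory mapped file.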

Compression Options

  • no compression

  • Snappy compression (planned)

  • LZW compression (planned)

Bytes options

Wire is built on top of the Bytes library, however Bytes in turn can wrap:

  • ByteBuffer - heap and direct

  • byte[] (via ByteBuffer)

  • raw memory addresses.

Uses

Wire will be used for:

  • file headers.

  • TCP connection headers where the optimal Wire format actually used can be negotiated.

  • message/excerpt contents.

  • the next version of Chronicle Queue.

  • the API for marshalling generated data types.

Similar projects

SBE

Simple Binary Encoding is designed to do what it says. It’s simple, it’s binary and it supports C++ and Java. It is designed to be a more efficient replacement for FIX. It is not limited to FIX protocols and can be easily extended by updating an XML schema.

XML, when it first started, didn’t use XML for its own schema files, and it’s not insignificant that SBE doesn’t use SBE for its schema either. This is because it is not trying to be human readable. It has XML which, though standard, isn’t designed to be particularly human readable either. Peter Lawrey thinks it’s a limitation that it doesn’t naturally lend itself to a human readable form.

The encoding SBE uses is similar to binary, with field numbers and fixed width types. SBE assumes the field types, which can be more compact than Wire’s most similar option (though not as compact as others).

SBE has support for schema changes provided the type of a field doesn’t change.

msgpack

Message Pack is a packed binary wire format which also supports JSON for human readability and compatibility. It has many similarities to the binary (and JSON) formats of this library. By contrast, Wire is designed to be human readable first, based on YAML, and has a range of options to make it more efficient, the most extreme being fixed position binary.

Msgpack has support for embedded binary, whereas Wire has support for comments and hints to improve rendering for human consumption.

The documentation looks well thought out, and it is worth emulating.

Comparison with Cap’n’Proto

| Feature | Wire Text | Wire Binary | Protobuf | Cap’n Proto | SBE | FlatBuffers |
|---|---|---|---|---|---|---|
| Schema evolution | yes | yes | yes | yes | caveats | yes |
| Zero-copy | yes | yes | no | yes | yes | yes |
| Random-access reads | yes | yes | no | yes | no | yes |
| Random-access writes | yes | yes | no | ? | no | ? |
| Safe against malicious input | yes | yes | yes | yes | yes | opt-in / upfront |
| Reflection / generic algorithms | yes | yes | yes | yes | yes | yes |
| Initialization order | any | any | any | any | preorder | bottom-up |
| Unknown field retention | yes | yes | yes | yes | no | no |
| Object-capability RPC system | yes | yes | no | yes | no | no |
| Schema language | no | no | custom | custom | XML | custom |
| Usable as mutable state | yes | yes | yes | no | no | no |
| Padding takes space on wire? | optional | optional | no | optional | yes | yes |
| Unset fields take space on wire? | optional | optional | no | yes | yes | no |
| Pointers take space on wire? | no | no | no | yes | no | yes |
| C++ | planned | planned | yes | yes (C++11)* | yes | yes |
| Java | Java 8 | Java 8 | yes | yes* | yes | yes |
| C# | yes | yes | yes | yes* | yes | yes* |
| Go | no | no | yes | yes | no | yes* |
| Other languages | no | no | 6+ | others* | no | no |
| Authors' preferred use case | distributed computing | financial / trading | distributed computing | platforms / sandboxing | financial / trading | games |

Note: The Binary YAML format can be automatically converted to YAML without any knowledge of the schema as the messages are self describing.

Note: You can parse all the expected fields (if any) and then parse any remaining fields. As YAML supports object field "names" or keys, these could be Strings or even Objects as keys and values.

Note: It is not clear what padding that doesn’t take up space on the wire means.

Design notes.

Schema evolution.

Wire optionally supports:

  • field name changes,

  • field order changes,

  • capturing or ignoring unexpected fields,

  • setting of fields to the default, if not available,

  • raw messages can be longer or shorter than expected.

The more flexibility, the larger the overhead in terms of CPU and memory. Wire allows you to dynamically pick the optimal configuration and convert between these options.
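The tolerant end of that trade-off can be sketched with a reader that accepts fields in any order, falls back to defaults for missing fields, and keeps whatever it did not expect. This is a plain JDK illustration over simple "key: value" lines, not the Chronicle Wire API. The class name and the sample field venue are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TolerantReader {
    /** Parses "key: value" lines into a map, preserving the order they were written in. */
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) continue; // skip lines that are not key/value pairs
            fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return fields;
    }

    /** Removes and returns an expected field, or the default if the writer omitted it. */
    static String take(Map<String, String> fields, String name, String defaultValue) {
        String value = fields.remove(name);
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        // Different field order, one unexpected field ("venue"), and "code" missing.
        Map<String, String> fields = parse("price: 10.5\nvenue: LSE\nmessage: Hello World\n");
        String message = take(fields, "message", "");
        String code = take(fields, "code", "SECONDS");
        double price = Double.parseDouble(take(fields, "price", "0"));
        // Whatever is left in the map was not expected and can be captured or ignored.
        System.out.println(message + " " + code + " " + price + " unexpected=" + fields);
    }
}
```

The map lookup per field is the kind of CPU and memory overhead that a fixed order, fieldless format avoids.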

Zero copy.

Wire supports zero copy random access to fields and direct copy from in memory to the network. It also supports translation from one wire format to another, e.g. switching between fixed length data and variable length data.
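As a sketch of switching between fixed and variable length data, the example below converts fixed 8-byte longs into a generic 7-bits-per-byte variable length form and back. The encoding is a common varint scheme chosen for illustration and is not necessarily the one Wire uses. All names are hypothetical.

```java
import java.nio.ByteBuffer;

final class WidthConversion {
    /** Writes value using 7 bits per byte; a set top bit means more bytes follow. */
    static void putVarLong(ByteBuffer out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.put((byte) value);
    }

    /** Reads a value written by putVarLong. */
    static long getVarLong(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    /** Converts a buffer of fixed 8-byte longs into the variable length form. */
    static ByteBuffer toVariable(ByteBuffer fixed) {
        // A 64-bit value needs at most 10 bytes in this encoding.
        ByteBuffer out = ByteBuffer.allocate(fixed.remaining() / 8 * 10);
        while (fixed.remaining() >= 8) {
            putVarLong(out, fixed.getLong());
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer fixed = ByteBuffer.allocate(24);
        fixed.putLong(1L).putLong(300L).putLong(1234567890L);
        fixed.flip();
        ByteBuffer variable = toVariable(fixed);
        System.out.println("fixed=24 bytes, variable=" + variable.remaining() + " bytes");
    }
}
```

The fixed form allows each value to be located by offset, while the variable form is smaller but has to be read sequentially.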

Random Access.

You can access a random field in memory, e.g. in a 2 TB file, and page in or pull into CPU cache only the data relating to your read or write.

| format | access style |
|---|---|
| fixed length binary | random access without parsing first |
| variable length binary | random access with partial parsing, i.e. you can skip large portions |
| fixed length text | random access with parsing |
| variable length text | no random access |

Wire References are relative to the start of the data contained, to allow loading at an arbitrary point in memory.
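For the fixed length binary case, random access amounts to computing an offset relative to the start of the data. The sketch below shows this with a plain ByteBuffer and a hypothetical 16-byte record layout (an 8-byte id followed by an 8-byte price). It does not use the Chronicle Wire API.

```java
import java.nio.ByteBuffer;

final class FixedRecordAccess {
    // Hypothetical fixed width record: an 8-byte id followed by an 8-byte price.
    static final int RECORD_SIZE = 16;
    static final int PRICE_OFFSET = 8;

    /** Reads one field directly; no other part of the buffer is touched or parsed. */
    static double readPrice(ByteBuffer data, int recordIndex) {
        return data.getDouble(recordIndex * RECORD_SIZE + PRICE_OFFSET);
    }

    /** Overwrites one field in place using an absolute put. */
    static void writePrice(ByteBuffer data, int recordIndex, double price) {
        data.putDouble(recordIndex * RECORD_SIZE + PRICE_OFFSET, price);
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(3 * RECORD_SIZE);
        writePrice(data, 2, 10.5);
        System.out.println(readPrice(data, 2));
    }
}
```

Because every offset is relative to the start of the buffer, the same code works wherever the data is loaded. With a memory mapped file in place of the heap buffer, only the pages actually touched would need to be brought into memory.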

Safe against malicious input.

Wire has built in tiers of bounds checks to prevent accidental reads or writes from corrupting the data. It is not complete enough for a security review.

Reflection / generic algorithms.

Wire supports generic reading and writing of an arbitrary stream. This can be used in combination with predetermined fields. e.g. you can read the fields you know about and ask it to provide the fields you didn’t. You can also give generic field names like keys to a map as YAML does.

Initialization order.

Wire can handle unknown information like lengths by using padding. It will go back and fill in any data which it wasn't aware of as it was writing the data, e.g. when it writes an object it doesn't know how long it is going to be, so it adds padding at the start. Once the object has been written it goes back and overwrites the length. It can also handle cases where the length was more than needed, known as packing.
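The pad-then-overwrite step can be sketched with a plain ByteBuffer. The 4-byte length field and the 1-byte length prefix per value are assumptions made for the example, not a description of Wire's actual layout, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class LengthBackfill {
    /** Writes a 4-byte length placeholder, then the body, then patches in the real length. */
    static void writeNested(ByteBuffer out, String... values) {
        int lengthPos = out.position();
        out.putInt(0);                       // padding: the length is not known yet
        int bodyStart = out.position();
        for (String value : values) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            out.put((byte) utf8.length);     // assumes each value is shorter than 128 bytes
            out.put(utf8);
        }
        // Go back and overwrite the placeholder without moving the write position.
        out.putInt(lengthPos, out.position() - bodyStart);
    }

    public static void main(String[] args) {
        ByteBuffer out = ByteBuffer.allocate(64);
        writeNested(out, "ab", "cde");
        System.out.println("body length = " + out.getInt(0));
    }
}
```

A reader can use the stored length to skip the whole object without parsing it.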

Unknown field retention?

Wire can handle reading data it didn’t expect interspersed with data it did expect. Rather than specify the expected field name, a StringBuilder is provided.

Note: there are times when you want to skip/copy an entire field or message without reading any more of it. This is also supported.

Object-capability RPC system.

Wire supports references based on a name, number or UUID. This is useful when including a reference to an object the reader should look up via other means.

  • A common case is when you have a proxy to a remote object and you want to pass or return this in an RPC call.

Schema language

Wire’s schema is not externalised from the code, however it is planned to use YAML in a format it can parse.

Usable as mutable state

Wire supports storing an application’s internal state. This will not allow it to grow or shrink. You can’t free any of it without copying the pieces you need and discarding the original copy.

Padding takes space on the wire.

The Wire format chosen determines if there is any padding on the wire. If you copy the in memory data directly, its format doesn’t change. If you want to drop padding you can copy the message to a wire format without padding. You can decide whether the original padding is to be preserved or not if turned back into a format with padding.

We could look at supporting Cap’n’Proto’s zero byte removal compression.

Unset fields take space on the wire?

Wire supports formats with and without optional fields, and an automatic means of removing them. It doesn’t support automatically adding them back in, as information has been lost.

Pointers take space on the wire.

Wire doesn’t have pointers, but it does have content lengths, which are a useful hint for random access and robustness. These are optional.

Platform support

Wire is Java 8 only for now. Future versions may support Java 6, C++ and C#.