Liar Machines

Basic Wire Formats Design

We're always in a distributed system, so we might as well be efficient about our messages.

Scott Bruce
Aug 01, 2025

We are usually working in a distributed computing system. That makes the skill of designing compact, expressive and precise wire formats for the messages sent within that system essential to getting good engineering outcomes. What precisely is a wire format, how do you design one, and how do you improve your skill so that you get better engineering outcomes?

What is the essential characteristic of a wire format? It must be a complete and deterministic representation of a message as a simple byte array. If computer A sends a message to computer B, then once the serialize->send->deserialize process is complete, both computers hold exactly the same information. Beyond that essence, there are many interesting ways to apply design skill when creating wire formats. Let’s consider a concrete example to motivate our discussion.
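The round-trip property can be made concrete with a minimal Python sketch. It assumes a hypothetical two-field message whose layout both sides have agreed on in advance: a little-endian signed 64-bit integer followed by a 64-bit IEEE 754 double. The function names `serialize` and `deserialize` are illustrative only.

```python
import struct

# Hypothetical agreed-upon layout: "<" = little-endian, no padding,
# q = signed 64-bit integer, d = 64-bit IEEE 754 double.
LAYOUT = struct.Struct("<qd")

def serialize(count: int, ratio: float) -> bytes:
    # Turn the message into a plain byte array.
    return LAYOUT.pack(count, ratio)

def deserialize(wire: bytes) -> tuple[int, float]:
    # The receiver uses the same layout to rebuild exactly the same values.
    return LAYOUT.unpack(wire)

wire = serialize(2**55, 3.141592653589793)
print(deserialize(wire))  # (36028797018963968, 3.141592653589793)
print(len(wire))          # 16
```

Because the layout is fixed and shared, the receiver never has to guess what the bytes mean. That shared agreement is what makes the representation deterministic.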

Our example is a simple JSON message that is roughly the shape of a request you might find in any computing system. Real requests tend to have more fields, and possibly a nested tree structure, but this example is illustrative.

{
 "username" : "exampleusername",
 "useruuid" : "9c5b94b1-35ad-49bb-b118-8e8fc24abf80",
 "enum" : "THIS_IS_A_LONG_ENUM",
 "bignumber" : 36028797018963968,
 "pi" : 3.141592653589793,
 "boolean": true,
 "missingfield": null
}

Here we have a username, a UUID, an enum, a “big” number (2^55), pi (a floating point number), a boolean condition, and a field with no value (null). We’ve written our example in JSON because it is pretty easy for a human to think they understand the meaning of such a message. As we all know, the JSON wire representation is just the string that encodes it. If we count the bytes in that string (after first removing whitespace), we find it takes 197 bytes. We’re transmitting a string, a UUID, an enum, an integer (maybe?), a floating point number (of some accuracy), a boolean, and a missing field.
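To check the byte count, here is a short Python sketch that rebuilds the example message as a dict and serializes it with no whitespace. Python's `json` module always emits valid JSON, with every separating comma in place. Passing `separators=(",", ":")` removes the spaces it inserts by default.

```python
import json

message = {
    "username": "exampleusername",
    "useruuid": "9c5b94b1-35ad-49bb-b118-8e8fc24abf80",
    "enum": "THIS_IS_A_LONG_ENUM",
    "bignumber": 36028797018963968,
    "pi": 3.141592653589793,
    "boolean": True,
    "missingfield": None,
}

# Compact encoding: no spaces after "," or ":".
wire = json.dumps(message, separators=(",", ":")).encode("utf-8")
print(len(wire))  # 197
```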

Note that JSON specifies only the syntax of numbers, not their type or precision! The sender and receiver simply have to agree on what type a number is, and an incredibly common error is for JSON endpoints to get this wrong. I have seen an enormous number of obvious and subtle bugs caused by not knowing whether a number is a floating point or an integral type! This is part of why a human reads a JSON message and only thinks they know what it means: the actual meaning of a message whose wire format is JSON is not obvious.
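Here is a small Python sketch of the integer/float confusion. It uses 2^55 + 1, a hypothetical value one larger than our bignumber, because 2^55 itself happens to be exactly representable as a double. The `parse_int=float` option to `json.loads` stands in for a receiver that treats every JSON number as a double, as JavaScript's `JSON.parse` does.

```python
import json

# 2**55 + 1 fits in a 64-bit integer, but not in a double's 53-bit significand.
text = '{"bignumber": 36028797018963969}'

# A receiver that reads integers as integers keeps the exact value.
as_int = json.loads(text)["bignumber"]

# A receiver that reads every number as a double silently rounds it.
as_double = json.loads(text, parse_int=float)["bignumber"]

print(as_int)          # 36028797018963969
print(int(as_double))  # 36028797018963968
```

Both receivers parsed the same bytes without error. Only the second one now holds the wrong number.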

It is also not obvious to many engineers that a given wire message can be rearranged and still have the same semantics! Any particular way an object is written to its wire format isn’t guaranteed to be canonical. Writing the fields of a JSON object in a different order preserves its meaning, but a naive byte-for-byte comparison will (correctly) report that the two encodings are not equal.
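The sketch below illustrates this in Python with a cut-down, hypothetical two-field message. Sorting keys and stripping whitespace is only a partial canonical form. A complete JSON canonicalization scheme, such as the one in RFC 8785, also pins down how numbers and strings are written.

```python
import json

a = '{"username":"exampleusername","boolean":true}'
b = '{"boolean":true,"username":"exampleusername"}'

print(a == b)                          # False: the bytes differ
print(json.loads(a) == json.loads(b))  # True: the meaning is the same

def canonical(text: str) -> str:
    # Partial canonicalization: sorted keys, no optional whitespace.
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))

print(canonical(a) == canonical(b))    # True
```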

So what’s the absolute smallest number of bytes this JSON message could be sent in? Assume “bignumber” is a 64-bit integer, “pi” is a double precision (64-bit) floating point number, the enum fits in 1 byte, and the 15-byte username needs one more byte to record its length. Also assume the 128 bit UUID is encoded in 128 bits instead of the 288 bits its 36-character text form takes. Under these assumptions, the absolute minimum is 49 bytes plus 1 bit for the boolean.
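A minimal Python sketch of such a packed encoding follows. The schema is hypothetical: a one-byte username length followed by the username bytes, then a fixed-size tail. The enum code table (`THIS_IS_A_LONG_ENUM -> 0`) is an assumption shared by both sides. Python's `struct` module stores a boolean in a whole byte rather than a single bit. That puts this sketch at 50 bytes, seven bits above the 49 bytes plus 1 bit above.

```python
import struct
import uuid

# Hypothetical enum table that sender and receiver agree on out of band.
ENUM_CODES = {"THIS_IS_A_LONG_ENUM": 0}
ENUM_NAMES = {code: name for name, code in ENUM_CODES.items()}

# Fixed-size tail, little-endian with no padding ("<"):
# 16s = raw 128-bit UUID, B = 1-byte enum code, q = signed 64-bit integer,
# d = IEEE 754 double, ? = boolean (struct spends a whole byte on it).
TAIL = struct.Struct("<16sBqd?")

def pack(username: str, user_uuid: str, enum: str,
         bignumber: int, pi: float, flag: bool) -> bytes:
    name = username.encode("ascii")
    # One length byte, so usernames are limited to 255 bytes in this sketch.
    head = bytes([len(name)]) + name
    tail = TAIL.pack(uuid.UUID(user_uuid).bytes, ENUM_CODES[enum], bignumber, pi, flag)
    # missingfield is simply not written at all.
    return head + tail

def unpack(wire: bytes) -> tuple:
    n = wire[0]
    username = wire[1:1 + n].decode("ascii")
    raw_uuid, code, bignumber, pi, flag = TAIL.unpack(wire[1 + n:])
    return (username, str(uuid.UUID(bytes=raw_uuid)), ENUM_NAMES[code], bignumber, pi, flag)

wire = pack("exampleusername", "9c5b94b1-35ad-49bb-b118-8e8fc24abf80",
            "THIS_IS_A_LONG_ENUM", 36028797018963968, 3.141592653589793, True)
print(len(wire))  # 50
```

The cost of this compactness is that the schema lives outside the message. Neither side can decode anything without the shared layout and enum table.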

Four times larger! That is a lot of space and time to pay for a human to read a message and still not understand what that message means!

What’s the smallest we could make that JSON? Well, we could make each key exactly 1 character and drop the null field:

{"u":"exampleusername","i":"9c5b94b1-35ad-49bb-b118-8e8fc24abf80","e":"THIS_IS_A_LONG_ENUM","b":36028797018963968,"p":3.141592653589793,"c":true}

This gets us down to 145 bytes, about three times the minimum. Now we have something that is much harder for humans to read but encodes the same meaning. Shortening the enum value to one character gets us to 127 bytes, 2.6x the minimum. Now we’ve got something humans can’t really examine to understand what is going on at all, and we’re still far larger than the minimum.
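Both figures can be reproduced with a short Python sketch that serializes the one-character-key version compactly. The replacement enum value `"T"` is a hypothetical one-character stand-in.

```python
import json

short = {
    "u": "exampleusername",
    "i": "9c5b94b1-35ad-49bb-b118-8e8fc24abf80",
    "e": "THIS_IS_A_LONG_ENUM",
    "b": 36028797018963968,
    "p": 3.141592653589793,
    "c": True,
}

def wire_size(obj) -> int:
    # Compact JSON with no optional whitespace, measured in UTF-8 bytes.
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

print(wire_size(short))                # 145
print(wire_size({**short, "e": "T"}))  # 127
```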

Having your data format be about three times larger than the minimum for a common, normal case seems bad! In engineering, we mostly expect to get “close” to our minimum cost calculations, and three times larger is not close. Note that under some conditions we end up far worse: lots of booleans, enums or repeated strings in a message push it to 10x the minimum or more! Remember that processing a message requires time and memory at least proportional to the message’s size on the wire. As a result, every computation we ever do against this data will be something like four times too costly.
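Booleans show the effect most starkly. The Python sketch below assumes a hypothetical message of 16 boolean flags keyed `"a"` through `"p"`, already using one-character keys. It compares compact JSON against packing the flags one per bit. The bit order (least significant bit first) is an arbitrary choice both sides would have to agree on.

```python
import json
import string

def pack_flags(values: list[bool]) -> bytes:
    """Pack booleans one per bit, least significant bit first within each byte."""
    out = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_flags(data: bytes, count: int) -> list[bool]:
    """Inverse of pack_flags; count must be agreed on by both sides."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]

# Hypothetical message: 16 flags, every third one true.
flags = {key: (i % 3 == 0) for i, key in enumerate(string.ascii_lowercase[:16])}

json_size = len(json.dumps(flags, separators=(",", ":")).encode("utf-8"))
packed = pack_flags(list(flags.values()))
print(json_size, len(packed))  # 155 2
```

Even with one-character keys, each flag costs 8 or 9 bytes of JSON against a single bit packed.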

Without showing the work (yet), a compact binary format like Protocol Buffers will encode the same message in 58 bytes, which is 1.18x the minimum. 18% larger than the minimum seems pretty good compared to a factor of three!
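Part of how Protocol Buffers stays compact is its base-128 varint encoding. Each integer is split into 7-bit groups, least significant first, and every byte except the last has its high bit set. Each field is preceded by a tag, which is itself a varint of `(field_number << 3) | wire_type`. Wire type 2 marks a length-delimited field such as a string. The hand-written Python sketch below illustrates those rules. The field number 1 is a hypothetical choice. The sketch does not reproduce the 58-byte total, which depends on schema choices not shown here.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers, as in the Protocol Buffers wire format."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)         # final group, high bit clear
            return bytes(out)

def encode_tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)

def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    # Wire type 2 = length-delimited: tag, then length, then the bytes.
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data

print(encode_varint(150).hex())                        # 9601
print(len(encode_varint(2**55)))                       # 8
print(len(encode_string_field(1, "exampleusername")))  # 17
```

A 2^55 value needs 56 bits, so its varint takes 8 bytes. The 15-byte username costs only 2 bytes of overhead instead of a quoted key.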

We have our motivating examples: 145 bytes human readable, 58 bytes compact binary, 49.125 bytes minimal. Something that costs a factor of three too much deserves our attention! Imagine building a whole distributed computing system on a format that requires three times the local memory, three times the bandwidth, and three times the log processing time! Each mobile phone will have to send much larger messages, or burn local energy compressing them to reduce the impact! Your database machines get overloaded much sooner, across a larger number of queries! That takes time and attention to debug and money to buy more compute! Many computational workloads scale horizontally, but being able to delay the point at which that scaling must happen is very valuable! Larger messages also make sending and processing more fragile overall, so during periods of high demand you are more likely to hit a critical breakage. Critical breakages lose users, customers and clients.

Attempting to improve such a system one bottleneck at a time ignores the underlying inefficiency. This is particularly problematic: it feels like progress is being made, but sufficiently useful results are never obtained. As an analogy, imagine the cost and time difference between carrying a backpack and putting the backpack in a shipping container and then moving that! If you improve things by switching to a smaller shipping container, the outcome is still bad!

© 2026 Scott Bruce