Reasons to use protocol buffers instead of JSON

[+] falcolas|11 years ago|reply

> Schemas Are Awesome

No reason you can't implement schemas over JSON. In fact, you typically implicitly do - what your code is expecting to be present in the data structures deserialized from JSON.

> Backward Compatibility For Free

JSON is unversioned, so you can add and remove fields as you wish.

> Less Boilerplate Code

How much boilerplate is there in parsing JSON? I know in Python, it's:

    structure = json.loads(json_string)

Now then, if you want to implement all kinds of type checking and field checking on the front end, you're always welcome to, but allowing "get attribute" exceptions to bubble up and signal a bad data structure have always appealed to me more. I'm writing in Python/Ruby/Javascript to avoid rigid datastructures and boilerplate in the first place most times.

[EDIT] And for languages where type safety is in place, the JSON libraries frequently allow you to pre-define the data structure which the JSON will attempt to parse into, giving type safety & a well define schema for very little additional overhead as well.

> Validations and Extensibility

Same as previous comment about type checking, etc.

> Easy Language Interoperability

Even easier: JSON!

And you don't have to learn yet another DSL, and compile those down into lots of boilerplate!

I'm not trying to say that you shouldn't use Protocol Buffers if its a good fit for your software, but this list is a bit anemic on real reasons to use them, particularly for dynamically typed languages.

[+] theseoafs|11 years ago|reply

> No reason you can't implement schemas over JSON. In fact, you typically implicitly do

Right, that's the point -- since in normal use you impose a set of "schema" requirements over all data interchange formats, even schemaless ones, it's a strictly good thing to have that schema explicitly written out. It means the compiler can verify your types are correct and the runtime can verify your messages have all the fields they'll need.

> JSON is unversioned, so you can add and remove fields as you wish.

Sure, but if you do, you have to handle the version-management code at the application level, manually, where it's really easy to make mistakes.

> And for languages where type safety is in place, the JSON libraries frequently allow you to pre-define the data structure which the JSON will attempt to parse into, giving type safety & a well define schema for very little additional overhead as well.

Sure, and if you're going to do that, you might as well use protobufs which is going to be much faster/more lightweight.

[+] Arkadir|11 years ago|reply

> structure = json.loads(json_string)

Not quite. Let's say your JSON data contains the following attribute:

    "access" : [ "view", "edit", "admin" ],

This field should be represented (in the language) as a set of values from an "access-levels" enumeration.

In C#, you'd have the following boilerplate:

    [DataMember(Name = "access")]
    public HashSet<AccessLevels> Access { get; private set; }

In OCaml, it would be:

    access : Access.Set.t ;

The simple "json.loads" solution would return a list of strings instead. What's the Python code for turning it into a set of enumeration values, and failing if one of the values does not match ?

[+] wunki|11 years ago|reply

Or, another alternative is Cap'n Proto [1] from the primary author of Protocol Buffers v2. It smooths some of the bumps of protocol buffers.

[1]: http://kentonv.github.io/capnproto/

[+] ntoshev|11 years ago|reply

Came here to write this. Promise pipelining is an especially interesting attempt to solve latency in RPC (although it doesn't always work).

[+] Arkadir|11 years ago|reply

Easy language interoperability as a reason to choose Protobuf over JSON ? Mainstream languages support both JSON and Protobuf equally well, and the others tend to support JSON more often than Protobuf.

Free backwards compatibility ? No. Numbered fields are a good thing, but they only help in the narrow situation where your "breaking change" consists in adding a new, optional piece of data (a situation that JSON handles as well). New required fields ? New representation of old data ? You'll need to write code to handle these cases anyway.

As for the other points, they are a matter of libraries (things that the Protobuf gems support and the JSON gems don't) instead of protocol --- the OCaml-JSON parser I use certainly has benefits #1 (schemas), #3 (less boilerplate) and #4 (validation) from the article.

There is, of course, the matter of bandwidth. I personally believe there are few cases where it is worth sacrificing human-readability over, especially for HTTP-based APIs, and especially for those that are accessed from a browser.

I would recommend gzipped msgpack as an alternative to JSON if reducing the memory footprint is what you want: encoding JSON as msgpack is trivial by design.

[+] CJefferson|11 years ago|reply

Reasons not to use protocol buffers (in C++ at least):

    1) Doesn't support Visual Studio 2013.
    2) Doesn't support Mac OS X Mavericks.
    3) No "nice" support C++11 (i.e. move constructors)

(These can be at least partly solved by running off svn head, but that doesn't seem like a good idea for a product one wants to be stable)

With JSON I can be sure there will be many libraries which will work on whatever system I use.

[+] nly|11 years ago|reply

All 3 of your points are the fault of Protobufs being an almost stagnant project. Nothing significant has changed for a number of years. You only have to look at the public repo commit log to see Protobufs may as well be a release tarball only distribution[0], with already infrequent releases[1].

[0] https://code.google.com/p/protobuf/source/list

[1] https://code.google.com/p/protobuf/downloads/list

[+] TillE|11 years ago|reply

> Doesn't support Visual Studio 2013

That's a really tiny issue which just requires including another header.

https://code.google.com/p/protobuf/issues/detail?id=531

[+] ardit33|11 years ago|reply

We at Spotify use them extensively and are actually moving away from Protobufs, which we consider as 'legacy'. The advantages of Protobufs don't make up for its disadvantages over plain JSON. With JSON you have universal support, simple parsing, developer and debug friendly, much easier to mock, etc etc.

[+] AYBABTME|11 years ago|reply

I think the main advantages are:

    - network bandwidth/latency: smaller RPC consume less 
      space, are received and responded to faster.
    - memory usage: less data is read and processed while      
      encoding or decoding protobuf.
    - time: haven't actually benchmarked this one, but I 
      assume CPU time spent decoding/encoding will be 
      smaller since you don't need to go from ASCII to 
      binary.

Which means, all performance improvements. They come, as usual, at the cost of simplicity and ease of debugging.

[+] al2o3cr|11 years ago|reply

"I assume X is more performant ... I haven't actually benchmarked it"

I believe that's usually spelled "I am making this up".

[+] gldalmaso|11 years ago|reply

> When Is JSON A Better Fit? > Data from the service is directly consumed by a web browser

This seems to me like a key issue, you need to really know beforehand that this won't ever be the case, else you need to make your application polyglot afterwards. A risky bet for any business data service.

Maybe if it's strictly infrastructure glue type internal service. But even then, maybe someone will come along wanting to monitor this thing on the browser.

[+] tbrownaw|11 years ago|reply

I thought the reasoning behind JSON was that Javascript has a parser built in (eval), rather than it's incapable of being used to parse other formats?

[+] dangoldin|11 years ago|reply

If it's too big of a change you can "proxy" it through a simple ProtoBuf to JSON converter.

[+] znt|11 years ago|reply

Not so sure about "backwards compatibility" part.

From Protocol buffer python doc: https://developers.google.com/protocol-buffers/docs/pythontu...

"Required Is Forever You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal."

So basically I will be in trouble if I decide to get rid of some fields which are not necessary, but somehow were defined as "required" in the past.

This will potentially result in bloated protobuf definitions that have a bunch of legacy fields.

I will stick to the JSON, thanks.

[+] gohrt|11 years ago|reply

That argument never made sense:

" old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally."

That means the old readers -- the ones that are expecting required fields, can't accept new messages. That's good! The readers don't know how to read the new messages! The readers need to be updated to a new version before they can correctly start reading new version of the schema.

[+] yongjik|11 years ago|reply

If your protobufs (and their producer/consumers) are short-living, just remove the "required" field from your servers and clients at the same time. The "required is forever" problem persists only if you keep files with serialized protobuf data (or legacy binaries) and cannot reasonably update them at the same time.

Of course, once something hits production, you really cannot reasonably update them all at the same time. That's why the proto API has:

https://developers.google.com/protocol-buffers/docs/referenc...

    bool ParseFromString(const string & data)
      Parse a protocol buffer contained in a string.
    bool ParsePartialFromString(const string & data)
      Like ParseFromString(), but accepts messages that are missing required fields.

Or simply use "optional" instead as others suggested.

Also, even with these "bloated" definitions with legacy unused fields, it's probably still smaller than JSON.

[+] jamesaguilar|11 years ago|reply

This is a non-criticism. Just use optional for everything, or make sure that all your clients and servers have upgraded. If you had a JSON library that supported required semantics, you'd have exactly the same issue on your hands.

[+] Locke1689|11 years ago|reply

Just make everything optional (like suggested) if you're worried about it.

[+] nostrademons|11 years ago|reply

In JSON all fields are optional and you're forced to write application-specific custom validation routines anyway, so your semantics are exactly as if you had marked them all "optional" in protobufs.

[+] isbadawi|11 years ago|reply

If you don't currently have protobuf definitions, you can just not use required fields in any new ones you write, like the quoted passage suggests.

[+] pling|11 years ago|reply

This is a fine reason to use protocol buffers instead of JSON:

   [ 4738723874747487387273747838727347383827238734.00 ]

Parsing that universally is a shit.

[+] KaiserPro|11 years ago|reply

The biggest thing is that its smaller over the wire.

however if schemas scare you (shame on you if they do) then msgpack might be a better choice.

[+] jacob019|11 years ago|reply

Definitely the right direction for performance. My company ended up going with python-gevent and zeromq to implement an asynchronous API server with persistent TCP connections. Our application servers are able make remote calls over a persistent tcp connection without any noticeable overhead. You could still use JSON, and we tried it--but since we're all python anyway we decided to just pickle the objects which is way faster. We looked at protocol buffers, but found it to be a bit cumbersome. It's been stable for two years and completely solved our scaling problems.

[+] tieTYT|11 years ago|reply

> There do remain times when JSON is a better fit than something like Protocol Buffers, including situations where:

> * You need or want data to be human readable

When things "don't work" don't you always want this feature? Over a long lifetime, this could really reduce your debugging costs. Perhaps protocol buffers has a "human readable mode". If not, it seems like a risk to use it.

[+] kentonv|11 years ago|reply

You can run the binary format through `protoc --decode` to get text, or use the `toString()` method in your code. It's an extra step, but for the 99.99% of queries that no human looks at, it saves a lot of CPU.

[+] redthrowaway|11 years ago|reply

...in Ruby. For some applications.

With Node, I'd have to see a very good argument for why I should give up all of JSON's great features for the vast majority of services. Unless the data itself needs to be binary, I see no reason why I shouldn't use the easy, standard, well-supported, nice-to-work-with JSON.

[+] kybernetikos|11 years ago|reply

In javascript, JSON is also much faster to deserialize than protobuffers.

[+] jweir|11 years ago|reply

Anyone tried ProtoBuf.js "Protocol Buffers for JavaScript." on the client side?

https://github.com/dcodeIO/ProtoBuf.js

[+] kybernetikos|11 years ago|reply

Yes. It's a decent implementation. However, binary data access (particularly DataView which is used by protobufjs) is embarrassingly slow in all browsers at the moment, so JSON (possibly gzipped if you're worried about bandwidth) is much better if you're going to be mainly targetting browsers.

[+] jayvanguard|11 years ago|reply

What a huge step backwards. We had decades of binary protocols. They sucked. Then everyone moved to non-binary protocols and it was much better. Let's not do it all over again.

[+] kentonv|11 years ago|reply

Don't forget about XML, MIME, SGML, SOAP... We've had decades of protocols both binary and text, and they've all sucked.

The thing it seems we've learned more recently is that the more features and complication you add to a protocol, the more it sucks. At the end of they day your data is composed of primitives, records, and lists, and if your protocol offers to structure things in any other way, it's just creating confusion. This is why JSON beats XML, and Protobufs beats many of its binary predecessors.

[+] nly|11 years ago|reply

I'd rather use Avro. The binary encoding is more dense and there's an officially supported JSON encoding (the text encoding for Protobufs is mostly intended for debugging)

[+] pkandathil|11 years ago|reply

When you convert an object from language X to JSON, validate it using a schema validation before deserializing, then is it not the same as JSON. Also now with JSON you have the opportunity to have human readable data which is great when debugging issues. I am not seeing the advantage of protocol buffers. It would be great if you can compare payload sizes and see if there is a significant savings from that perspective.

[+] mrinterweb|11 years ago|reply

Personally, I prefer Thrift over protocol buffers. I'm surprised Thrift wasn't mentioned.

[+] don_draper|11 years ago|reply

"Reason #3: Less Boilerplate Code"

Anyone who has used Avro in Java knows that this is not true.

[+] nostrademons|11 years ago|reply

Curious what this has to do with protocol buffers?

[+] nly|11 years ago|reply

Avro code is generated though?

[+] AnimalMuppet|11 years ago|reply

Use protocol buffers because you get versioning? How hard would it be to add a version number to your JSON struct?

(Answer: Trivial.)

57 comments