No reason you can't implement schemas over JSON. In fact, you typically do so implicitly: the schema is whatever your code expects to be present in the data structures deserialized from JSON.
> Backward Compatibility For Free
JSON is unversioned, so you can add and remove fields as you wish.
> Less Boilerplate Code
How much boilerplate is there in parsing JSON? I know in Python, it's:
structure = json.loads(json_string)
Now then, if you want to implement all kinds of type checking and field checking up front, you're always welcome to, but letting "get attribute" exceptions bubble up to signal a bad data structure has always appealed to me more. Most of the time I'm writing in Python/Ruby/JavaScript precisely to avoid rigid data structures and boilerplate in the first place.
[EDIT] And for languages where type safety is in place, the JSON libraries frequently allow you to pre-define the data structure the JSON will be parsed into, giving type safety and a well-defined schema for very little additional overhead as well.
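The same trick works in dynamic languages too. Here is a minimal Python sketch of parsing JSON into a pre-defined structure and failing loudly on bad input (the `User` shape and its fields are invented for illustration):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class User:
    id: int
    name: str

def parse_user(json_string):
    obj = json.loads(json_string)
    # Check each declared field for presence and runtime type
    for f in fields(User):
        if f.name not in obj:
            raise ValueError(f"missing field: {f.name}")
        if not isinstance(obj[f.name], f.type):
            raise TypeError(f"field {f.name!r} should be {f.type.__name__}")
    return User(**obj)

user = parse_user('{"id": 1, "name": "alice"}')
```

Unexpected extra keys still raise a TypeError from the constructor, which is exactly the "let it bubble up" behaviour described above.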
> Validations and Extensibility
Same as previous comment about type checking, etc.
> Easy Language Interoperability
Even easier: JSON!
And you don't have to learn yet another DSL, and compile those down into lots of boilerplate!
I'm not trying to say that you shouldn't use Protocol Buffers if it's a good fit for your software, but this list is a bit anemic on real reasons to use them, particularly for dynamically typed languages.
> No reason you can't implement schemas over JSON. In fact, you typically implicitly do
Right, that's the point -- since in normal use you impose a set of "schema" requirements over all data interchange formats, even schemaless ones, it's a strictly good thing to have that schema explicitly written out. It means the compiler can verify your types are correct and the runtime can verify your messages have all the fields they'll need.
> JSON is unversioned, so you can add and remove fields as you wish.
Sure, but if you do, you have to handle version management at the application level, manually, where it's really easy to make mistakes.
> And for languages where type safety is in place, the JSON libraries frequently allow you to pre-define the data structure which the JSON will attempt to parse into, giving type safety & a well define schema for very little additional overhead as well.
Sure, and if you're going to do that, you might as well use protobufs, which are going to be much faster and more lightweight.
Not quite. Let's say your JSON data contains an attribute that should be represented (in the language) as a set of values from an "access-levels" enumeration. The simple "json.loads" solution would return a list of strings instead. What's the Python code for turning it into a set of enumeration values, and failing if one of the values does not match?
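For the record, a sketch of that conversion in Python (the enum members here are invented, since the original field list isn't shown):

```python
import json
from enum import Enum

class AccessLevel(Enum):
    READ = "read"
    WRITE = "write"
    ADMIN = "admin"

def parse_access_levels(json_string):
    # Enum lookup by value raises ValueError on any unknown string
    return {AccessLevel(value) for value in json.loads(json_string)}

levels = parse_access_levels('["read", "admin"]')
```

It's short, but it is precisely the kind of per-field conversion code that generated bindings would otherwise write for you.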
Easy language interoperability as a reason to choose Protobuf over JSON? Mainstream languages support both JSON and Protobuf equally well, and the others tend to support JSON more often than Protobuf.
Free backwards compatibility? No. Numbered fields are a good thing, but they only help in the narrow situation where your "breaking change" consists of adding a new, optional piece of data (a situation that JSON handles just as well). New required fields? New representation of old data? You'll need to write code to handle those cases anyway.
As for the other points, they are a matter of libraries (things that the Protobuf gems support and the JSON gems don't) instead of protocol --- the OCaml-JSON parser I use certainly has benefits #1 (schemas), #3 (less boilerplate) and #4 (validation) from the article.
There is, of course, the matter of bandwidth. I personally believe there are few cases where the savings are worth sacrificing human-readability, especially for HTTP-based APIs, and especially for those accessed from a browser.
I would recommend gzipped msgpack as an alternative to JSON if reducing the memory footprint is what you want: encoding JSON as msgpack is trivial by design.
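The footprint point is easy to check. This sketch uses only stdlib gzip on top of JSON (the msgpack step would be `msgpack.packb(payload)` from the third-party msgpack package, left out so the example stays self-contained; the payload is invented):

```python
import gzip
import json

# A hypothetical repetitive payload, the kind that compresses well
payload = [{"id": i, "name": f"user{i}", "active": True} for i in range(500)]

raw = json.dumps(payload).encode("utf-8")
zipped = gzip.compress(raw)

print(f"plain JSON: {len(raw)} bytes, gzipped: {len(zipped)} bytes")
```

The gzipped form stays losslessly recoverable, so human-readability is only one `gzip.decompress` away.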
All 3 of your points are the fault of Protobufs being an almost stagnant project. Nothing significant has changed for a number of years. You only have to look at the public repo commit log to see Protobufs may as well be a release-tarball-only distribution[0], with already infrequent releases[1].

[0] https://code.google.com/p/protobuf/source/list

[1] https://code.google.com/p/protobuf/downloads/list
We at Spotify use them extensively and are actually moving away from Protobufs, which we now consider 'legacy'. The advantages of Protobufs don't make up for their disadvantages relative to plain JSON.

With JSON you get universal support, simple parsing, developer- and debug-friendliness, much easier mocking, etc. etc.
- network bandwidth/latency: smaller RPCs consume less space, and are received and responded to faster.

- memory usage: less data is read and processed while encoding or decoding protobuf.

- time: haven't actually benchmarked this one, but I assume CPU time spent decoding/encoding will be smaller since you don't need to go from ASCII to binary.
Which means, all performance improvements. They come, as usual, at the cost of simplicity and ease of debugging.
> When Is JSON A Better Fit?
> Data from the service is directly consumed by a web browser
This seems to me like a key issue: you need to know beforehand that this will never be the case, or else you'll have to make your application polyglot after the fact. A risky bet for any business data service.

Maybe if it's strictly an infrastructure-glue type of internal service. But even then, someone may come along wanting to monitor the thing from a browser.
"Required Is Forever You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal."
So basically I will be in trouble if I decide to get rid of some fields which are not necessary, but somehow were defined as "required" in the past.
This will potentially result in bloated protobuf definitions that have a bunch of legacy fields.
" old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally."
That means the old readers -- the ones that are expecting required fields -- can't accept the new messages. That's good! The readers don't know how to read the new messages! The readers need to be updated to a new version before they can correctly start reading the new version of the schema.
If your protobufs (and their producers/consumers) are short-lived, just remove the "required" field from your servers and clients at the same time. The "required is forever" problem persists only if you keep files with serialized protobuf data (or legacy binaries) around and cannot reasonably update them all at the same time.
Of course, once something hits production, you really cannot reasonably update them all at the same time. That's why the proto API (https://developers.google.com/protocol-buffers/docs/referenc...) has:

    bool ParseFromString(const string& data)
        Parse a protocol buffer contained in a string.

    bool ParsePartialFromString(const string& data)
        Like ParseFromString(), but accepts messages that are missing required fields.
Or simply use "optional" instead as others suggested.
Also, even with these "bloated" definitions with legacy unused fields, it's probably still smaller than JSON.
This is a non-criticism. Just use optional for everything, or make sure that all your clients and servers have upgraded. If you had a JSON library that supported required semantics, you'd have exactly the same issue on your hands.
In JSON all fields are optional and you're forced to write application-specific custom validation routines anyway, so your semantics are exactly as if you had marked them all "optional" in protobufs.
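In other words, the hand-rolled JSON side ends up looking something like this sketch (the field names are invented):

```python
import json

REQUIRED_FIELDS = {"user_id", "timestamp"}  # hypothetical message schema

def load_message(json_string):
    message = json.loads(json_string)
    # Reject messages missing any field this service depends on
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"incomplete message, missing: {sorted(missing)}")
    return message

msg = load_message('{"user_id": 42, "timestamp": 1700000000}')
```

This is the same check a generated protobuf parser performs for required fields, except here every service has to maintain its own copy of it.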
Definitely the right direction for performance. My company ended up going with python-gevent and zeromq to implement an asynchronous API server with persistent TCP connections. Our application servers are able to make remote calls over a persistent TCP connection without any noticeable overhead. You could still use JSON, and we tried it--but since we're all Python anyway we decided to just pickle the objects, which is way faster. We looked at protocol buffers, but found them a bit cumbersome. It's been stable for two years and completely solved our scaling problems.
> There do remain times when JSON is a better fit than something like Protocol Buffers, including situations where:
> * You need or want data to be human readable
When things "don't work" don't you always want this feature? Over a long lifetime, this could really reduce your debugging costs. Perhaps protocol buffers has a "human readable mode". If not, it seems like a risk to use it.
You can run the binary format through `protoc --decode` to get text, or use the `toString()` method in your code. It's an extra step, but for the 99.99% of queries that no human looks at, it saves a lot of CPU.
With Node, I'd have to see a very good argument for why I should give up all of JSON's great features for the vast majority of services. Unless the data itself needs to be binary, I see no reason why I shouldn't use the easy, standard, well-supported, nice-to-work-with JSON.
Yes, ProtoBuf.js (https://github.com/dcodeIO/ProtoBuf.js) is a decent implementation. However, binary data access (particularly DataView, which protobufjs uses) is embarrassingly slow in all browsers at the moment, so JSON (possibly gzipped if you're worried about bandwidth) is much better if you're going to be mainly targeting browsers.
What a huge step backwards. We had decades of binary protocols. They sucked. Then everyone moved to non-binary protocols and it was much better. Let's not do it all over again.
Don't forget about XML, MIME, SGML, SOAP... We've had decades of protocols both binary and text, and they've all sucked.
The thing it seems we've learned more recently is that the more features and complication you add to a protocol, the more it sucks. At the end of the day your data is composed of primitives, records, and lists, and if your protocol offers to structure things in any other way, it's just creating confusion. This is why JSON beats XML, and Protobufs beat many of their binary predecessors.
I'd rather use Avro. The binary encoding is more dense, and there's an officially supported JSON encoding (the text encoding for Protobufs is mostly intended for debugging).
If, when you convert an object from language X to JSON, you validate it against a schema before deserializing, is that not effectively the same thing? And with JSON you get human-readable data, which is great when debugging issues. I am not seeing the advantage of protocol buffers. It would be great if you could compare payload sizes and see whether there are significant savings from that perspective.