It wouldn't be fun if you didn't reinvent the wheel, but this time with the features you want instead of the almost-identical ones you didn't invent!
On one of these pages, the claim is made that Protobuf is not self-describing, and therefore cannot be used for "network applications". It seems that "self-describing" here means that the format includes key names, instead of compressing them into numbers the way Protobuf does. I can't understand why having field names is going to make a difference for anyone. Once you are setting up a system to deal with a specific format of data, why not just include a Protobuf schema?
Here's a use case where protobuf is terrible because it isn't self-describing: write a wireshark plugin which parses and pretty-prints protobuf messages for human consumption.
You can't, because such a plugin would have to have a priori knowledge of the schema in use.
We have never claimed that Protobuf could not be used for network applications. We have claimed that Protobuf is a bad choice for messages that have to be routable by intermediaries that do not know the schema of a Protobuf message. Where does one Protobuf message end and another begin?
Additionally, Protobuf is not good at encoding raw bytes - according to their own words.
It's nice to not have to assume the client and server are running the exact same version of the protocol. If you use ordinal numbers instead of names, you can never remove or reorder things without completely breaking backward compatibility. You can only append new fields to the end.
Binary format makes me believe it's not human-readable. How does this compare in size to gzipped JSON? JSON overhead is fairly small (some quotes, colons, brackets and keys) - it's no XML.
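For a rough sense of the numbers, gzip does shrink uniform JSON substantially; a quick stdlib check (the payload and figures are illustrative, not a benchmark):

```python
import gzip
import json

# A repetitive payload, as typical API responses are: many objects
# sharing the same key names.
records = [{"id": i, "name": "user%d" % i, "active": True} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

# Key-name overhead compresses extremely well, so gzipped JSON is
# usually a small fraction of the raw size for uniform record sets.
print(len(raw), len(compressed))
assert len(compressed) < len(raw) // 4
```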
Yes, binary formats are not easily readable in a text editor. But it is actually possible to convert ION to an XML format and back again without loss of information (we have not implemented this yet). This should make it easier to read messages during debugging - especially because you don't need to know the schema for a given message to convert it to XML.
Regarding gzipped JSON, it is true that gzipped JSON is small. But due to the CRIME and BREACH attacks, it is not recommended to compress data sent over encrypted connections (TLS).
If you look at our performance benchmarks page you can see a list of serialized length comparisons. As you can see, as soon as you send a few objects in an ION table, the difference is big. More than what you normally can gain with GZip (except perhaps for String).
http://tutorials.jenkov.com/iap/ion-performance-benchmarks.h...
Furthermore, GZip only helps with transfer time, and actually slows down parsing time. If you look at our performance benchmarks you will see that ION parsing time is a lot faster than JSON. Additionally, if you really, really want high speed you do not parse ION (or JSON) into Java objects. You process the data directly in its binary form. If you look at our read-and-use benchmark you can see just how big a speed difference that gives. ION is designed for being processed directly. JSON isn't as good for that purpose.
Finally, ION is designed for fast arbitrary hierarchical navigation. JSON is not.
One thing that I like about CBOR is that with very little knowledge it's surprisingly readable in a hex-dump. Low value positive integers are the value of the byte itself, strings all have the form "0x6L [string]" or "0x7X [len] [string]". Arrays and maps are similarly obvious.
Of course, for anything more than a simple construct you're better off using a decoder (e.g. a Wireshark one).
Also, the fact that it's compatible with JSON means that you can use JSON during development, and then switch to CBOR at the end for the reduction in packet size. In Python it's as simple as swapping the json module for a CBOR one.
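To make the hex-dump point concrete, here is a toy encoder for a tiny CBOR subset (short ints, strings, arrays and maps only; a sketch for illustration - real code should use a library such as cbor2):

```python
def cbor_encode(v):
    # CBOR packs a 3-bit major type and a 5-bit length/value into the
    # first byte; for small values that one byte is all you need.
    if isinstance(v, int) and 0 <= v < 24:
        return bytes([v])                       # major type 0: the byte IS the int
    if isinstance(v, str) and len(v) < 24:
        return bytes([0x60 | len(v)]) + v.encode("utf-8")   # major type 3: text
    if isinstance(v, list) and len(v) < 24:
        return bytes([0x80 | len(v)]) + b"".join(map(cbor_encode, v))  # major 4
    if isinstance(v, dict) and len(v) < 24:
        return bytes([0xA0 | len(v)]) + b"".join(
            cbor_encode(k) + cbor_encode(x) for k, x in v.items())     # major 5
    raise ValueError("toy encoder: unsupported value %r" % (v,))

# The structure stays visible in a plain hex dump:
# a2 = map(2), 61 61 = "a", 01 = 1, 61 62 = "b", 82 02 03 = [2, 3]
print(cbor_encode({"a": 1, "b": [2, 3]}).hex())  # a26161016162820203
```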
> As you can see, an ION field can contain values that are up to 2^120 bytes long. If you need to encode larger blocks of data than that, you would need to break it up into multiple fields.
As the author of Protobuf v2 (the version that was open sourced by Google), I object to some of the "no"s in the protobuf column.
(Note: I no longer work on Protobuf, and I did not invent the format. I do work on and did invent Cap'n Proto.)
> Protobuf apparently isn't great at encoding raw bytes either (according to their own website).
Protobuf can handle raw bytes just fine, using the "bytes" type. There is no special encoding done on bytes; parsing and encoding is done by memcpy(). I'm curious to know what part of the web site you interpret as saying otherwise. It's entirely possible that the web site contains confusing language, but a citation would have been a good idea here.
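The wire layout of a bytes field is easy to verify by hand; a minimal sketch (restricted to field numbers below 16 and payloads under 128 bytes, so the tag and length each fit in one byte):

```python
def encode_bytes_field(field_number, payload):
    # Length-delimited wire type is 2; tag = (field_number << 3) | wire_type.
    # This sketch only handles single-byte tags and lengths.
    assert 1 <= field_number < 16 and len(payload) < 128
    return bytes([(field_number << 3) | 2, len(payload)]) + payload

# The payload goes on the wire verbatim - no escaping or re-encoding:
print(encode_bytes_field(1, b"\x00\xff\x10").hex())  # 0a0300ff10
```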
> Schema / Class Id
> Self describing
The Protobuf libraries have extensive support for manipulating dynamic schemas and transmitting schemas over the wire. See the "Descriptor" and "DynamicMessage" APIs. This is mentioned on the web site:
https://developers.google.com/protocol-buffers/docs/techniqu...
> Even if these compact objects do not contain any property names, they are still self describing enough that you can see where fields start and end, plus their data type, without an external schema. You cannot do that with Protobuf (as far as we know).
You absolutely can do that with Protobuf. This is what the "protoc --decode_raw" flag does, and it should be clear enough from reading the encoding.
https://developers.google.com/protocol-buffers/docs/encoding
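A sketch of the idea behind that: tags and lengths alone are enough to walk a message without its schema (this toy parser handles only the varint and length-delimited wire types):

```python
def read_varint(buf, i):
    # Little-endian base-128: low 7 bits per byte, high bit means "more".
    shift = value = 0
    while True:
        b = buf[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i
        shift += 7

def decode_raw(buf):
    """Yield (field_number, wire_type, value) triples without any schema."""
    i, out = 0, []
    while i < len(buf):
        tag, i = read_varint(buf, i)
        field, wire_type = tag >> 3, tag & 7
        if wire_type == 0:                       # varint
            value, i = read_varint(buf, i)
        elif wire_type == 2:                     # length-delimited
            length, i = read_varint(buf, i)
            value, i = buf[i:i + length], i + length
        else:
            raise ValueError("sketch handles wire types 0 and 2 only")
        out.append((field, wire_type, value))
    return out

# The classic example from the encoding docs: field 1 = varint 150.
print(decode_raw(b"\x08\x96\x01"))  # [(1, 0, 150)]
```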
> Cyclic references

While it's true that Protobuf doesn't support these, I hope you've considered the denial-of-service vulnerabilities they tend to create if the receiver is not expecting them. Please ensure that cyclic references are only allowed in cases where the app has opted into them.
Relatedly, overlapping references / backreferences ("Copy" in your table) potentially lead to an amplification attack, where a small message on the wire turns out to be much, much larger when traversed. If applications cannot defend themselves from huge payloads by setting a message size limit, then you'll need to give them some other way.
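One way to give applications that defense is a decode-side budget: charge every expanded reference against a total-output limit, independent of the wire size. A hedged sketch of the idea (not any particular format's API; the reference representation here is made up for illustration):

```python
class ExpansionBudget:
    """Caps the total expanded size while resolving back-references."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, n):
        self.used += n
        if self.used > self.limit:
            raise ValueError("expanded size exceeds limit; possible amplification")

def expand(items, budget):
    # items: literal bytes, or ("ref", index) pointing at an earlier item.
    out = []
    for item in items:
        data = out[item[1]] if isinstance(item, tuple) else item
        budget.charge(len(data))   # count expanded size, not wire size
        out.append(data)
    return out

# A handful of wire bytes can expand to arbitrarily much output via
# references, so the budget, not the wire size, is what bounds memory.
msg = [b"abc"] + [("ref", 0)] * 1000
try:
    expand(msg, ExpansionBudget(100))
except ValueError as e:
    print(e)
```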
> All of the formats (except perhaps Protobuf) supports arbitrary hierarchical navigation of the encoded data, without first converting it to objects.
Protobuf supports this, and in fact should be an unqualified "Yes" rather than "Yes(*)" like the others. Protobuf encoding is very similar to ION's. Sub-messages are length-delimited, which seems to be exactly the advantage you're claiming that ION has.
Note that none of these formats support random access in the way that Cap'n Proto does.
In summary, I believe Protobuf deserves a "yes" in: "Raw bytes", "Good at raw bytes", "Schema / Class Id", "Arbitrary hierarchical navigation", and "Self describing".
If that is really true, then we will of course update the comparison page. However, we have put it together from what we were able to find in Protocol Buffers' own docs + Stack Overflow + googling. It is entirely possible that we made mistakes.
Sending the schema over the wire is not a good solution for anything other than point-to-point communication. An intermediate node would need every single schema transmitted along with every single message, or would need another way to keep the schemas cached. That becomes complicated.
The Protobuf documentation says very clearly that you cannot see where one message ends and another begins. So a Protobuf message is not fully self-describing. This might be easy to add, but Protobuf doesn't have it (according to its own docs).
We have looked at Cap'n Proto - but late in the process, when we had already looked at quite a lot of formats. From what I can see, Cap'n Proto is pretty much just a binary struct. That is pretty close to what we wanted to do with ION, except we wanted it to be compact on the wire too. We have seen that Cap'n Proto has a compaction mechanism, but we have not yet had time to analyze it and compare it to ION's. Cap'n Proto with compaction would be very similar to ION - on a conceptual level.
However, we need to make space for some IAP-specific fields coming later in the process (like cache references, column stores and more). That is why we chose to roll our own encoding in the first place.
I have found that most of my huge JSON data comes from uniform record sets. There's a great JSON-compatible encoder for such cases that stores them in a CSV-esque format: https://github.com/WebReflection/JSONH
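The idea behind that kind of encoder, sketched in plain Python (simplified - this just shows the key-hoisting, not JSONH's exact layout):

```python
import json

def pack(records):
    # Hoist the shared key names out of every record: store them once,
    # then only the row values, all in one flat JSON-compatible array.
    keys = list(records[0])
    flat = [len(keys), *keys]
    for r in records:
        flat.extend(r[k] for k in keys)
    return flat

def unpack(flat):
    n = flat[0]
    keys, values = flat[1:1 + n], flat[1 + n:]
    return [dict(zip(keys, values[i:i + n])) for i in range(0, len(values), n)]

records = [{"id": i, "name": "u%d" % i} for i in range(100)]
packed = pack(records)
assert unpack(packed) == records
# Key names appear once instead of once per record:
print(len(json.dumps(packed)), len(json.dumps(records)))
```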
Not yet. We have been asked to compare ION to Flatbuffers, Cap'n Proto, Thrift, Avro, Transit, BSON and several other encodings. However, writing the benchmarks and going through the features systematically is a lot of work, so we have not yet had the time to go through them all.
I think the most interesting difference from the usual serialization formats might be the Copy and Reference types. I'm a little undecided about whether they are a brilliant idea or not. Supporting Copy puts some extra effort on the serializer and deserializer, but the end result is the same as what you get without a copy-field mechanism. The support for cyclic references is a bigger change, because you can't directly model cycles with tree-structured formats. You might also have trouble using such data structures in some programming languages or libraries (e.g. if you are only using immutable types or want to use only value types). However, for some kinds of data it seems to make sense to support cycles; GraphQL and Falcor have also added support for that.
I also don't see that many use cases for the table structure. I have deployed thousands of RPC APIs into production, and I can't recall having the need for it. And even if you need it, using an object with 2 arrays in it would be just fine.
I also looked through the IAP documentation (btw, bad name => iPod Accessory Protocol) because it's quite related to what I'm working on. I think the basic communication patterns shown are correct, but from the documentation I can't really get a feeling for what I could expect from an IAP library. Would it be a low-level messaging system (like MQTT, ZeroMQ, etc.), or would higher-level communication patterns (request/response, notifications) also be built in? There are no predefined message formats for RPC listed in the documentation that would outline that. The WAMP specification (http://wamp.ws), e.g., makes it clearer what to expect from such a protocol. I'm not sure whether we need a new low-level messaging protocol, or if the work should focus more on adding higher-level semantics on top of it.
E.g., a pattern I really need in my domain is remote object synchronization: the status of an object on the server gets automatically pushed to all interested clients and is continuously updated as it changes (e.g. to build something like Firebase). Of course one can build something like that on top of basic messages by defining subscribe and update messages in the API, but I'm wondering if it's worthwhile to add it directly to the protocol. On the one hand, it is a special case of the subscription pattern that is also listed here; on the other hand, it cannot be implemented directly with the subscription facilities of many message-broker systems, because they won't send you the current state of an object after subscription, but will only forward a message the next time the value changes.
The connection and sequence definitions in IAP look a little redundant to me at first glance. I really think there is a need for message ordering, and you must support it. The question for me is: if you don't need message ordering, why not put the messages into a separate channel and let channels/streams always be ordered (like in HTTP/2)? Overhead for channel creation? Or set up each channel as either ordered or unordered at creation and keep that for the lifetime of the channel?
Matthias, you also don't need a binary protocol. XML would work. JSON too. But binary is faster and more compact. The same goes for the table construct. You can work around not having it, but ION has it built in, so you don't make the mistake of serializing an array of objects inefficiently just because you are busy. The objects are serialized as an ION table - not as a list of ION object fields. If you ever send an array of objects across the wire with ION (IAP Tools), the table mode is used automatically. You save bandwidth and parsing time automatically. Who doesn't want that - even if you don't strictly need it?
Regarding Copy and Reference, the support for them is still not very good (i.e. not automatic). But imagine your service executes an SQL JOIN query, and in the result a lot of objects repeat the same data (e.g. the same zip + city for many objects). The Copy field can be used to include the zip + city fields just once, and to refer back to them later with Copy fields. That is shorter than including them again. These two fields still need some work to be fully supported, but we are working on it.
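The saving is the same one interning gives you; a sketch of the idea in plain Python (illustrative only - this is not ION's actual Copy encoding, and the "copy_of" key is made up):

```python
def dedupe(rows, shared_keys=("zip", "city")):
    # Replace each repeated (zip, city) pair with a back-reference to
    # the first row that carried it, the way a Copy field would.
    seen, out = {}, []
    for row in rows:
        pair = tuple(row[k] for k in shared_keys)
        rest = {k: v for k, v in row.items() if k not in shared_keys}
        if pair in seen:
            out.append({**rest, "copy_of": seen[pair]})  # back-reference
        else:
            seen[pair] = len(out)
            out.append(dict(row))
    return out

rows = [
    {"name": "Anna", "zip": "8000", "city": "Aarhus"},
    {"name": "Bo", "zip": "8000", "city": "Aarhus"},
    {"name": "Cai", "zip": "2100", "city": "Copenhagen"},
]
deduped = dedupe(rows)
print(deduped[1])  # {'name': 'Bo', 'copy_of': 0}
```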
Right now ION is the most well-defined part of IAP. The network protocol itself is still not 100% finalized. But now that we are close to being done with ION (we still have extra fields to add as extended types), we can move forward with the IAP core protocols and semantic protocols. If we do not define a standard semantic protocol for remote object synchronization, IAP will be designed so that you can plug in your own semantic protocol to meet that need.
efaref|10 years ago:
[1] https://tools.ietf.org/html/rfc7049
[2] http://cbor.io/
rix0r|10 years ago:
Hasn't been published as far as I can Google.

klodolph|10 years ago:
Har, har.
kevinSuttle|10 years ago:
See also: https://github.com/edn-format/edn
VStack|10 years ago:
Specification: http://tutorials.jenkov.com/iap/index.html
Benchmarks: http://tutorials.jenkov.com/iap/ion-performance-benchmarks.h...
Tutorials: http://tutorials.jenkov.com/iap-tools-java/index.html
Here is also a recent InfoQ article: http://www.infoq.com/articles/IAP-Fast-HTTP-Alternative?utm_...
dmitrygr|10 years ago:
Problem accessing /iap/message-structure.html. Reason: