item 9665204

Using Protobuf instead of JSON to communicate with a front end

167 points | teh | 10 years ago | blog.wearewizards.io

95 comments

[+] oppositelock|10 years ago|reply
I worked on a product inside Google which used protos (v1) as the data format to a web front end, and in practice that system was a failure, in part due to the decision to use protos. The deserialization cost of protocol buffers is too high if you're doing complex data throughput; even though the data size is smaller, it's better to send larger gzipped JSON (which will be decompressed in native code) and deserialize it into JS objects (also via native code). We weren't using ProtoBuf.js, but our own internal JavaScript implementation of a similar library, and doing all of this in JS was too expensive. Granted, we were sending around protos that had multi-megabyte payloads at times.

We eventually rewrote our app to send protos in JSON format to the front end, while letting our backends still pass around native protos; it worked a lot better.

[+] haberman|10 years ago|reply
Things have changed a lot since your experience, I think. For one, a different encoding called "JSPB" has become the de facto standard for doing Protocol Buffers in JavaScript, at least inside Google. JSPB is parseable with JSON.parse(), so it avoids the speed issues you experienced.
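A rough sketch of what such an array-based, JSON.parse-friendly encoding might look like; the layout here is illustrative, not the actual JSPB spec:

```javascript
// Hypothetical JSPB-style payload: a plain JSON array where field
// number N lives at index N - 1. Layout is illustrative, not the
// exact JSPB spec.
const wire = '[42,"alice"]';

// Because the payload is valid JSON, the browser's native parser
// does the heavy lifting:
const fields = JSON.parse(wire);

// Generated accessors would map array slots back to field names;
// here we do it by hand for a message with fields id=1, name=2.
const msg = { id: fields[0], name: fields[1] };

console.log(msg.id, msg.name); // 42 alice
```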

And looking forward, JavaScript parsing of protobuf binary format has gotten a lot faster, thanks in large part to newer JavaScript technologies like TypedArray. Ideally JSPB would be deprecated as a wire format in favor of fast JavaScript parsing of binary protobufs, but this would of course be contingent on the performance being acceptable.

Finally, JSON is becoming a first-class citizen in proto3, so protobuf vs. JSON will no longer be an either/or, it can be a both/and. https://developers.google.com/protocol-buffers/docs/proto3#j...

[+] haberman|10 years ago|reply
Making JSON first-class is an explicit design goal of proto3, the next version of Protocol Buffers currently in alpha: https://developers.google.com/protocol-buffers/docs/proto3#j...

This will allow you to switch between JSON and protobuf binary on the wire easily, while using official protobuf client libraries. So you can choose easily whether you care more about size/speed efficiency or wire readability. Best of both worlds!

I work on the protobuf team at Google and would be happy to answer any questions.

[+] teacup50|10 years ago|reply
I'm really bummed that you got rid of required fields in pb3. Now every consumer has to write additional code to verify that their required fields are actually available, and the proto spec is barely useful as an actual interpretable spec -- you have to specify requirements purely in comments.

On top of which, you've defined built-in default values for empty fields; this means that, without warning, an accidentally missing field will inject bad data into any consumer that doesn't carefully check for the existence of all required fields.

These are basically killer issues for us; we're not going to adopt an "update" that requires us to write JSON-style "hey, does this field exist?" code everywhere.

[+] caust1c|10 years ago|reply
> Message field names are mapped to lowerCamelCase

Why is a mapping to camel case necessary? I imagine it creates the potential for collisions, no?

[+] borplk|10 years ago|reply
When can we expect proto3 to be stable and ready to use for the general public?
[+] skybrian|10 years ago|reply
It's possible to encode a protobuf as JSON and we do it all the time at Google. In browsers, native JSON parsing is very fast and the data is compressed, so going to a binary format doesn't seem worthwhile. The .proto file is used basically as an IDL from which we generate code.
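A minimal .proto used purely as an IDL might look like this (message and field names hypothetical):

```proto
// Hypothetical schema: the .proto is the IDL from which server
// structs and client accessors are generated; the wire encoding
// (binary protobuf or JSON) is chosen separately.
syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;
}
```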
[+] cletus|10 years ago|reply
Personally I've found JSON encoded protobufs to be almost universally awful.

The most common method is to use an array indexed by the field number. I've seen protobufs with hundreds of fields so that's hundreds of nulls as the string "null".
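A sketch of that field-number-indexed style, with hypothetical field numbers; every unset field still occupies an array slot:

```javascript
// Hypothetical message with seven declared fields where only
// field 1 (id) and field 7 (note) are set: the field-number-indexed
// encoding still emits a slot for every unset field in between.
const encoded = [123, null, null, null, null, null, "hi"];
const wire = JSON.stringify(encoded);

// Five literal "null"s on the wire for a two-field message; scale
// that to hundreds of fields and the waste adds up.
console.log(wire); // [123,null,null,null,null,null,"hi"]
```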

The alternative is to have JSON objects with attributes named after the protobuf field names. This isn't without warts either, and seems to be less prevalent in my experience.

Another problem is JavaScript doesn't support all the data types you can get in protobufs, most notably int64s.
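A quick illustration of the int64 problem, since JavaScript numbers are IEEE-754 doubles that only represent integers exactly up to 2^53 - 1:

```javascript
// JavaScript numbers are IEEE-754 doubles: integers are only exact
// up to 2^53 - 1, but an int64 protobuf field can go far beyond that.
const big = 9007199254740993; // 2^53 + 1 as written in the source...

// ...but the literal silently rounds to the nearest representable
// double, colliding with 2^53:
console.log(big === 9007199254740992); // true
console.log(Number.isSafeInteger(big)); // false

// Common workaround: carry int64 values as decimal strings (or,
// in modern engines, as BigInt).
const exact = BigInt("9007199254740993");
console.log(exact === 9007199254740993n); // true
```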

Protobufs are relatively space efficient (eg variable width int types). JSON encoded protobufs much less so.

Perhaps the rise of browser support for raw binary data will make this less awful.
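For example, with TypedArray/DataView a fixed binary layout can be decoded without any string parsing at all (the field layout below is hypothetical):

```javascript
// Hypothetical fixed binary layout: a uint32 id followed by a
// float64 score, decoded with DataView over an ArrayBuffer.
const buf = new ArrayBuffer(12);
const view = new DataView(buf);

// A server would produce these bytes; we write them here so the
// example is self-contained.
view.setUint32(0, 42, /* littleEndian */ true);
view.setFloat64(4, 3.5, true);

// Client-side decode: no string parsing involved.
const id = view.getUint32(0, true);
const score = view.getFloat64(4, true);
console.log(id, score); // 42 3.5
```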

Many consider it a virtue to use the same code on the client and server. It explains things like this and GWT. Personally I think this is horribly misguided and a fool's errand. You want to decouple your client and server as much as possible (IMHO).

Disclaimer: I work for Google

[+] justinsb|10 years ago|reply
I like to use Protobuf in my server code, but then support JSON _or_ Protobuf as the encoding. So browsers can continue to use JSON, but the server gets strongly-typed Protobuf structures.
[+] haberman|10 years ago|reply
What you describe is exactly how proto3, the latest version of protobuf, will work!

proto3 supports both binary protobuf encoding and JSON natively, so you can switch between them as desired. https://developers.google.com/protocol-buffers/docs/proto3

proto3 is currently in alpha, but we are working to bring it closer to release (I work on the protobuf team at Google).

[+] edgarvm|10 years ago|reply
Which library do you use to support both?
[+] PaulHoule|10 years ago|reply
Yeah, if you are using a statically typed language, binary formats like Protobuf are a big win, but if you are going to have the dynamic-language overhead that comes with JS, there isn't much gain to be had from binary formats.
[+] sbarre|10 years ago|reply
The biggest takeaway for me from this experiment was "always make sure you are gzipping your output".
[+] zubspace|10 years ago|reply
One thing, where protobuf (at least protobuf-net) really shines, is serialization of data into a binary format which is incredibly fast. In .NET, all inbuilt alternatives are slower by a large margin.

https://code.google.com/p/protobuf-net/wiki/Performance

[+] dmsimpkins|10 years ago|reply
I agree. I recently converted some large files that were previously stored using XmlSerializer to use protobuf-net, and I found an 8x increase in space efficiency, and 6-7x increase in (de)serialization efficiency. It really is a fantastic library, and if your classes are already marked up for serialization, there is very minimal work required to make the switch. For files that need not be human-readable, protobuf is definitely the way to go.
[+] benjaminjackman|10 years ago|reply
It would probably be better to try something like Cap'n Proto or SBE if worried about performance. Otherwise I think sticking to GZIP'd json isn't going to lag that far behind. Protocol buffers biggest benefit IMHO is just their .proto file for cross language code generation.

I have it on a todo list to port an SBE parser to ScalaJS. ScalaJS already backs java ByteBuffers with javascript TypedArrays. That should be really fast, the same stuff that is being worked on for making asm.js fast will also make the Cap'n Proto / SBE approach fast, so I think this has the most promise of bringing really high-performance data transfer capabilities to the browser.

[+] rqebmm|10 years ago|reply
Having used both on a few projects, including a JS frontend, my advice is:

"Don't use protobufs if you don't have to".

Protobufs can be much faster and provide a strict schema, but that comes at the price of higher maintenance costs. JSON is much simpler, easier to implement, and MUCH easier to debug. If your GPB looks like it's building properly but fails to parse, it's a huge pain to try to decode/debug the binary. You'll wish you could just print the JSON string.

If you need the speed and schema, then GPBs are great. In our case, we got a huge speed boost just by avoiding string building/parsing inherent in JSON.

[+] nfmangano|10 years ago|reply
Could you elaborate on the maintenance costs? We use ProtoBuf.js for our own real-time whiteboarding webapp over web sockets, and in the long run having strict schemas has saved us a lot of time. We're a distributed team with different members working on the front and back ends, and we frequently refer to our proto files to remember how data is transferred and how it should be interpreted (explained in the comments in our proto files).

Are the maintenance costs related to debugging unparsable messages? We've almost never had an issue there, so maybe we've just been lucky?

[+] wora|10 years ago|reply
Co-author of proto3 here. Proto3 was specifically designed to make proto more friendly in a variety of environments, which includes native JSON support. New Google REST APIs are defined in proto3, which are open-sourced [1].

[1] https://github.com/google/googleapis

[+] sdenton4|10 years ago|reply
In my experience, it's not that tough to write a 'proto-to-dict' function in python, which lets you crack open the proto and look at its juicy innards...
[+] mhahn|10 years ago|reply
I'm curious if Google has a common envelope they send all service messages with. Ie. A common way of specifying pagination parameters, auth tokens etc. when sending protobuf messages between services. I've been using protobufs for my services and wrote a ServiceRequest object which has worked well. I was more just surprised about not being able to find much documentation on actual deployments as opposed to just simple tutorials.
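A hypothetical envelope along those lines might look like the sketch below; none of these names are a documented Google convention:

```proto
// Hypothetical request envelope; all names are illustrative.
syntax = "proto3";

message ServiceRequest {
  string auth_token = 1;
  int32 page_size = 2;
  string page_token = 3;
  bytes payload = 4; // serialized request-specific message
}
```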
[+] dustingetz|10 years ago|reply
Transit is similar but addresses the flaws described in this article

http://blog.cognitect.com/blog/2014/7/22/transit

[+] teh|10 years ago|reply
Not sure Transit is designed for the same space. E.g. there seems to be no schema, and the default JSON encoding isn't super readable either.

Protobufs can be encoded as JSON and as text, so there are some ways to address the readability I guess.

[+] jwr|10 years ago|reply
Transit is a really good solution.

As for Protobuf, I tried using it in a number of places, but found it to be very inflexible (schema!) and hard to debug in case of problems.

[+] Animats|10 years ago|reply
With one end in Python 2 and the other end in Javascript, using binary protobufs seems misplaced optimization. It's nice to know the support is there (well, not in Python 3, apparently), in case you need to talk to something that speaks protobufs.

I'm looking forward to seeing protobufs in Rust as a macro. It should be possible; there's an entire regular expression compiler for Rust as a compile-time macro, which is a useful optimization.

[+] rikrassen|10 years ago|reply
One of the comments on that article was "YAY! JSON is wastefully large. I'd love to replace it." Is this true? I'm confused why JSON would be seen as a wasteful format. It seems to me that with any decent compression it's hard to get much smaller. In this case I'm not talking about the other advantages Protobuf offers; I just want to know about size.
[+] shanemhansen|10 years ago|reply
There are basically 2 areas where JSON is really wasteful. Compression can help with both of those.

1. Dictionary keys are repeated when you have an array of similar objects.
2. Non-text data. JSON can't natively represent binary data, forcing people to use things like base64 for binary and base10 for numbers.
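Point 2 can be quantified: base64 turns every 3 bytes into 4 characters, a fixed ~33% overhead before compression:

```javascript
// base64 maps every 3 input bytes to 4 output characters, so a
// binary payload grows by a third before any JSON escaping.
const bytes = new Uint8Array(300);
for (let i = 0; i < bytes.length; i++) bytes[i] = i % 256;

// Node-style encoding; a browser would go through btoa instead.
const b64 = Buffer.from(bytes).toString('base64');

console.log(bytes.length, b64.length); // 300 400
```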
[+] alkonaut|10 years ago|reply
> "YAY! JSON is wastefully large. I'd love to replace it." Is this true? I'm confused why JSON would be seen as a wasteful as a format.

It transmits type and field names. Depending on how complex your data is, those strings could be a large part of the payload.

{ "person": { "age": 30, "shoesize": 10 } }

The above is what, 4-5 bytes of protobuf? I'm not sure how large the gzipped JSON is, but likely a lot more. If you were to send a list of 100 such person objects, the difference would be smaller.
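A back-of-the-envelope check of those numbers, with assumed field numbers and the tag/varint layout from the protobuf wire format:

```javascript
// Back-of-the-envelope sizes for the person example above,
// assuming field numbers: person = 1 (outer), age = 1, shoesize = 2.
const obj = { person: { age: 30, shoesize: 10 } };
const json = JSON.stringify(obj);

// Hand-rolled protobuf-style bytes: each small int costs one tag
// byte ((field << 3) | wiretype 0) plus one varint byte; the nested
// message is length-delimited (wire type 2).
const inner = [(1 << 3) | 0, 30, (2 << 3) | 0, 10];
const outer = [(1 << 3) | 2, inner.length, ...inner];
const binary = Uint8Array.from(outer);

console.log(json.length, binary.length); // 35 6
```

So the nested message itself is 4 bytes, and 6 bytes once wrapped, versus 35 bytes of uncompressed JSON text.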

[+] maratd|10 years ago|reply
> Is this true?

No, it isn't true, but regardless of what format you use, there will always be someone who's not happy. Actually, I think that applies to everything in life.

[+] gobengo|10 years ago|reply
+1 to "Did this in a real product and fully regret it"
[+] laurentoget|10 years ago|reply
Another way to do this is to specify the protocol in protobuf but have the server translate responses and requests to and from json. The java protobuf library does that for you out of the box. This is easier to implement. I would be curious to compare performance of both approaches in different contexts.
[+] krapht|10 years ago|reply
How does Protobuf compare with Corba? I'd be interested in anybody's experience if they have used both.
[+] nostrademons|10 years ago|reply
CORBA was ridiculously complex, because they tried to make remote objects look like local ones, with messages, reference counting, naming, discovery, etc. Protobuf is just a serialization mechanism. You're thinking at a lower level of abstraction - it's all just PODs that go over the wire, you build your own RPC framework on top of that (or use gRPC, which is Google's protobuf-over-HTTP2 RPC library) and think in terms of requests & responses.

IMHO trying to make everything look like an object was a mistake, and newer RPC frameworks like gRPC, Thrift, and JSON-over-HTTP are much easier to use than the late-90s frameworks like RMI, CORBA, and DCOM. Sometimes you don't want abstraction, because it abstracts away details you absolutely need to think about.

[+] flavor8|10 years ago|reply
> Reading time: ~15 minutes.

842 words including code.

Average adult reading speed: 300 words/minute.

Does not compute.

[+] Keats|10 years ago|reply
I know, I included some time for people wanting to open some links, the github project etc.

Reading only the text itself indeed takes less than 5 minutes; not sure which approach people prefer.

[+] drawkbox|10 years ago|reply
There is definitely a place for binary serialization/de-serialization and transmission. Inter-system communication is probably the best place for binary or any place that needs high speed real-time communication with the smallest size to fit in MTU limits (game protocols over UDP for instance). Any place that you control the client and server is ok to use binary.

However, I do feel there is a strange swaying back to binary (Protobuf, HTTP/2, etc.). Developers are trying to wedge it in now in places where it may cause more problems, because it is more efficient in performance but not in use or implementation. Plus, as mentioned in this thread, you can compress JSON to be very small to send over the wire, which makes the compactness a non-issue in non-real-time cases. Going binary just to go binary is more trouble than it is worth in most cases.

- With binary over keyed plain text (JSON), it is harder to generically parse objects (i.e. dictionaries/lists) to pull out just a few fields/keys.

- Binary over JSON also seems to lock down messaging more: changing explicit binary messages is more work because of offset issues, and client/server tools must stay in sync, rather than just adding a new key that can be pulled as needed.

- Third-party implementation and parsing of JSON/XML is more forgiving, making version upgrades and changes easier to do. This is especially apparent on projects that are taken over by other developers.

- The language/platform on the backend leaks into the messaging. For instance Protobuf only runs on js/python currently and has various versions. The best messaging is independent of the platform and versioning is easier.

I would bet binary formats end up causing more bugs than keyed/plain-text ones (JSON/XML, possibly compressed), though I have nothing to back that up except my own experience, largely in game development, where networking state is almost always binary. For server/data work I wouldn't use binary unless it needs to be real-time.

That being said, Protobuf is awesome, and I hope developers are using it where it is best suited, and not obfuscating messaging for performance where it doesn't really need to be; better to be simple at every level unless you need to make it more complex.

[+] omouse|10 years ago|reply
At work we're using HTTP requests, and in the last few months we added RabbitMQ to deal with the fact that our frontend has to talk to our backend. After seeing this article it feels like we chose the wrong tool for the job; protobuf/thrift appear to be typed, which would have saved us a lot of frustration, as we've already run into multiple cases where the receiver or sender messed up the type conversion or parsing.
[+] tokenizerrr|10 years ago|reply
I don't see how protobuf is mutually exclusive with RabbitMQ. RabbitMQ is a message broker and can send around byte arrays. These byte arrays can be anything, including protobuf messages.
[+] vruiz|10 years ago|reply
I guess it only makes sense if you are already using protobuf everywhere else in your stack. Especially if you are leveraging gRPC [0], which is already protobuf over HTTP/2. The network tab problem could be solved by an extension, or browsers could offer the tools built in if this were to become a trend.

[0] http://www.grpc.io/

[+] swalsh|10 years ago|reply
I always wondered why Google decided to build Protocol Buffers. ASN.1 seemed like it worked well, and it covered all the corners.
[+] VikingCoder|10 years ago|reply
Here was Kenton Varda's response:

https://groups.google.com/forum/#!topic/protobuf/eNAZlnPKVW4

My understanding of ASN.1 is that it has no affordance for forwards- and backwards-compatibility, which is critical in distributed systems where the components are constantly changing.

...

OK, I looked into this again (something I do once every few years when someone points it out).

ASN.1 _by default_ has no extensibility, but you can use tags, as I see you have done in your example. This should not be an option. Everything should be extensible by default, because people are very bad at predicting whether they will need to extend something later.

The bigger problem with ASN.1, though, is that it is way over-complicated. It has way too many primitive types. It has options that are not needed. The encoding, even though it is binary, is much larger than protocol buffers'. The definition syntax looks nothing like modern programming languages. And worst of all, it's very hard to find good ASN.1 documentation on the web.

It is also hard to draw a fair comparison without identifying a particular implementation of ASN.1 to compare against. Most implementations I've seen are rudimentary at best. They might generate some basic code, but they don't offer things like descriptors and reflection.

So yeah. Basically, Protocol Buffers is a simpler, cleaner, smaller, faster, more robust, and easier-to-understand ASN.1.

[+] jws|10 years ago|reply
ASN.1 is so ludicrously complex that it has led to a number of severe internet exploits.