
Greplin (YC W10) open sources 10-15x faster protocol buffers for Python

197 points | rwalker | 15 years ago | github.com

33 comments

[+] haberman|15 years ago|reply
For a long time (much longer than I expected it would take) I've been working on a protobuf implementation in C that does not use Google's C++ implementation at all. I've been through about three rewrites and I finally have the interface right. I'm hoping it will be usable with Python soon (weeks).

https://github.com/haberman/upb/wiki

(if anyone's looking at the code, I'm working on the src-refactoring branch at the moment)

The benefits of my approach are:

* you can avoid depending on a 1MB C++ library. upb is more like 30k compiled.

* you can avoid doing any code generation. instead you just load the .proto schema at runtime, so you don't have to get a C++ compiler involved.

* Google's protobuf library does have a dynamic/reflection option that avoids my previous point, but it is ~10x slower than generating C++ code. My library, last time I benchmarked it, was 70-90% of the speed of generated C++.
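For context on where that time goes: the hot loop of any protobuf decoder is mostly wire-format work like base-128 varint decoding, which a C implementation compiles down to tight shifts and masks. A minimal pure-Python sketch of that step (illustrative only, not upb's actual code):

```python
def decode_varint(buf: bytes, pos: int = 0):
    """Decode one base-128 varint from buf starting at pos.

    Returns (value, new_pos). Each byte contributes its low 7 bits;
    a set high bit means another byte follows.
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of this varint
            return result, pos
        shift += 7

# 300 encodes as 0xAC 0x02 on the wire
value, end = decode_varint(b"\xac\x02")
assert value == 300 and end == 2
```

This per-byte loop runs once per field of every message, which is why moving it out of interpreted Python gives such large speedups.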

[+] sigil|15 years ago|reply
Here's a fast Python C extension for protobuf that's already usable:

https://github.com/acg/lwpb

I read through your upb code about 3-4 months ago, was initially impressed, but couldn't get the Python extension to work. Certain abstractions really lost me, like pushing and pulling between sources and sinks. Why not just let a top-level event loop run the show in terms of buffered reads and size calculation for writes? But maybe you've refactored since.

[+] jsarch|15 years ago|reply
Can you clarify what you mean by "70-90% of the speed of generated C++"?

Suppose that the generated C++ takes 1.0 seconds. Does your implementation take 0.7-0.9s or 1.7-1.9s or something else?
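On the usual reading, "X% of the speed" refers to throughput, so runtime scales as the reciprocal. A quick check of what that reading implies (the 1.0 s baseline is hypothetical, per the question above):

```python
cpp_time = 1.0  # seconds for generated C++ (hypothetical baseline)

# "70-90% of the speed" = 0.7-0.9x the throughput, so time = cpp_time / ratio
slow_end = cpp_time / 0.7   # ~1.43 s
fast_end = cpp_time / 0.9   # ~1.11 s
```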

[+] apotheon|15 years ago|reply
Looks interesting. I might need to dig in.

I like the license, too.

[+] sigil|15 years ago|reply
I too have a speedy Protocol Buffer implementation in Python:

https://github.com/acg/lwpb

It clocks in at 11x faster than JSON, the same speedup reported by fast-pb. But with lwpb:

* There's no codegen step -- which is a disgusting thing in a dynamic language, if you ask me.

* You're not forced into object-oriented programming; with lwpb you can decode and encode dicts.

Most of haberman's remarks apply to lwpb as well, i.e. it's fast, small, and doesn't pull in huge dependencies. The lwpb C code was originally written by Simon Kallweit and is similar in intent to upb.
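To illustrate the dict-based style (a toy sketch of the idea only, not lwpb's real API): a schema maps field numbers to names and types, loaded at runtime, and encoding works on plain dicts with no generated classes.

```python
# Toy protobuf-style encoder over dicts. Wire types: 0 = varint,
# 2 = length-delimited. This mimics the flavor of a dict-based API;
# it is NOT lwpb's actual interface.

def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

# field number -> (name, type), as a runtime-loaded schema would provide
SCHEMA = {1: ("id", "varint"), 2: ("name", "string")}

def encode(msg: dict) -> bytes:
    out = bytearray()
    for num, (name, typ) in SCHEMA.items():
        if name not in msg:
            continue
        if typ == "varint":
            out += encode_varint((num << 3) | 0)  # tag: field num + wire type
            out += encode_varint(msg[name])
        else:
            data = msg[name].encode()
            out += encode_varint((num << 3) | 2)
            out += encode_varint(len(data)) + data
    return bytes(out)

# Field 1 = 150 produces the classic 08 96 01 wire bytes
assert encode({"id": 150, "name": "hi"}) == b"\x08\x96\x01\x12\x02hi"
```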

[+] ssnot|15 years ago|reply
fast and small footprint, as components should be
[+] atamyrat|15 years ago|reply
We (http://connex.io/) use Protocol Buffers quite heavily, and the Python implementation was the performance bottleneck in many places.

I was working on the same thing, CyPB, which is 17 times faster than Google's Python implementation. https://github.com/connexio/cypb

This one seems more complete at the moment, though. I might just mark the ticket in our tracker as closed and switch to fastpb :-/

[+] nostrademons|15 years ago|reply
Nifty. I've passed it along to the appropriate folks.

Google uses SWIG-wrapped C++ proto bindings in Python pretty extensively, so I'm not sure how much this gains over that approach. I checked out the source; it's basically using Jinja templates to autogenerate Python/C API calls. Basically like SWIG, but without using SWIG.

[+] slewis|15 years ago|reply
When I was at Google I worked with very large structured protocol buffers in Python at one point. A single piece of data could be hundreds of MB in total, consisting of millions of smaller protocol buffers. I was doing a pass over the whole structure so needed to access each smaller PB from Python.

One day I decided my program was too slow, so I profiled it and saw that the hot spots were in the Python protocol buffer implementation. "Easy," I thought, "I'll use SWIGed C++ PBs instead." Made some changes and ran the program again. Almost the exact same run time as before! I profiled again and found that this time the hot spots were in the SWIG layer. I was making so many calls through SWIG to C++ (because I was walking millions of objects) that using SWIGed PBs vs. native Python PBs made no difference to my run time. Maybe I could have done some more custom SWIG work to lower the call overhead, but I remember being convinced at the time that SWIG wasn't going to do the trick.

So I ended up writing a 30-line Python extension that processed the protocol buffers in C++ and put the data into Python data structures. Run time was reduced by a factor of 10, hooray!
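The lesson generalizes: when every field read crosses an expensive boundary (SWIG, FFI, any wrapper), per-object access cost dominates, and one bulk conversion into native structures wins. A pure-Python caricature of the two access shapes (the wrapper class here is a stand-in, not real SWIG code):

```python
# Stand-in for a wrapped object where each field read crosses a costly
# boundary (here just a method call; with SWIG the crossing is far dearer).
class WrappedPB:
    def __init__(self, payload):
        self._payload = payload

    def get(self, key):  # one "boundary crossing" per access
        return self._payload[key]

msgs = [WrappedPB({"id": i, "score": i * 2}) for i in range(1000)]

# Slow shape: cross the boundary once per field, per message.
per_field = [(m.get("id"), m.get("score")) for m in msgs]

# Fast shape: one bulk conversion to native structures up front (the role
# of the 30-line extension in the story above), then pure Python after.
native = [m._payload for m in msgs]
bulk = [(d["id"], d["score"]) for d in native]

assert per_field == bulk  # same data, far fewer crossings
```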

[+] apotheon|15 years ago|reply
It doesn't appear to actually be open source:

> # Copyright 2010 Greplin, Inc. All Rights Reserved.

Where's the license?

I think the term you want is "publishes", and not "open sources".

[+] rwalker|15 years ago|reply
Good catch - updating now. It'll be under Apache 2.0. (edit: done)
[+] cookiecaper|15 years ago|reply
As an aside, I don't really like the idea that "open source" should also have to be synonymous with "free software". Can the intended users access (and possibly modify at least locally) the source? Then it's open source. Why do we need "open source" to be identical to "free software"? Isn't that what "free software" means?
[+] dirtae|15 years ago|reply
This is very welcome, but I hope Google fixes this problem in the official protobuf distribution.

It looks like protobuf 2.4.0 has experimental support for backing Python protocol buffers with C++ via the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION environment variable:

http://protobuf.googlecode.com/svn/trunk/CHANGES.txt
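Per that changelog, selecting the C++-backed implementation is an environment switch, and it has to be set before google.protobuf is first imported. A minimal sketch, assuming protobuf >= 2.4.0 built with the C++ extension:

```python
import os

# Must be set before the first `import google.protobuf.*` anywhere
# in the process, or the pure-Python implementation is used.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "cpp"

# from myproto_pb2 import MyMessage  # now backed by the C++ implementation
```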

[+] traviscline|15 years ago|reply
Is this really a better approach than using Cython to wrap a c++ or c implementation?
[+] sigil|15 years ago|reply
You should add cPickle to the benchmark as well -- I bet fast-pb still comes out ahead, and that may be an eye opener for many Python devs.
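A sketch of what adding that comparison looks like (cPickle is simply `pickle` on Python 3; timings vary by machine, so only the roundtrips are checked):

```python
import json
import pickle
import timeit

msg = {"id": 12345, "name": "alice", "tags": ["a", "b", "c"]}

blob = pickle.dumps(msg, protocol=pickle.HIGHEST_PROTOCOL)
text = json.dumps(msg)
assert pickle.loads(blob) == msg and json.loads(text) == msg

# Compare decode speed; print rather than assert, since absolute
# numbers depend entirely on the machine.
t_pickle = timeit.timeit(lambda: pickle.loads(blob), number=20000)
t_json = timeit.timeit(lambda: json.loads(text), number=20000)
print(f"pickle: {t_pickle:.3f}s  json: {t_json:.3f}s")
```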
[+] andrewvc|15 years ago|reply
MessagePack is up to 4x faster* than protobuf and, IMHO, easier to work with.

http://msgpack.org/

I used it as the native format for DripDrop (https://github.com/andrewvc/dripdrop)

* In Some tests

[+] haberman|15 years ago|reply
> * In Some tests

In that test protobuf is forced to copy the 512-byte string 200,000 times, while it appears that MessagePack is referencing it.

Granted it's a bummer that protobuf can't do this easily (my protobuf library upb can -- see above post), but I think it's dishonest not to mention that a large portion of the difference (if not all of it) is just memcpy() that protobuf is doing but MessagePack is not.
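Python itself exposes this copy-vs-reference distinction directly: a bytes slice allocates and memcpys, while a memoryview slice just references the original buffer. That difference is exactly the kind of work being measured in the benchmark:

```python
data = b"x" * 1024                 # stand-in for a wire buffer

copied = data[100:612]             # new object: 512 bytes memcpy'd
view = memoryview(data)[100:612]   # zero-copy reference into `data`

assert len(copied) == len(view) == 512
assert bytes(view) == copied       # same contents...
assert view.obj is data            # ...but still backed by the original
```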

It reminds me of when I worked at Amazon and we had a developer conference with several speakers. One speaker was plugging Erlang and showed a graph comparing C++ processes with Erlang processes, and the graph showed C++ being much slower or bigger. Scott Meyers was in the audience and raised his hand to ask "what are the Erlang processes not doing, to explain the difference?" The guy couldn't answer that question directly.

After a bit of digging, you realize that an Erlang "process" is a lightweight, interpreter-level abstraction that is implemented inside a regular OS process. So naturally it doesn't have any of the overhead that is associated with an OS process, and you don't have to make a system call to perform IPC.

So when you're posting benchmark comparisons, I think it's only right to mention any inherent differences in how much work you're doing.

[+] apotheon|15 years ago|reply
Do you mean four times faster or do you mean four times as fast? These terms are not synonymous. Four times faster means the same thing as five times as fast.

i.e.: If it is four times as fast, you multiply the speed by four, and that's the new speed. If it is four times faster, you multiply the speed by four and add it to the original speed, because that's how much faster it is than the original.

I fucking hate that television commercials have conflated the two for the public at large. If you have followed in their footsteps, I hope this public service announcement has helped you sort that out for the future.
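The distinction in numbers, taking a 1.0 s baseline:

```python
base = 1.0                           # baseline runtime in seconds

four_times_as_fast = base / 4        # speed x4        -> 0.25 s
four_times_faster = base / (4 + 1)   # speed x5 (x4 more) -> 0.20 s
```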

[+] sigil|15 years ago|reply
Has anyone managed to run the fast-pb tests in benchmark.py? I'm not sure where this switch is coming from:

  protoc --fastpython_out
[+] rwalker|15 years ago|reply
Have you installed both protocol buffers and the fast-python-pb module? Feel free to email me: robbyw@(the-company-mentioned-in-the-title).com