[+] [-] haberman|15 years ago|reply
For a long time (much longer than I expected it would take) I've been working on a protobuf implementation in C that does not use Google's C++ implementation at all. I've been through about three rewrites and I finally have the interface right. I'm hoping it will be usable with Python soon (weeks).
https://github.com/haberman/upb/wiki
(if anyone's looking at the code, I'm working on the src-refactoring branch at the moment)
The benefits of my approach are:
* you can avoid depending on a 1MB C++ library. upb is more like 30k compiled.
* you can avoid doing any code generation. instead you just load the .proto schema at runtime, so you don't have to get a C++ compiler involved.
* Google's protobuf library does have a dynamic/reflection option that avoids my previous point, but it is ~10x slower than generating C++ code. My library, last time I benchmarked it, was 70-90% of the speed of generated C++.
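To make the no-codegen point concrete, here's a pure-Python toy decoder (this is not upb's actual API, just a sketch of the idea): a field-number table loaded at runtime, e.g. parsed from a .proto file, is enough to decode the wire format, with no generated classes anywhere.

```python
# Sketch: schema-driven protobuf decoding with no generated code.
# NOT upb's API -- a minimal pure-Python illustration of the approach.

def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def decode(buf, schema):
    """Decode a message into a dict using a runtime schema table."""
    msg, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_num, wire_type = tag >> 3, tag & 7
        if wire_type == 0:                    # wire type 0: varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                  # wire type 2: length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        name, ftype = schema[field_num]
        msg[name] = value.decode() if ftype == "string" else value
    return msg

# Schema loaded at runtime (hypothetical message with two fields):
schema = {1: ("id", "int32"), 2: ("name", "string")}
# Wire bytes for: id = 150, name = "hi"
data = bytes([0x08, 0x96, 0x01, 0x12, 0x02]) + b"hi"
print(decode(data, schema))   # -> {'id': 150, 'name': 'hi'}
```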
[+] [-] sigil|15 years ago|reply
https://github.com/acg/lwpb
I read through your upb code about 3-4 months ago, was initially impressed, but couldn't get the Python extension to work. Certain abstractions really lost me, like pushing and pulling between sources and sinks. Why not just let a top-level event loop run the show in terms of buffered reads and size calculation for writes? But maybe you've refactored since.
[+] [-] jsarch|15 years ago|reply
Suppose that the generated C++ takes 1.0 seconds. Does your implementation take 0.7-0.9s or 1.7-1.9s or something else?
[+] [-] apotheon|15 years ago|reply
I like the license, too.
[+] [-] sigil|15 years ago|reply
https://github.com/acg/lwpb
It clocks in at 11x faster than JSON, the same speedup reported by fast-pb. Only with lwpb:
* There's no codegen step -- which is a disgusting thing in a dynamic language, if you ask me.
* You're not forced into object-oriented programming; with lwpb you can decode and encode dicts.
Most of haberman's remarks apply to lwpb as well, i.e. it's fast, small, and doesn't pull in huge dependencies. The lwpb C code was originally written by Simon Kallweit and is similar in intent to upb.
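The dict-based style is easy to picture. Here's a toy encoder in pure Python (the field table and function names are hypothetical, not lwpb's actual API) that turns a plain dict into protobuf wire bytes:

```python
# Sketch of dict-in, bytes-out encoding in the spirit of lwpb's non-OO style.
# The field table here is hypothetical, not lwpb's actual API.

def write_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

def encode(msg, fields):
    """fields maps name -> (field_number, type); msg is a plain dict."""
    out = bytearray()
    for name, value in msg.items():
        num, ftype = fields[name]
        if ftype == "int32":
            out += write_varint(num << 3 | 0)   # wire type 0: varint
            out += write_varint(value)
        elif ftype == "string":
            data = value.encode()
            out += write_varint(num << 3 | 2)   # wire type 2: length-delimited
            out += write_varint(len(data)) + data
    return bytes(out)

fields = {"id": (1, "int32"), "name": (2, "string")}
encoded = encode({"id": 150, "name": "hi"}, fields)
print(encoded.hex())   # -> 08960112026869
```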
[+] [-] ssnot|15 years ago|reply
Nifty. I've passed it along to the appropriate folks.
[+] [-] atamyrat|15 years ago|reply
I was working on the same thing, CyPB, which is 17 times faster than Google's Python implementation. https://github.com/connexio/cypb
This one seems more complete at the moment, though. I might just mark the ticket in our tracker as closed and switch to fastpb :-/
[+] [-] nostrademons|15 years ago|reply
Google uses SWIG-wrapped C++ proto bindings in Python pretty extensively, so I'm not sure how much this gets over that approach. I checked out the source; it's basically using Jinja templates to autogen Python/C API calls. Basically like SWIG, but not using SWIG.
[+] [-] slewis|15 years ago|reply
When I was at Google I worked with very large structured protocol buffers in Python at one point. A single piece of data could be hundreds of MB in total, consisting of millions of smaller protocol buffers. I was doing a pass over the whole structure so needed to access each smaller PB from Python.
One day I decided my program was too slow, so I profiled it and saw that the hot spots were in the Python protocol buffer implementation. "Easy", I thought, "I'll use SWIGged C++ PBs instead." Made some changes and ran the program again. Almost the exact same run time as before! I profiled again and found that this time the hot spots were in the SWIG layer. I was making so many calls through SWIG to C++ (because I was walking millions of objects) that using SWIGged PBs vs. native Python PBs made no difference to my run time. Maybe I could have done some more custom SWIG work to lower the call overhead, but I remember being convinced at the time that SWIG wasn't going to do the trick.
So I ended up writing a 30-line Python extension that processed the protocol buffers in C++ and put the data into Python data structures. Run time was reduced by a factor of 10, hooray!
[+] [-] apotheon|15 years ago|reply
> # Copyright 2010 Greplin, Inc. All Rights Reserved.
Where's the license?
I think the term you want is "publishes", and not "open sources".
As an aside, I don't really like the idea that "open source" should also have to be synonymous with "free software". Can the intended users access (and possibly modify at least locally) the source? Then it's open source. Why do we need "open source" to be identical to "free software"? Isn't that what "free software" means?
[+] [-] rwalker|15 years ago|reply
[+] [-] cookiecaper|15 years ago|reply
This is very welcome, but I hope Google fixes this problem in the official protobuf distribution.
[+] [-] peterlai|15 years ago|reply
As of right now, deserialization of JSON and XML is way faster in Python: http://stackoverflow.com/questions/499593/whats-the-best-ser...
[+] [-] dirtae|15 years ago|reply
It looks like protobuf 2.4.0 has experimental support for backing Python protocol buffers with C++ via the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION environment variable:
http://protobuf.googlecode.com/svn/trunk/CHANGES.txt
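As I understand it, the variable has to be set before protobuf is first imported, since the implementation is chosen at import time. A sketch (the generated-module name below is a placeholder):

```python
# Opt in to the experimental C++-backed Python protocol buffers.
# This must happen before the first protobuf import, because the
# implementation is selected when the library is imported.
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "cpp"

# Any subsequent import of a generated module, e.g.
#     from my_schema_pb2 import MyMessage   # placeholder module name
# will now use the C++-backed implementation.
print(os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"])   # -> cpp
```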
[+] [-] traviscline|15 years ago|reply
[+] [-] sigil|15 years ago|reply
[+] [-] andrewvc|15 years ago|reply
http://msgpack.org/
I used it as the native format for DripDrop (https://github.com/andrewvc/dripdrop)
* In some tests
[+] [-] haberman|15 years ago|reply
In that test protobuf is forced to copy the 512-byte string 200,000 times, while it appears that MessagePack is referencing it.
Granted it's a bummer that protobuf can't do this easily (my protobuf library upb can -- see above post), but I think it's dishonest not to mention that a large portion of the difference (if not all of it) is just memcpy() that protobuf is doing but MessagePack is not.
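A scaled-down stand-in for the effect (not the original benchmark) is easy to see in pure Python: slicing a bytes object copies each payload, while slicing a memoryview only creates a reference into the buffer.

```python
# Copy vs. reference: bytes slices memcpy their payload, memoryview
# slices do not. Scaled down from the benchmark's 200,000 strings.
import time

N, SIZE = 20_000, 512
buf = b"x" * (N * SIZE)            # one big contiguous buffer

start = time.perf_counter()
copies = [buf[i:i + SIZE] for i in range(0, len(buf), SIZE)]   # memcpy per slice
copy_time = time.perf_counter() - start

view = memoryview(buf)
start = time.perf_counter()
refs = [view[i:i + SIZE] for i in range(0, len(buf), SIZE)]    # zero-copy views
ref_time = time.perf_counter() - start

print(f"copying: {copy_time:.4f}s  referencing: {ref_time:.4f}s")
```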
It reminds me of when I worked at Amazon and we had a developer conference with several speakers. One speaker was plugging Erlang and showed a graph comparing C++ processes with Erlang processes, and the graph showed C++ being much slower or bigger. Scott Meyers was in the audience and raised his hand to ask "what are the Erlang processes not doing, to explain the difference?" The guy couldn't answer that question directly.
After a bit of digging, you realize that an Erlang "process" is a lightweight, interpreter-level abstraction that is implemented inside a regular OS process. So naturally it doesn't have any of the overhead that is associated with an OS process, and you don't have to make a system call to perform IPC.
So when you're posting benchmark comparisons, I think it's only right to mention any inherent differences in how much work you're doing.
[+] [-] apotheon|15 years ago|reply
Do you mean four times faster or do you mean four times as fast? These terms are not synonymous. Four times faster means the same thing as five times as fast.
i.e.: If it is four times as fast, you multiply the speed by four, and that's the new speed. If it is four times faster, you multiply the speed by four and add it to the original speed, because that's how much faster it is than the original.
I fucking hate that television commercials have conflated the two for the public at large. If you have followed in their footsteps, I hope this public service announcement has helped you sort that out for the future.
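In numbers, following the convention laid out above, with a baseline of 100 ops/sec:

```python
# "Four times as fast" vs. "four times faster", per the convention above.
base = 100.0                             # baseline: 100 ops/sec

four_times_as_fast = 4 * base            # multiply: 400 ops/sec
four_times_faster = base + 4 * base      # the *increase* is 4x: 500 ops/sec

print(four_times_as_fast, four_times_faster)   # -> 400.0 500.0
```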
[+] [-] sigil|15 years ago|reply
[+] [-] rwalker|15 years ago|reply