sapek's comments

sapek | 10 years ago | on: Reflection in C++14

You are right, it is compile-time reflection, which I think is the most interesting kind for C++.

sapek | 10 years ago | on: Reflection in C++14

There is a really interesting proposal to add reflection to the language: N4447 [1] (N3951 before). The idea is based on the observation that C++ already sort of has reflection-like syntax in the form of parameter pack expansion [2]. I find this approach very elegant and in the spirit of modern C++. It is also very powerful. Bond serialization is implemented using reflection. Because C++ doesn't have reflection, for normal classes we have to use code generation to provide type metadata. But for std::tuple<T...> all the necessary information is already available, and Bond can serialize arbitrary instances of std::tuple<T...> today [3].

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n444...

[2] http://en.cppreference.com/w/cpp/language/parameter_pack

[3] https://microsoft.github.io/bond/manual/bond_cpp.html#tuples

sapek | 10 years ago | on: Why Go is doomed to succeed

Whether you like Go or not depends on whether you think of software development as engineering or as craft.

sapek | 11 years ago | on: Microsoft Bond

Bond has 3 ways of working with serialized payloads:

1) Simple deserialization, where an object is fully materialized. This is the simplest option (you write your business logic against idiomatic types in your target language) but, as you note, it might not be the best from a performance perspective in some scenarios.

2) Lazy deserialization at the level of nested structs via bonded<T> [1]. In many cases this lets you optimize performance when you don't need to materialize the whole object, without losing much in terms of the simplicity of the programming model.

3) Transforms [2]. This is a very generic and very powerful abstraction in Bond. All higher-level APIs like serialization and deserialization are internally implemented using this mechanism (even the Python bindings [3] are implemented entirely on top of Bond's C++ meta-programming infrastructure driving Boost Python). In C++ it is built using template meta-programming and in C# using LINQ expressions. In both cases it is a little like having a generic, strongly typed parser for richly typed data.

[1] https://microsoft.github.io/bond/manual/bond_cpp.html#unders...

[2] https://microsoft.github.io/bond/manual/bond_cpp.html#transf...

[3] https://microsoft.github.io/bond/manual/bond_py.html

sapek | 11 years ago | on: But Where Do People Work in This Office?

I think private offices are better for collaboration too. When I don't worry about distracting everyone around me I feel much freer to engage in a long discussion with one or two other developers in my office.

sapek | 11 years ago | on: Bond – An extensible framework for working with schematized data

I think your points are valid, especially about wire format compatibility. Bond and Thrift are in fact close enough that providing Thrift compatibility is a real possibility. It wasn't high enough on our priority list to make the cut, but we do in fact use some Thrift-based systems internally. The impedance mismatch between the type systems is greater for Bond and Avro/Protobuf, so we will have to see about those.

I hear you on the fragmentation. I know this doesn't help the community as an explanation, but big companies like Facebook, Google and Microsoft really have good reasons to control such fundamental pieces of their infrastructure as serialization. Case in point: Facebook has forked their own Thrift project because, I presume, having it as an Apache project was too restrictive.

FWIW, we plan to develop Bond in public and accept contributions from the community.

sapek | 11 years ago | on: Bond – An extensible framework for working with schematized data

1) I didn't do a good job explaining this. You are right that if you want to materialize an object during deserialization you need to know a schema at build time to generate your class. But the crucial thing is, and this is true of all similar frameworks, that you don't know the schema of the payload at that point. One big reason you use something like Protobuf, Thrift or Bond is to get forward/backward compatibility. What this means in essence is that deserialization is always a mapping between the schema you built your code with and the schema of the payload. There are two common ways to do that mapping: (a) the payload has schema information interleaved with the data and you branch at runtime based on what you find in the payload (this is what Protobuf, Thrift and the Bond tagged protocols do); (b) you get the schema of the payload at runtime and use that information to perform the mapping (this is what Avro and the Bond untagged protocol do). The latter case is particularly suitable for storage scenarios: you read the schema from the file/stream header and then process many records that have that schema. This is the case where the ability to emit code at runtime results in a huge performance win: you JIT a schema-specific deserializer once and amortize the cost over many records.

2) You can do both. You can also do type-safe transformations, aggregations, etc. on serialized data without materializing any object.

sapek | 11 years ago | on: Bond – An extensible framework for working with schematized data

Cross-language support in such frameworks takes a lot of effort. Whenever you add a new language you really hit the classic 80-20 problem.

We have support for a few more languages that we are using internally but after having a hard look at the implementations we decided that they weren't up to par for the open source release yet. I hope that we will release more soon. And needless to say, we are open to contributions from the community.

sapek | 11 years ago | on: Bond – An extensible framework for working with schematized data

There are two advantages to generating code at runtime:

1) In some scenarios you have information at runtime that allows you to generate much faster code. The canonical example is untagged protocols, where the serialized payload doesn't contain any schema information and you get the schema at runtime. Bond supports untagged protocols (like Avro) in addition to tagged ones (like Protobuf and Thrift), and the C# version generates an insanely fast deserializer for untagged protocols.

2) It allows programmatic customization. If the work is done via codegen'ed source code then the only way for the user to do something custom is to change the code generator to emit modified code. Even if the code generator provides the ability to do that, such customizations are very hard to maintain. In Bond, serialization and deserialization are composed from more generic abstractions: parsers and transforms. These lower-level APIs are exposed to the user. As an example, imagine that you need to scrub PII from incoming messages. This is a bit like deserialization, because you need to parse the payload, and a bit like serialization, because you need to write the scrubbed data. In Bond you can implement such an operation from those underlying abstractions, and because you can emit the code at runtime you don't sacrifice performance.

BTW, Bond lets you do something similar in C++. The underlying meta-programming mechanism is different (compile-time template meta-programming instead of runtime JIT), but the principle that serialization and deserialization are not special but are composed from more generic abstractions is the same.
