Bond – An extensible framework for working with schematized data

[+] gregwebs|11 years ago|reply

The Bond compiler is written in Haskell: http://blog.nullspace.io/bond-oss.html

It is about time considering that Microsoft research has been one of the main funders of work on the Haskell compiler.

[+] unknown|11 years ago|reply

[deleted]

[+] oscargrouch|11 years ago|reply

I've saw this yesterday, pretty happy about it, but then.. see that the compiler was coded in haskell..

This make it pretty "unportable" because its the same dependency with the Java VM, so how can i distribute code with this library, with a dependency like that, asking people to download the whole GHC ?!

Unfortunately for libraries that should be embedded in third-party code, the reality beyond C/C++ is pretty harsh.. for full applications the reality is different.. but for embedded libraries.. despite the fact that i've liked the solution for something im doing, i had to pass because of this small detail.. and im too busy to write a parser in C++ to make this more portable in source code form.. so i had to get back to protobuf :/

[+] leetrout|11 years ago|reply

Slightly OT- I'm working with data sets that might change, but not often if at all, which are provided by Elasticsearch. I'm processing the raw data in Flask (API), munging, joining, and dropping what I don't want going out to the world.

I've been toying with the idea of using something like PB, Cap'n Proto, or now Bond to define and track schema changes and centralize marshaling / serializing logic. I'm not concerned about having RPC. Does this sound like crazy talk? Anyone else happen to track schemas agains schemaless data stores?

(I also like the idea of not having to ship JSON everywhere if I don't want to.)

[+] seanp2k2|11 years ago|reply

TL;DR by using more of the available features in ElasticSearch, you can probably replace all of your external app with ElasticSearch.

A few things:

- ElasticSearch is definitely not schema-less, but it can try to generate a schema (aka "mapping") for you if you don't give it one: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/c...

- ElasticSearch has tons of ways to customize the data you get back, so, unless you really don't want the ES cluster crunching things for you, you can do a lot of the transformation server-side. You can go so far as to have your own type + mapping for e.g. a report, which sources data from another type and transforms it: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

- This covers both why the schema can't, by nature, be dynamic (so the argument of "schema-less / dynamic schema" is BS in practice IMO), as well as how to get data out from one index an into another (e.g. your "report" index which does scripted transformation).

- Another idea would be to use the scripting module to write a custom "view": http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

- You can use Groovy, mvel, JS, or Python for scripts. If you combine this with how ES lets you do "site plugins", you could make a JS + CSS + HTML site which is actually served by the ES cluster, which interacts with it and generates reports or whatever all without additional infrastructure. Example: https://github.com/karmi/elasticsearch-paramedic

[+] rch|11 years ago|reply

I'm doing something similar at work. I'd be happy to chat about it if you drop me a line sometime.

[+] ziedaniel1|11 years ago|reply

It's cool that the .NET version actually JITs specialized serialization and deserialization code at runtime. This is one place where managed languages really shine, because emitting bytecode is easier and more portable than emitting, say, raw x86. It's also safer -- the runtime can verify the memory safety and type safety of the code.

[+] Someone|11 years ago|reply

Is it? To start that JIT process, you need to have a class in your code that the compiler for the .NET version generated. Disk space is cheap nowadays, even on mobile, so I do not see a big disadvantage of generating the deserialization code at the same time the source code for the class gets generated (and if you things that way, you lose get one-time delay, and you don't need the code that generates those serializers in your application)

What a I overlooking? What information is know at runtime that isn't already available at build time? (And no, "the exact CPU/memory/etc. the code runs on is not a valid answer. This is C# code, so there always is a runtime that handles that stuff)

[+] sapek|11 years ago|reply

There's been a lot of questions on how Bond compares to Protobuf, Thrift and Avro. I tried to put some information at this page: http://microsoft.github.io/bond/why_bond.html

[+] nly|11 years ago|reply

No RPC? Disappointing. There are so few choices C and C++ programmers with regard to battle-tested, easy (read: code generation for decode and dispatch), language-agnostic RPC.

[+] sapek|11 years ago|reply

We are planning to release cross-platform RPC support but it just wasn't ready yet and we didn't want hold up the core release for it.

[+] bradleyankrom|11 years ago|reply

Have you tried any of the MessagePack RPC implementations? I haven't but I'm curious.

[+] a_c|11 years ago|reply

How would this compared with apache thrift?

[+] sapek|11 years ago|reply

See https://news.ycombinator.com/item?id=8868045

[+] sdave|11 years ago|reply

how does it compare to protobuf,thrift ?

[+] joncfoo|11 years ago|reply

Quoting apc @ https://lobste.rs/s/7w6p95/msft_open_sources_production_seri...

The current offerings (Thrift, ProtoBuffs, Avro, etc.) tend to have similar opinions about things like schema versioning, and very different opinions about things like wire format, protocol, performance tradeoffs, etc. Bond is essentially a serialization framework that keeps the schema logic stuff the same, but making the tasks like wire format, protocol, etc., highly customizable and pluggable. The idea being that instead of deciding ProtoBuffs isn’t right for you, and tearing it down and starting Thrift from scratch, you just change the parts that you don’t like, but keep the underlying schema logic the same.

In theory, this means one team can hand another team a Bond schema, and if they don’t like how it’s serialized, fine, just change the protocol, but the schema doesn’t need to.

The way this works, roughly, is as follows. For most serialization systems, the workflow is: (1) you declare a schema, and (2) they generate a bunch of files with source code to de/serialize data, which you can add to a project and compile into programs that need to call functions that serialize and deserialize data.

In Bond, you (1) declare a schema, and then (2) instead of generating source files, Bond will generate a de/serializer using the metaprogramming facilities of your chosen language. So customizing your serializer is a matter of using the Bond metaprogramming APIs change the de/serializer you’re generating.

[+] bradleyankrom|11 years ago|reply

One key differentiator is the limited set of languages Bond currently supports:

"By design Bond is language and platform independent and is currently supported for C++, C#, and Python on Linux, OS X and Windows."

Versus Thrift:

"language bindings - Thrift is supported in many languages and environments C++ C# Cocoa D Delphi Erlang Haskell Java OCaml Perl PHP Python Ruby Smalltalk"

[+] _asummers|11 years ago|reply

Or something like CBOR or JSONB?

[+] drivingmenuts|11 years ago|reply

And yet they still can't build a web page that isn't a shitshow.

Main content has horizontal scroll on portrait monitors, which underlaps the transparent fixed div they used for navigation.

41 comments