Hobbes – A language and an embedded JIT compiler

[+] zoom6628|8 years ago|reply

This should be of great interest to anybody working with high volume sensor data in IOT field. I will be checking it out very soon (on a few deadlines now so cant afford the diversion). Anything that is fast and simple for massive volumes of small record types should always be of great interest to IOT/WOT folks.

I agree with the comments about KDB/Q - tried to look at it but could never understand it. Maybe cos im not fulltime dev, nor in that field, but KDB just always becomes too hard. AT least hobbes looks like i can work it into C++. Looks at first 'parsing' to be a case of right tool for the job of working on high volume discrete data which one would expect from an investment bank.

Kudos to the bank for releasing this part of their secret-sauce.

[+] kthielen|8 years ago|reply

Thanks, I hope to hear about your experience if/when you look into this (and I'm happy to help in any way I can). :)

[+] kthielen|8 years ago|reply

I started the hobbes project at MS and am happy to answer any questions that folks have about it!

[+] willtim|8 years ago|reply

Are you able to comment on your implementation of row types?

You give the following type for a field selector:

> :t .foo

(a.foo::b) => (a) -> b

These give constraints with arbitrary types embedded in them. Most examples in the literature using qualified types, suggest something like the following type:

(r lacks foo) => { foo :: b | r } -> b

Where r is of kind row. The lacks constraint here is a very simple constraint and it is clear that this function takes a record, the disadvantage is that predicate lists can get very large. The old Haskell TREX extension built into Hugs worked this way.

It would be great to learn some background behind your choice. Thanks.

[+] ah-|8 years ago|reply

What motivated you to start building your own language? Did you use much kdb/q before?

How did you manage to get this open sourced?

[+] Volt|8 years ago|reply

There was a paper a while back from someone working with MS on a Q typechecker in Prolog. Was this you?

[+] eagle2001|8 years ago|reply

The LtU blurb says "a variant of Haskell", but the github readme does not mention non-strict. Strict or not?

[+] obl|8 years ago|reply

any plan to move beyond "allocate linearly and never free anything" memory management ?

semantics look nice and coherent but unfortunately without some kind (automatic or manual) of memory management it's not usable as a general purpose tool.

[+] nurettin|8 years ago|reply

All embedable languages should come with their own header parser/code generator to save us from the hassle of generating a bunch of class wrappers and boilerplate registration code. I think this high up in the list of concerns in a professional setting.

[+] willtim|8 years ago|reply

This looks like a typed take on KDB/Q, something that is long overdue! The key to making this work is structural typing of rows (row polymorphism / extensible records). I'd be interested to see more details on how they tackled this.

[+] amiramir|8 years ago|reply

Morgan Stanley has a history of languages in the vein. MS was a big APL shop in the 80s. Arthur Whitney[1] worked at MS and developed the languages A, and A+. He later wrote J, and K.

[1] https://en.wikipedia.org/wiki/Arthur_Whitney_(computer_scien...

[+] kthielen|8 years ago|reply

Yes, structural typing of records (and variants and recursives) has been very valuable to us.

There are a couple of type constraints (ala type class constraints) that make most of this work.

There's a constraint that looks like 'r={lbl:h * t}' that decides if/how a type "r" can be broken into a head field of type "h" (with a field named "lbl", this type variable gets the field name and can be used to e.g. relate one record type to another, like for deciding structural subtyping conversion), and a successor record type "t" (this bottoms out in the unit type).

There's also a constraint "r.f::x" (or we write "r/f::x" if "f" is a type variable) which decides if/how the type "r" can be indexed by the field name "f" to determine a type "x".

(Field access can also be overloaded, and we have some pretty awesome use-cases for that, hopefully I can put up some examples of that shortly.)

So e.g. we can use the destructuring constraint to make a generic print function for all record types, like:

  instance (r={lbl:h*t}, Print h, Print t) => Print r where
    print x = do{putStr(lbl++":");println(recordHead(x));print(recordTail(x));}

The field access constraint makes basic function definitions trivially "duck typed". So e.g. if I write ".jimmy" (shorthand for "\x.x.jimmy" or if you prefer Haskell syntax this would be like "\x -> x.jimmy") then it will select the "jimmy" field of whatever type it is given, however that is done for that particular type.

There's a lot more to say on this point but maybe it should be turned into its own doc on the site...

[+] zokier|8 years ago|reply

I'd love to see more about this, seems very interesting language. Something to explain how/where this is/could be used, maybe few more complete examples to show the language etc. Also some notes about performance would be great, I guess it is reasonably fast by the looks of it, but that is pretty vague.

[+] kthielen|8 years ago|reply

Yes the whole project started because we needed dynamically-determined logic (in various places) that could fit in very tight latency budgets. And the output of this compiler is put in a critical trading path that sees a very large portion of all daily US equity trades.

So there is a lot to say about performance.

[+] jitl|8 years ago|reply

I wish there was a more clear language tutorial, instead of mixing embedding tutorial with the language itself.

[+] kthielen|8 years ago|reply

Yes that's a good point, I do hope to build out more targeted documentation and example programs over the coming weeks.

[+] nnq|8 years ago|reply

Is this a "pure" language like Haskell? Or it's more like OCaml?

And, even if it's not pure, is there any syntactic sugar for monads?

Anyway, this looks awesome, especially with easy C++ interfacing that seems to be there: a system where you can have machine learning (anything from bayesian to deep nns) code in C++ and business logic in something Haskell-like is the stuff of wet dreams...

[+] kthielen|8 years ago|reply

It's not pure currently, so definitely more like OCaml.

We make frequent use of overloaded array comprehensions, which is one form that monad syntax can take. The "do" notation introduced by Haskell isn't supported right now, but maybe soon. I'd like to refactor the parser code a bit anyway.

[+] gjem97|8 years ago|reply

Can someone explain variants vs sums vs tuples vs records? This sentence from the README confuses me: "We can combine types with variants or sums (the "nameless" form of variants, as tuples are to records)."

[+] kthielen|8 years ago|reply

Sure, I guess "record type" makes sense? Basically a record type is a "struct" or a series of values with particular names (the "field names"). A tuple is the same thing, but by convention the "names" are by position (so you have the "0th field", the "1th field", etc).

A variant can be _one of_ a series of values (where a record is _all of_ a series of values), and each of the things can have a particular name (the "constructor names"). As with tuples to records, a sum type is just a variant where you don't care to name the constructors (so you have the "0th constructor", the "1th constructor", etc).

A common example of a variant is an "enum" type, where here the payload types are trivial (just the unit type) and only the constructor names matter. In hobbes we might write the type "|red:(),green:(),blue:()|" or the shorter syntax "|red,green,blue|" for an enum that we might write in C as "enum { red, green, blue }". You could also represent this same type as the sum "()+()+()" but that looks a little weird/profane and maybe not clear to other programmers.

Tuples are sometimes called "product types" and that connects well with sum types and the general logical/combinatorial interpretation of types. If you see the type "bool * bool" (or perhaps "2 * 2"?) then it means "a bool AND a bool" and there are four such values. Similarly if you see the type "bool + bool" then it means "a bool OR a bool" and it's either the first or the second one.

If you're interested in the general area of type theory, you might like the book "Types and Programming Languages" by Ben Pierce.

[+] keenerd|8 years ago|reply

> you will need LLVM 3.3 or later

Though it doesn't build with 4.0. (And is now in the AUR.)

Nice to see first-class parsing in a language.

[+] kthielen|8 years ago|reply

OK, there's now a PR in the queue to allow hobbes to build with LLVM 4.0+. Pending review, it will likely be merged to master soon.

[+] kthielen|8 years ago|reply

Doh you're right. We will need to look into LLVM 4.0+ soon.

[+] lostmsu|8 years ago|reply

Examples do not look very readable.

[+] beagle3|8 years ago|reply

TL;DR: They do not look readable to you because they are a "foreign language" you are unfamiliar with, much like Japanese is to most westerners.

The vast majority of languages are essentially minor syntax "skins" applied to the same underlying Algol ideas; This describes C, Pascal, Python, Ruby, Perl, Java and most other languages - which is why being proficient in one lets you read others. Haskell, Ocaml are close enough to read, but usually not write, unless you are specifically proficient in one of them.

APL (and J and K and likely Hobbes) are very different, both in concept and syntax -- to the point that you actually have to know one of them to be able to read any of them. They look intimidating at first, and they often take longer (per character / per line) to read even when you do know them, but they are very orthogonal; e.g. in K, the expression |/0(0|+)\x efficiently computes the maximum subarray sum of the array x; this is not a predefined program - rather, each character here carries a meaning, and all combined they solve the mss problem.

If you looked at Japanese[1], it would look unreadable to you, but that's not because there's a problem with Japanese.

[0] https://en.wikipedia.org/wiki/Maximum_subarray_problem

[1] Pick any language you do not know with a character set you don't know - Japanese, Arabic, Farsi, Hebrew, Thai ...

[+] kthielen|8 years ago|reply

Thanks, I do have it in the queue to make more targeted examples. I realize that what's there is a little scattershot and each of the use-cases mentioned could use a lot more detail.

For example, inside MS we have these reactive hobbes processes that watch log data and incrementally build B-trees to deliver future query responses very quickly. I hope to put some examples like that up in the coming weeks.

49 comments