Show HN: A small programming language where everything is pass-by-value
91 points| jcparkyn | 1 month ago |github.com
I started out following Crafting Interpreters, but gradually branched off that until I had almost nothing left in common.
Tech stack: Rust, Cranelift (JIT compilation), LALRPOP (parser).
Original title: "A small programming language where everything is a value" (edited based on comments)
augusteo|1 month ago
I've worked on systems where we spent more time reasoning about shared state than writing actual logic. The typical answer is "just make everything immutable" but then you lose convenient imperative syntax. This sits in an interesting middle ground.
Curious about performance in practice. Copy-on-write is great until you hit a hot path that triggers lots of copies. Have you benchmarked any real workloads?
sheepscreek|1 month ago
Use immutable pass by reference. Make a copy only if mutability is requested in the thread. This makes concurrent reads lock-free but also cuts down on memory allocations.
jcparkyn|1 month ago
Nothing "real", just the synthetic benchmarks in the ./benchmarks dir.
Unnecessary copies are definitely a risk, and there are certain code patterns that are much harder for my interpreter to detect and remove. E.g. the nbodies benchmark has lots of modifications to dicts stored in arrays, and is also the only benchmark that loses to Python.
The other big performance limitation with my implementation is just the cost of atomic reference counting, and that's the main tradeoff versus deep cloning to pass between threads. There would definitely be room to improve this with better reference counting optimizations though.
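To make the tradeoff above concrete, here is a minimal Rust sketch of handing one list to several threads through atomic reference counting instead of deep cloning it per thread. This is not Herd's implementation; `parallel_sums` and its parameters are hypothetical names chosen for illustration.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: every worker reads the same allocation.
/// `Arc::clone` only bumps an atomic counter; the Vec itself is never copied.
fn parallel_sums(data: Arc<Vec<u64>>, workers: usize) -> Vec<u64> {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // One atomic increment here, one atomic decrement when `local` drops.
            let local = Arc::clone(&data);
            thread::spawn(move || local.iter().sum::<u64>())
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect()
}
```

The price is that every clone and drop of a handle is an atomic operation, even in single-threaded code, which is the overhead described above. A deep clone per thread would avoid the atomics but pay for a full copy up front.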
MetricExpansion|1 month ago
rao-v|1 month ago
zahlman|1 month ago
jagged-chisel|1 month ago
ekipan|1 month ago
I've only read the first couple paragraphs so far but the idea reminds me of a shareware language I tinkered with years ago in my youth, though I never wrote anything of substance: Euphoria (though nowadays it looks like there's an OpenEuphoria). It had only two fundamental types. (1) The atom: a possibly floating point number, and (2) the sequence: a list of zero or more atoms and sequences. Strings in particular are just sequences of codepoint atoms.
It had a notion of "type"s which were functions that returned a boolean 1 only if given a valid value for the type being defined. I presume it used byte packing and copy-on-write or whatever for its speed boasts.
https://openeuphoria.org/ - https://rapideuphoria.com/
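For readers who have not seen this style, the two-type data model and "types as boolean functions" can be sketched in Rust roughly as follows. This is an illustration of the idea only, not how Euphoria is implemented; `Object` and `is_string` are hypothetical names.

```rust
/// The two fundamental kinds of value described above.
#[derive(Debug, Clone, PartialEq)]
enum Object {
    Atom(f64),
    Sequence(Vec<Object>),
}

/// A "type" written as a predicate: true only for a sequence whose
/// elements are all atoms holding a valid Unicode code point.
fn is_string(value: &Object) -> bool {
    match value {
        Object::Sequence(items) => items.iter().all(|item| match item {
            Object::Atom(a) => {
                // Must be a non-negative whole number that maps to a char.
                a.fract() == 0.0 && *a >= 0.0 && char::from_u32(*a as u32).is_some()
            }
            Object::Sequence(_) => false,
        }),
        Object::Atom(_) => false,
    }
}
```

Checking a dynamic value is then just a function call, e.g. `is_string(&value)`, with no separate cast or schema mechanism.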
p1necone|1 month ago
I've got a hobby language that combines this with compile time code execution to get static typing - or I should say that's the plan, it's really just a tokenizer and half of a parser at the moment - I should get back to it.
The cool side effect of this is that properly validating dynamic values at runtime is just as ergonomic as casting - you just call the type function on the value at runtime.
jcparkyn|1 month ago
fjfaase|1 month ago
[1] https://github.com/FransFaase/IParse/?tab=readme-ov-file#mar...
[2] https://www.iwriteiam.nl/D1801.html#7
[3] https://github.com/FransFaase/DataLang
discarded1023|1 month ago
Things of course become a lot more fun with concurrency.
Now if you want a language where all the data thingies are immutable values and effects are somewhat tamed but types aren't too fancy etc. try looking at Milner's classic Standard ML (late 1970s, effectively frozen in 1997). It has all you dream of and more.
In any case keep having fun and don't get too bogged in syntax.
bayesnet|1 month ago
doug-moen|1 month ago
jcparkyn|1 month ago
> Standard ML [...] It has all you dream of and more
The main thing here that's missing in Standard ML (and most other functional languages) is the "mutable" part of "mutable value semantics" - i.e., the ability to modify variables in-place (even nested parts of complex structures) without affecting copies. This is different from "shadowing" a binding with a different value, since it works in loops etc.
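A rough Rust analogy for "modify in place without affecting copies", assuming made-up types `Foo` and `Item`. In Rust the copy is an explicit `.clone()`; under mutable value semantics it would be implicit in the assignment.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Item {
    baz: i32,
}

#[derive(Clone, Debug, PartialEq)]
struct Foo {
    bar: Vec<Item>,
}

/// Returns (original, modified copy) so both can be inspected.
fn nested_update() -> (Foo, Foo) {
    let foo = Foo {
        bar: vec![Item { baz: 0 }, Item { baz: 0 }],
    };

    // Explicit here; a language with value semantics copies on assignment.
    let mut copy = foo.clone();

    // Real mutation inside a loop, not shadowing a binding.
    for (i, item) in copy.bar.iter_mut().enumerate() {
        item.baz = i as i32 + 1;
    }

    (foo, copy)
}
```

After the loop, `copy.bar` holds 1 and 2 while every `baz` in `foo` is still 0: the nested writes never leak into the original.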
DemocracyFTW2|1 month ago
netbioserror|1 month ago
vrighter|1 month ago
tromp|1 month ago
But those go further in that they don't even have any mutable data. Instead of
Haskell has
jcparkyn|1 month ago
- All functions are still referentially transparent, which means we get all the local reasoning benefits of pure functions.
- We can mutate local variables inside loops (instead of just shadowing bindings), which makes certain things a lot easier to write (especially for beginners).
- Mutating nested fields is super easy: `set foo.bar[0].baz = 1;` (compare this to the equivalent Haskell).
electroly|1 month ago
In practice I have found that it's very painful to thread state through your program. I ended up offering global variables, which provide something similar to but worse than generalized reference semantics. My language aims for simplicity so I think this may still be a good tradeoff, but it's tricky to imagine this working well in a larger user codebase.
I like that having only value semantics allows us, internally, to use reference counted immutable objects to cut down on copying; we both pass-by-reference internally and present it as pass-by-value to the programmer. No cycle detection needed because it's not possible to construct cycles. I use an immutable data structures library[2] so that modifications are reasonably efficient. I recommend trying that in Herd; it's almost always better than copy-on-write. Think about the Big-O of modifying a single element in an array, or building up a list by repeatedly appending to it. With pure COW it's hard to have a large array at all--it takes too long to do anything with it!
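To put a number on the Big-O point, here is a small Rust sketch that counts element copies when a list is built by repeated appends and every write copies the whole list first. That is the worst case for pure copy-on-write, assuming the old version is always still referenced; `cow_append_copies` is a hypothetical name, not part of any library mentioned here.

```rust
/// Builds a list of `n` items where each append first copies the whole
/// list, and returns the total number of elements copied along the way.
fn cow_append_copies(n: usize) -> usize {
    let mut list: Vec<usize> = Vec::new();
    let mut copied = 0;

    for i in 0..n {
        // Pure COW with a live second reference: full copy before the write.
        let mut next = list.clone();
        copied += list.len();
        next.push(i);
        list = next;
    }

    copied
}
```

The total is 0 + 1 + ... + (n - 1) = n(n - 1)/2, so `cow_append_copies(1000)` returns 499500. A persistent vector, or an in-place update when the reference is known to be unique, avoids this quadratic growth.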
For the programmer, missing reference semantics can be a negative. Sometimes people want circular linked lists, or to implement custom data structures. It's tough to build new data structures in a language without reference semantics. For the most part, the programmer has to simulate them with arrays. This works for APL because it's an array language, but my BASIC has less of an excuse.
I was able to avoid nearly all reference counting overhead by being single threaded only. My reference counts aren't atomic so I don't pay anything but the inc/dec. For a simple language like TMBASIC this was sensible, but in a language with multithreading that has to pay for atomic refcounts, it's a tough performance pill to swallow. You may want to consider a tracing GC for Herd.
[1] https://tmbasic.com
[2] https://github.com/arximboldi/immer
tylerhou|1 month ago
jasperry|1 month ago
jcparkyn|1 month ago
A more fitting example would be to support:
IIRC these both currently require an explicit block in my parser.
Panzerschrek|1 month ago
jcparkyn|1 month ago
Just modify the value inside the function and return it, then assign back. This is what the |= syntax is designed for. It's a bit more verbose than passing mutable references to functions but it's actually functionally equivalent.
Herd has some optimisations so that in many cases this won't even require any copies.
> What about concurrent mutable containers?
I've considered adding these, but right now they don't exist in Herd.
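The "modify, return, assign back" pattern mentioned above has a direct Rust counterpart, shown here next to the mutable-reference version it replaces. This is only an analogy and does not use Herd's `|=` syntax; both function names are hypothetical.

```rust
/// Value style: take ownership, modify, hand the value back.
fn with_appended(mut items: Vec<i32>, x: i32) -> Vec<i32> {
    items.push(x);
    items
}

/// Reference style: mutate the caller's value through `&mut`.
fn append_in_place(items: &mut Vec<i32>, x: i32) {
    items.push(x);
}
```

A caller writes `list = with_appended(list, 2);` in the first style and `append_in_place(&mut list, 2);` in the second, and ends up with the same contents either way. In the value style the Vec is moved in and out rather than copied, which is loosely similar to the no-copy optimisation described above.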
jbritton|1 month ago
jcparkyn|1 month ago
travisgriggs|1 month ago
throwaway17_17|1 month ago
However, for Erlang and Elixir ‘pass-by-value’ is otherwise called ‘call-by-value’. In this case, it is a statement that arguments to functions are evaluated before they are passed into the function (often at the call site). This is in opposition to ‘call-by-name/need’ (yes, I know they aren’t the same) which is, for instance, how Haskell does it for sure, and I think Python is actually ‘by-name’ as well.
So, Herd’s usage here is a statement of semantic defaults (and the benefits/drawbacks that follow from those defaults) for arguments to functions, and Elixir’s usage is about the evaluation order of arguments to functions. They really aren’t talking about the same thing.
Interestingly, this is also a pair of separate things, which are both separate from what another commenter was pedantically pointing out elsewhere in the thread. Programming language discussion really does seem to have a mess of terminology to deal with.
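The evaluation-order sense of call-by-value versus call-by-name can be shown with a short Rust sketch, using closures to stand in for unevaluated arguments. All names here are hypothetical, and the closures only approximate call-by-name.

```rust
use std::cell::Cell;

/// Call-by-value: both arguments are already evaluated on entry.
fn pick_first_by_value(a: u32, _b: u32) -> u32 {
    a
}

/// Call-by-name (approximated): arguments arrive as thunks and are
/// evaluated only if the body actually uses them.
fn pick_first_by_name(a: impl Fn() -> u32, _b: impl Fn() -> u32) -> u32 {
    a()
}

/// Returns how many argument expressions ran under each strategy.
fn evaluation_counts() -> (u32, u32) {
    let evals = Cell::new(0);
    // Stand-in for an argument expression; counts each evaluation.
    let arg = |v: u32| {
        evals.set(evals.get() + 1);
        v
    };

    pick_first_by_value(arg(1), arg(2));
    let eager = evals.get();

    evals.set(0);
    pick_first_by_name(|| arg(1), || arg(2));
    let lazy = evals.get();

    (eager, lazy)
}
```

`evaluation_counts()` returns `(2, 1)`: the eager call evaluates both arguments, while the thunked call never touches the unused one. Neither result says anything about whether the callee can mutate the caller's data, which is the other sense of "pass-by-value" used in this thread.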
zem|1 month ago
throwaway17_17|1 month ago
Although I don’t particularly like the ‘|’ to be used for chaining functions, I certainly know that it has been a long term syntax coming from Unix. My only issue with the ‘|=’ is that it should be unnecessary. The only reason I can see that the special operator is required is that the ‘set’/‘=’ syntax pair is a non-functional keyword ‘set’ with an (I think) overloaded keyword ‘=’. If the equal sign was an ordinary function (i.e. a function that takes a value and an identifier, associates the value with the identifier, then returns the new value, like the Lisps and derived languages) it could just be used arbitrarily in chains of functions.
anacrolix|1 month ago
jcparkyn|1 month ago
- We can avoid quite a few allocations in loops by mutating lists/dicts in place if we hold an exclusive reference (and after the first mutation, we always will). Updates to persistent data structures are relatively cheap, but they're a lot more expensive than an in-place update.
- Herd has syntax sugar for directly modifying nested values inside lists/dicts. E.g. `set foo.bar.[0].baz = 1;`.
In practice, is this faster than a different implementation of the same semantics using persistent data structures and a tracing GC? That will depend on your program.
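The "mutate in place if we hold an exclusive reference" idea from the first bullet can be sketched in Rust with `Arc::make_mut`, which clones the inner value only when another handle still shares it. This is an illustration under that assumption, not Herd's actual code; `append_n` is a hypothetical helper.

```rust
use std::sync::Arc;

/// Appends 0..n to a list that may be shared with other holders.
/// Returns the updated list and how many times the backing Vec was copied.
fn append_n(mut list: Arc<Vec<u32>>, n: u32) -> (Arc<Vec<u32>>, u32) {
    let mut copies = 0;

    for i in 0..n {
        if Arc::strong_count(&list) > 1 {
            // Someone else still holds this allocation, so the
            // `make_mut` call below has to clone it.
            copies += 1;
        }
        // Clones only when shared; otherwise mutates in place.
        Arc::make_mut(&mut list).push(i);
    }

    (list, copies)
}
```

Starting from a list that is shared with one other holder, the loop copies once on the first write and then updates in place for the remaining iterations, so `copies` comes back as 1 however large `n` is. Starting from an unshared list it comes back as 0.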
bananasandrice|1 month ago
[deleted]
rvba|1 month ago
So basically everything is var?
jcparkyn|1 month ago
There are two ways to define a variable binding:
The "default" behaviour (if no keyword is used) is to define a new immutable variable.
drnick1|1 month ago
jcparkyn|1 month ago