top | item 40839322

(no title)

Zondartul | 1 year ago

It's convienient because any random collection letters is a "valid" brainfuck program.

discuss

order

dekhn|1 year ago

Interesting! That's analogous to some recent work in chemistry. Historically, there was a string representation for molecules called SMILES, which is fairly terse and, when canonicalized, maps from strings to individual molecules (2D topology). However, not all strings are valid SMILES. Recent work with autoencoders to turn SMILES strings into a vector representation via embedding creates models that often generate invalid SMILES strings (the popular paper about this glosses over this fact). For example, if your training set includes both bromine (represented as Br) and chlorine (represented as Cl) and you generate random vectors, they might decode to contain Bl, which is an as-yet unmade element. This is not desirable (although opinions vary).

As a result, the group that published the earlier work developed a new compact representation known as SELFIES (https://github.com/aspuru-guzik-group/selfies) where it's effectively impossible to generate invalid decodings of strings (every SELFIE string decodes to a valid molecule).

I'm not sure what the terminology for these sorts of features of encodings.

blackbear_|1 year ago

At the risk of bringing this analogy too far, there was a recent paper [1] arguing that the ability to generate invalid SMILES is beneficial and allows the model to more accurately represent the target distribution compared to SELFIES.

This could be similar to mutations allow one to explore a wider range of options, although sometimes it can go too far and get a non-functional individual.

[1]: https://www.nature.com/articles/s42256-024-00821-x

optimalsolver|1 year ago

StupidStackLanguage also has this pleasing property:

https://esolangs.org/wiki/StupidStackLanguage

metadat|1 year ago

Wow, some of those programs are actually interesting, I want to try cat (implemented as simply "jf"), and some of the other ones further down the list.

I almost can't believe this is my first time encountering StupidStackLanguage.

29athrowaway|1 year ago

The high amount of operators makes it harder to mutate the program iteratively into something that does what you want.

stephen_cagle|1 year ago

Is every program valid? What if the "end loop" occurs before the "begin loop"?

LegionMammal978|1 year ago

As the language is usually presented, it has the syntactic requirement that all square brackets are properly paired. Of course, many interpreters (including the original one IIRC) don't fully validate this, but they will have inconsistent behavior on unpaired brackets. Here, the authors had to further specify that jumping from an unpaired bracket causes the program to halt.

29athrowaway|1 year ago

I think "]" requires to be preceded by "[".

Depending how to handle "<", and ">" you can have out of bounds errors. And "+" and "-" can lead to overflow/underflow. Unless you wrap or saturate values.

Some programs can also lead to infinite loops.

So there are "invalid" programs for these purposes. What I do is to mark the program as invalid when these problems are present. For infinite loop detection, I give the program a limited amount of iterations and if it doesn't finish when the max limit is reached the program is marked as invalid.