As someone with a background in computer programming, I found biology quite frustrating. I wanted to see some underlying structure or meaning in the genetic code. It took me a long time to realize that it wasn't there. Biological systems and their genetic codes are the result of random trial and error filtered through natural selection. Biological systems clearly have a design, because you see the same "form" reappear in each generation, but the design itself is completely random. In other words, the design has to work, but it does not have to be sensible.
To summarize, the design of biological systems is random. It was a hard pill for me to swallow, and an utterly dissatisfying one at that.
One of my favorite things about experimenting like this is that I learn new things from the logical systems I create. Spin things differently and you learn a whole new set of ideas. I had no idea how much junk code biology would allow in genetics until I made this thing.
> To summarize, the design of biological systems is random. It was a hard pill for me to swallow.
There are so many things about our universe that are very non-random, such as the physical laws, and more specifically the spherical shapes of planets/stars, the flatness of their surfaces, the flow of energy from stars to planets, etc.
An analogy could be that life forms randomly evolving with these constraints are like water particles randomly floating through a river. There are plenty of variations in where the particles can go, but they flow in a general direction.
You can see this with a sim called DarwinBots. Hand-crafted code is as clever as it can be. Code evolved from zero is a bunch of gibberish that just happens to hit the right spots here and there. The key is reverse-engineering what effect the evolved code is actually achieving.
Ironically, the theoretically optimal programs for computers would not be sensible to humans either, so it is probably not that different.
It's been a decade since I last wrote a BF program, but if I'm not mistaken, the winning strand in the gif example has quite a bit of dead code, for example here:
[[]+<-,<<.><+>+,]
This entire loop must always be skipped, because if it were entered, the inner [] loop would also be entered and would run indefinitely.
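You can check this mechanically. Here's a minimal step-limited BF interpreter (a Python sketch of my own, not the code from the article; the step cap is there to guard against the infinite inner loop): run on a zeroed tape, only the outer `[` ever executes, because the zero cell makes it jump straight past the matching `]`.

```python
def run_bf(code, max_steps=10_000, tape_len=30):
    """Interpret a BF fragment; return (halted, steps, executed indices)."""
    # Precompute matching bracket positions.
    match, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            match[i], match[j] = j, i
    tape = [0] * tape_len
    pc = ptr = steps = 0
    executed = set()
    while pc < len(code) and steps < max_steps:
        c = code[pc]
        executed.add(pc)
        if c == '+': tape[ptr] = (tape[ptr] + 1) % 256
        elif c == '-': tape[ptr] = (tape[ptr] - 1) % 256
        elif c == '>': ptr += 1
        elif c == '<': ptr -= 1
        elif c == '[' and tape[ptr] == 0: pc = match[pc]  # skip dead loop
        elif c == ']' and tape[ptr] != 0: pc = match[pc]
        # '.' and ',' are no-ops here; neither runs if the loop is skipped.
        pc += 1
        steps += 1
    return steps < max_steps, steps, executed

halted, steps, executed = run_bf('[[]+<-,<<.><+>+,]')
print(halted, steps, sorted(executed))  # True 1 [0] — only the opening '[' runs
```

Everything between the outer brackets, including the `[]` trap, is unreachable dead code.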
It would be cool to run it through more iterations, to get to the local minimum of length, rather than just a correct program.
This appears to be GA rather than GP, but I did some experiments with GP a while ago and the most counterintuitive thing I discovered was how essential junk code turns out to be.
First, for anyone unaware of the difference, in Genetic Algorithms (GA) you have fixed-length strings of instructions. Genetic Programming (GP) is a superset in which you have ASTs, which can get arbitrarily large (which can be a problem if you're not careful). I find GP much more interesting than GA, but the practical issues with AST bloat are why you see a lot more GA examples.
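For concreteness, here's a toy sketch of the representational difference (Python; the instruction alphabet and tuple-based AST encoding are my own, not any real GA/GP library): GA crossover on fixed-length genomes can never change the size, while GP crossover grafts subtrees, so individuals can grow.

```python
import random

random.seed(0)

# GA: genome is a fixed-length list; crossover swaps a suffix.
def ga_crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# GP: genome is a nested tuple (an AST). Enumerate every subtree,
# then graft a random one from the donor into the recipient.
def subtrees(t):
    yield t
    if isinstance(t, tuple):
        for child in t[1:]:
            yield from subtrees(child)

def gp_crossover(a, b):
    donor = random.choice(list(subtrees(b)))
    # Simplified: always replace a's first argument with the donated subtree.
    return (a[0], donor, *a[2:])

parent1 = ['+', '-', '>', '<']   # GA parents: always 4 instructions
parent2 = ['>', '>', '+', '+']
child = ga_crossover(parent1, parent2)
assert len(child) == 4           # size is invariant under GA crossover

t1 = ('+', ('*', 2, 3), 1)       # GP parents: (+ (* 2 3) 1) and (+ (+ 1 2) 3)
t2 = ('+', ('+', 1, 2), 3)
t3 = gp_crossover(t1, t2)        # may be larger or smaller than either parent
print(child, t3)
```

The unbounded growth on the GP side is exactly the AST-bloat problem mentioned above.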
Anyway, not wanting to run into overly large ASTs myself, I went ahead and prematurely optimised my GP implementation by having it always collapse functions into simpler forms where possible, e.g. (+ (+ 1 2) 3) becomes 6.
It turns out that this causes huge problems, because all that "junk" code is actually useful during evolution. Each generation inherits parts of its parents (called "crossover"), but if you cut down the information available to inherit, as per the above optimisation, you get much less variety between generations and become more reliant on random mutations. As a result, it's like falling back on bogosort to find your solution.
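A sketch of the optimisation in question (Python, with S-expressions as nested tuples — my own encoding, not the original implementation): constant folding collapses a whole subtree into one node, which drastically shrinks the pool of subtrees that crossover can draw from.

```python
def fold(t):
    """Collapse constant subexpressions, e.g. (+ (+ 1 2) 3) -> 6."""
    if not isinstance(t, tuple):
        return t
    op, *args = t
    args = [fold(a) for a in args]
    if op == '+' and all(isinstance(a, int) for a in args):
        return sum(args)
    return (op, *args)

def count_subtrees(t):
    """How many distinct crossover points (subtrees) a tree offers."""
    if not isinstance(t, tuple):
        return 1
    return 1 + sum(count_subtrees(a) for a in t[1:])

tree = ('+', ('+', 1, 2), 3)                             # (+ (+ 1 2) 3)
print(fold(tree))                                        # 6
print(count_subtrees(tree), count_subtrees(fold(tree)))  # 5 1
```

Five potential crossover points collapse to one, which is the loss of inheritable variety described above.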
Once you get down to the final generations you can start running optimisations like this and removing dead code, but if you do it too soon, you won't converge on a solution quickly, if at all.
Interestingly, this principle seems to apply to DNA too, but DNA doesn't come with an optimiser. Hence most DNA is "junk" but the junk is actually more important than it seems.
It also turns out that genetically optimising (as opposed to hand-coding the optimisation rules, like I did) is non-trivial too. You can either adjust your main cost function to account for code size/speed alongside the problem you're trying to solve, or you can do a second pass that aims to optimise the code after you've already found an acceptable solution. Either way, there's a certain amount of black magic involved to decide how many iterations to run, or what constitutes a "good enough" or "fast enough" or "small enough" solution so you can decide when to stop.
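The first option — folding size and speed into the main cost function — might look like this sketch (Python; the weights and fitness shape are arbitrary assumptions, which is exactly the black magic being described):

```python
def cost(error, size, steps, w_size=0.01, w_time=0.001):
    # Primary objective is correctness (error); size and runtime act as
    # tie-breakers. The weights are the black magic: too high and evolution
    # strips useful "junk" early, too low and bloat never goes away.
    return error + w_size * size + w_time * steps

# Two candidate programs with equal error: the shorter, faster one wins.
a = cost(error=0.0, size=120, steps=5000)
b = cost(error=0.0, size=40,  steps=1200)
assert b < a
```

The second option (a separate shrinking pass after a correct solution is found) just swaps the question of weights for the question of when to stop that pass.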
Code size is also irrelevant in GA because the code is always the same size, but different instructions might have different costs, with NOPs having the lowest cost. If you allow goto or recursion in your VM then you have to impose a runtime constraint to avoid infinite loops, and can also assign a cost per instruction executed or per microsecond of runtime.
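A sketch of that GA scoring scheme (Python; the instruction set, costs, and jump semantics are invented for illustration): every genome is the same length, so cost comes from which instructions it uses, with NOPs free, plus a per-step execution charge under a hard step cap that tames infinite loops.

```python
INSTR_COST = {'NOP': 0, '>': 1, '<': 1, '+': 1, '-': 1, 'GOTO': 2}
MAX_STEPS = 1000  # runtime cap so GOTO loops can't run forever

def score(genome):
    static = sum(INSTR_COST[i] for i in genome)  # NOPs cost nothing
    steps = pc = 0
    while pc < len(genome) and steps < MAX_STEPS:
        if genome[pc] == 'GOTO':
            pc = 0      # crude jump: back to the start of the genome
        else:
            pc += 1
        steps += 1      # per-instruction-executed charge
    return static + steps

print(score(['NOP'] * 4))      # 4: zero static cost, four steps
print(score(['+', 'GOTO']))    # 1003: the loop runs until the step cap
```

A genome full of NOPs is cheap but useless, so the error term in the real fitness function still has to dominate.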
Yeah, the next thing planned for it is user-settable hunger. Then making more and more efficient code is simply a matter of following the Unix philosophy:
corvus -h 500 | corvus -h 300| corvus -h 200
So on and so forth as you get closer to [,.]
The dead code part was an organic artifact, and I actually don't think that I want to change that part. It sorta makes you wonder about how much dead code is in real DNA.
jostmey|11 years ago
anateus|11 years ago
r00nk|11 years ago
fragsworth|11 years ago
FranOntanaya|11 years ago
kozlinov|11 years ago
j42|11 years ago
akkartik|11 years ago
reitzensteinm|11 years ago
robert_tweed|11 years ago
r00nk|11 years ago
yuchi|11 years ago
signa11|11 years ago
where a C++ program is 'evolved' to output "hello world"
rekibnikufesin|11 years ago
codezero|11 years ago
r00nk|11 years ago