The article massively undersells the information content of the genome in several key ways. A non-comprehensive list of these (before my morning coffee, forgive me) includes:
- Combinatorial effect (massive) of different alleles in complex systems
Overall, it's not sensible to compare a linear sequence of bits, like a CD (sibling comment) or DVD (the article), to the linear sequence of the genome and conclude that their information content, based on length alone, is in any way comparable.
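To put a rough number on the "combinatorial effect" bullet, here is a minimal sketch. It assumes independent loci and counts unphased diploid genotypes only; the function name and the example locus counts are made up for illustration, and the count says nothing by itself about how much of that variation is functionally meaningful.

```python
def genotype_combinations(n_loci: int, alleles_per_locus: int = 2) -> int:
    """Count distinct unphased diploid genotypes across independent loci."""
    # With k alleles at a locus, a diploid carries an unordered pair of them,
    # which gives k * (k + 1) / 2 possibilities (3 for a biallelic locus).
    per_locus = alleles_per_locus * (alleles_per_locus + 1) // 2
    return per_locus ** n_loci


for n in (1, 10, 20, 100):
    print(n, genotype_combinations(n))
```

Even 20 biallelic loci give about 3.5 billion combinations, so the space of possible interactions grows far faster than the sequence length does.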
Exactly. The compression level of DNA is orders of magnitude better than anything we can even come close to. DNA usually doesn't even contain specific counts (like five fingers on a hand) or sizes of organs and so on - these are given by the processes that run in parallel and cause the cells to hit spatial / chemical / electrical or other limits. It's like putting lots of house builders in specific places where the house should be, with each one building a wall until he hits another one. There is no compressed house plan; it's a compressed "engine" that builds the result.
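The house-builder analogy can be sketched as a toy simulation. This is only an illustration under made-up assumptions (a 1D strip, three builders labelled A, B and C, ties going to the builder on the left); it is not a model of real development. The point is that the final layout is never stored anywhere: only the starting positions and the growth rule are.

```python
def grow(width: int, seeds: dict) -> str:
    """Each builder extends one cell per step until it meets a neighbour or the edge."""
    cells = ["."] * width
    for pos, label in seeds.items():
        cells[pos] = label

    changed = True
    while changed:
        changed = False
        nxt = cells[:]  # update synchronously so all builders move at the same pace
        for i, c in enumerate(cells):
            if c != ".":
                continue  # already built, nobody overwrites a wall
            left = cells[i - 1] if i > 0 else "."
            right = cells[i + 1] if i < width - 1 else "."
            # An empty cell is claimed by an adjacent builder; ties go to the left one.
            claim = left if left != "." else right
            if claim != ".":
                nxt[i] = claim
                changed = True
        cells = nxt
    return "".join(cells)


print(grow(20, {2: "A", 9: "B", 17: "C"}))
```

This prints `AAAAAABBBBBBBBCCCCCC`: the region sizes (6, 8, 6) fall out of where the builders started and when they collided, not from any stored count.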
You put my reaction to this in much more educated terms. I’ve always felt that thinking of DNA as bits was a bit simplistic. Just because we store information as bits it doesn’t mean that nature does.
Not that it means they can’t be right, but the author also doesn’t seem to have any particular expertise in genetics. Their ideas need to survive a lot more criticism by people who know what they’re talking about before you could start to see them as convincing.
He does mention structural interactions as well as duplications/deletions/inversions. I would argue methylation is more like an annotation of DNA and not part of the DNA itself, but that's a matter of opinion.
In the end, the author literally says: "nobody knows". Yes, you cannot compare a linear sequence of bits to a macromolecule that interacts structurally with its environment, and the author does not make that claim. The question he tries to answer is: how much data is needed to re-create a similar macromolecule that interacts in a similar way. His main point, on which you both agree: the exons alone are surely not enough, because the encoded proteins are just a (small?) part of how DNA interacts.
We know now that environmental factors change how DNA is expressed as well through epigenetics.
I don't know how any of it works. Something to do with the shape of the DNA when it is wound up and how that changes the output when RNA produces proteins.
This is how parents can do things like pass some of the athleticism they earn through training to their children. It is possible for athletic parents to pass on genes in such a way that it produces children even more athletic than they were.
All of this means that DNA has the ability to encode information and produce proteins in different ways using the same sequences.
So I am guessing that a lot of the DNA that is considered "junk" may not actually be. They are just missing a piece of the puzzle in how it gets read in.
Information can only be defined with respect to states where you 1. Can tell (or could in theory tell) the difference and 2. Care about the difference between states. The differences you care about, and the ones you don't, are baked in whenever you use any definition of information.
It doesn't matter much, unless you use it to sneak in what you think we should care about, or use it to make philosophical arguments whose circularity is carefully hidden.
I thought the main advantage of DNA storage was the physical size of it, and how many different genomes you could have stacked next to each other in the same -70 degree space.
Millions of chimeric cells on the same petri dish? That's 1PB on a single glass slide.
Depending on the sequencing tech paired with the rise of Spatial data, the read speed could be formidable.
Needlessly complex setup though. Let's just stick with metals for now.
I would like to get a reasonably good intuition regarding the total amount of compound DNA from human bodies at different biochemical states, in different locations around the world (different climates).
By "compound DNA" I mean including the DNA of bacteria, fungi and viruses living within one's body. For instance, gut bacteria acquired and maintained based on food intake and environmental influence.
> But mitochondrial DNA is tiny so I won’t mention it again.
Which is a bummer because it is circular. There is also a point on the strand where two separate genes overlap. The end of one has the same code as the beginning of another.
So even DNA has its own native compression scheme.
Man, the back and forth here before coffee is actually kinda hilarious - I get all worked up before caffeine too, but honestly, DNA being this messy scratchpad feels way more interesting than treating it like a tidy CD. The messiness kinda rules, if you ask me.
- Cells work like this because DNA is under constant attack from mutations.
- Mutations most commonly arise during cell replication.
It's fascinating to realize that the "messiness" of DNA isn't a bug, but a feature—a side effect of evolution's raw material supply chain.
Mutations, repeats, transposons, and imperfect repairs all contribute to a noisy genomic landscape. But it's exactly this noise that enables biological diversity. No mutations, no variation. No variation, no selection. No selection, no evolution.
The genome is not a blueprint—it's a living, adapting scratchpad. Messiness is the canvas on which nature paints diversity.
Let me add to that. It requires a universe with specific laws that remain stable and encourage optimization. Then, a planet hospitable to life. Then, specific creatures with biological machinery more complex than anything humans have created. The machinery has plenty of reliability and adaptation baked in.
Godless evolution suggests randomness produced all of it over time. Yet, that's never worked in anything we've built. Even our GAs required laws, an environment, a computer, software, and fine-tuning. Pre-existing or by intelligent design (human inventors). Without these, it produced no results.
So, I'll correct you by saying empirical data suggests evolution didn't produce this. We're seeing God's design skills in adaptive, resilient, complex, self-replicating systems. His work is truly beautiful to behold. Humans still can't produce something similar from scratch. Actually, they can't even be sure how the existing design works.
DNA contains all of the actually relevant information that exists, including whatever sequence gives rise to the very conceptualisation of information, so in fact everything else that could be considered "information" is derived from DNA.
The article says that DNA is designed to keep working despite mutations occurring. What evidence does the author put forward to suppose it was designed rather than evolved? There's plenty of evidence to support it evolved BTW.
In any case, 6.2 billion bits (interestingly enough, almost exactly as much information as fits on the audio CDs you used for your romantic mixtapes) is an upper bound.
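For anyone who wants to check the CD comparison, here is the back-of-the-envelope arithmetic as a small sketch. The 3.1 billion base pairs figure is an assumption (it is what 6.2 billion bits implies at 2 bits per base), and the CD side uses the standard 44.1 kHz, 16-bit stereo PCM parameters for the raw audio payload.

```python
BASE_PAIRS = 3_100_000_000  # assumed genome length, implied by 6.2e9 bits at 2 bits/base
BITS_PER_BASE = 2           # four possible bases, log2(4) = 2 bits each

genome_bits = BASE_PAIRS * BITS_PER_BASE


def cd_audio_bits(minutes: int) -> int:
    """Raw PCM payload of CD audio: 44.1 kHz sample rate, 16-bit samples, 2 channels."""
    return 44_100 * 16 * 2 * minutes * 60


print("genome:   ", genome_bits, "bits =", genome_bits / 8 / 1e6, "MB")
print("74 min CD:", cd_audio_bits(74), "bits")
print("80 min CD:", cd_audio_bits(80), "bits")
```

A 74-minute disc works out to about 6.27 billion bits and an 80-minute disc to about 6.77 billion, so "almost exactly" holds for the raw sequence length. It says nothing about how that sequence is interpreted.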
This rules out pretty much every nutty theory which evolutionary psychologists propose. Such as we evolved for altruism, we evolved to believe in religion, etc etc. Complete B.S. Exactly how much information would you need to specify a behavior like being predisposed to a belief in religion??? There's less than 80 minutes worth of music's worth of information in our genomes, and most of that is concerned with just keeping us alive.
You are not predisposed to be anything. Go create the kind of person you want to be.
> There's less than 80 minutes worth of music's worth of information
Or an awful lot of text (state-of-the-art compressors can reach roughly a 1:10 ratio on plain text and the decoder itself is rather small, so 750MB compressed could potentially hold around 7GB of text).
Also, look at the demoscene. A 4k intro (4 kB is the size of the executable) can do crazy things, and 64kB can fit a lot of nice 3D objects, music, text, complex effects etc., while weighing less than a screenshot of any moment of the running demo. In 95kB you can have a full game (google .kkrieger).
> There's less than 80 minutes worth of music's worth of information in our genomes
That’s a very misleading take, this is lossless audio and the majority of the bits are spent encoding noise. You can encode way more audio at perceptually but not technically lossless level in that space.
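Rough numbers for the lossless versus lossy point, as a sketch. The 128 kbit/s lossy bitrate is an assumption picked for illustration (a common MP3/AAC setting), not a figure from the thread.

```python
GENOME_BITS = 6_200_000_000   # upper bound quoted in the thread
CD_BITRATE = 44_100 * 16 * 2  # uncompressed CD audio, in bits per second
LOSSY_BITRATE = 128_000       # assumed lossy bitrate, in bits per second


def playable_minutes(total_bits: int, bits_per_second: int) -> float:
    """How many minutes of audio fit in total_bits at a given bitrate."""
    return total_bits / bits_per_second / 60


print(round(playable_minutes(GENOME_BITS, CD_BITRATE), 1), "minutes as CD audio")
print(round(playable_minutes(GENOME_BITS, LOSSY_BITRATE) / 60, 1), "hours at 128 kbit/s")
```

Under that assumption the same bit budget holds roughly 13 hours of audio instead of about 73 minutes, which is why "80 minutes of music" is a poor intuition pump for information content.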
> There's less than 80 minutes worth of music's worth of information in our genomes
What an insanely bad take.
Not only did you not read and/or comprehend the article, the article itself undersells the information content of the genome (I'll post on this at the top level).
> You are not predisposed to be anything.
This does not logically follow your preceding statement, even if we were to accept the foregoing limited information content as factual
This isn't a great argument - simple rules can produce complicated behavior and, at any rate, I don't think any evpsych people believe that evolution inescapably predisposes people to the things you talked about, only that evolution has produced biases in our behavior which manifest (at certain times and in certain circumstances) as those phenomena.
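A standard illustration of "simple rules can produce complicated behavior" is the Rule 30 cellular automaton, sketched below. The choice of Rule 30, the grid width and the wrap-around edges are my own picks for the example; nothing here is specific to genetics.

```python
def rule30_step(cells: list) -> list:
    """Apply one step of Rule 30 to a row of 0/1 cells (edges wrap around)."""
    n = len(cells)
    # Rule 30: new cell = left XOR (centre OR right)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]


def run(width: int = 31, steps: int = 15) -> list:
    """Start from a single live cell in the middle and collect every generation."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = rule30_step(row)
        rows.append(row)
    return rows


for r in run():
    print("".join("#" if c else " " for c in r))
```

The whole rule fits in one line, yet the pattern it prints looks irregular and never settles into an obvious repeat, so the size of a description is a weak guide to the complexity of what it produces.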
chuckle hello dissenters, downvotes, and doubters :-) Seems like a lot of the objections stem from a few misconceptions:
1. You can dramatically increase the amount of information stored by compression. Uh, no. Information content is measured, as it were, "post-compression". There is a limit to how much information you can store in a gigabyte, and that limit is--a gigabyte.
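The "post-compression" point in item 1 can be checked empirically. In this sketch, zlib stands in for an arbitrary lossless compressor and the two inputs are made up for the demonstration. Compressed size is only an upper bound on information content, so treat it as an illustration rather than a measurement.

```python
import os
import zlib

repetitive = b"ACGT" * 25_000       # 100,000 bytes with almost no information
random_bytes = os.urandom(100_000)  # 100,000 bytes that are essentially incompressible


def ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at its highest level."""
    return len(zlib.compress(data, 9)) / len(data)


print("repetitive:", ratio(repetitive))
print("random:    ", ratio(random_bytes))
```

The repetitive input shrinks to a tiny fraction of its size, while the random input does not shrink at all. Compression removes redundancy; it cannot pack more than a gigabyte of information into a gigabyte.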
2. Information content is not just stored in the DNA, but also in all of the ancillary proteins, etc. Well, the information on how to create the proteins themselves is stored in DNA. Any additional information, then, has to be contributed by the environment. But that is exactly my point--environment matters way more than the information stored in the DNA.
3. "you are wrong", "you are a mathematical ignoramus" etc etc is ad hom. It is not a valid argument, contributes nothing to the conversation, and is not a good look. If you disagree let's see some math.
4. No matter how much information you think is stored in DNA, the amount of information stored in your brain is at least 5 orders of magnitude larger. The information you can learn swamps out any predispositions you might have. Go become the kind of person you want to be.
An audio CD is a very inefficient way of storing information.
I think a more apt comparison would be an LLM of that size. qwen:0.5b is about 400MB; its abilities are laughable compared to the likes of ChatGPT, but it can write coherently about general topics. For instance:
>>> why would people be altruistic
People are likely to be altruistic because they believe that helping others is better for everyone involved.
People may also believe in the power of compassion and empathy towards others, which can contribute to greater altruism.
Overall, people are likely to be altruistic because they believe that helping others is better for everyone involved.
It is not a statement about LLMs, more about what you can achieve with "just" 400MB of storage. The other similarity is that LLMs are also "messy". If you want to see the results of finely crafted work in a really small amount of space, look at what sizecoders can do with a few kB of code or less.
[+] [-] chromatin|10 months ago|reply
- DNA methylation (https://en.wikipedia.org/wiki/DNA_methylation)
- Interactions of alleles (what article refers to as the "two versions of each base pair")
- Duplications, deletions, inversions, and other structural variations (https://www.genome.gov/genetics-glossary/Structural-Variatio...)
- Physical proximity interactions in 3-dimensional space (https://cmbl.biomedcentral.com/articles/10.1186/s11658-023-0...)
[+] [-] Daniel_sk|10 months ago|reply
[+] [-] clickety_clack|10 months ago|reply
[+] [-] deng|10 months ago|reply
[+] [-] foobarian|10 months ago|reply
[+] [-] lotharcable|10 months ago|reply
[+] [-] moralestapia|10 months ago|reply
1. Maaaaybe you could make a case for DNA methylation, but that still requires some DNA signatures so ...
[+] [-] unknown|10 months ago|reply
[deleted]
[+] [-] stenl|10 months ago|reply
[+] [-] vintermann|10 months ago|reply
[+] [-] tringuyen_cse|10 months ago|reply
[+] [-] tetris11|10 months ago|reply
[+] [-] out_of_protocol|10 months ago|reply
[+] [-] gfalcao|10 months ago|reply
[+] [-] gfalcao|10 months ago|reply
[+] [-] unknown|10 months ago|reply
[deleted]
[+] [-] timewizard|10 months ago|reply
[+] [-] xvilka|10 months ago|reply
[1] https://en.m.wikipedia.org/wiki/Xeno_nucleic_acid
[+] [-] amelius|10 months ago|reply
How much information can you __store__ in DNA without affecting the organism too much?
[+] [-] timewizard|10 months ago|reply
[+] [-] roxolotl|10 months ago|reply
Pretty sure the substack and main site are the same. First paragraph is at least.
[+] [-] gitroom|10 months ago|reply
[+] [-] RainbowcityKun|10 months ago|reply
[+] [-] esafak|10 months ago|reply
[+] [-] nickpsecurity|10 months ago|reply
[+] [-] frshOffTheBoat|10 months ago|reply
[deleted]
[+] [-] metalman|10 months ago|reply
[+] [-] nuc1e0n|10 months ago|reply
[+] [-] iamtheworstdev|10 months ago|reply
[+] [-] decremental|10 months ago|reply
[deleted]
[+] [-] rhelz|10 months ago|reply
[+] [-] out_of_protocol|10 months ago|reply
P.S. better example: full snake game in 56 BYTES https://github.com/donno2048/snake
For comparison, the link above is 34 bytes and the whole sentence is 83 bytes. It's possible to do a lot if we're talking about code.
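The byte counts are easy to verify. A tiny sketch, assuming plain ASCII with one byte per character and no trailing newline:

```python
link = "https://github.com/donno2048/snake"
sentence = "P.S. better example: full snake game in 56 BYTES " + link

# Encode to bytes first so the count is bytes rather than characters.
print(len(link.encode("ascii")), len(sentence.encode("ascii")))
```

Both numbers check out: 34 bytes for the link and 83 for the full sentence, against 56 bytes for the game.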
[+] [-] ruuda|10 months ago|reply
[+] [-] guilbep|10 months ago|reply
[+] [-] chromatin|10 months ago|reply
[+] [-] nathan_compton|10 months ago|reply
[+] [-] rhelz|10 months ago|reply
[+] [-] nurettin|10 months ago|reply
[+] [-] GuB-42|10 months ago|reply