top | item 43945287

chromatin | 9 months ago

The article massively undersells the information content of the genome in several key ways. A non-comprehensive list (before my morning coffee, forgive me) includes:

- DNA methylation (https://en.wikipedia.org/wiki/DNA_methylation)

- Interactions of alleles (what the article refers to as the "two versions of each base pair")

- Duplications, deletions, inversions, and other structural variations (https://www.genome.gov/genetics-glossary/Structural-Variatio...)

- Physical proximity interactions in 3-dimensional space (https://cmbl.biomedcentral.com/articles/10.1186/s11658-023-0...)

- Combinatorial effect (massive) of different alleles in complex systems

Overall, it's not sensible to compare a linear sequence of bits, like a CD (sibling comment) or DVD (the article), to the linear sequence of the genome and conclude that their information content, based on length alone, is in any way comparable.

Daniel_sk|9 months ago

Exactly. The compression level of DNA is orders of magnitude better than anything we can come close to. DNA usually doesn't even contain specific counts (like 5 fingers on a hand) or sizes of organs and so on - these are given by the processes that run in parallel and cause the cells to hit spatial / chemical / electrical or other limits. It's like putting lots of house builders at specific places where the house should be, and each one just keeps building a wall until he hits another one. There is no compressed house plan, it's a compressed "engine" that builds the result.
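
To make the builder analogy concrete, here is a toy sketch in Python. It assumes a 1-D strip of cells and a made-up function `grow_walls`; nothing in it models real developmental biology. Each builder only knows a local rule ("extend until blocked"), and the final wall sizes are never stored anywhere; they fall out of the starting positions.

```python
from collections import Counter

def grow_walls(length: int, builders: list[int]) -> list[int]:
    """Return the owner index of every cell after all builders are blocked.

    Hypothetical toy model: each round, every wall extends one cell to the
    left and right, unless the cell is already taken or off the strip.
    """
    owner = [-1] * length              # -1 means "not built yet"
    for i, pos in enumerate(builders):
        owner[pos] = i                 # each builder starts on its own cell
    changed = True
    while changed:
        changed = False
        claims: dict[int, int] = {}
        for cell, who in enumerate(owner):
            if who == -1:
                continue
            for neighbour in (cell - 1, cell + 1):
                if 0 <= neighbour < length and owner[neighbour] == -1:
                    claims.setdefault(neighbour, who)  # first claimant wins a tie
        for cell, who in claims.items():
            owner[cell] = who
            changed = True
    return owner

owner = grow_walls(10, [1, 5])
sizes = Counter(owner)   # wall length per builder, emergent rather than specified
print(owner, dict(sizes))
```

The only inputs are the strip length and two starting positions; the wall lengths are a consequence of the rule, which is the sense in which the "plan" is never written down.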

Earw0rm|9 months ago

Comparing it to machine code on CD/DVD might make more sense then. Machine code where every line has been hand-optimised by nature's hackers over 500 million years.

And in that context, hundreds of MBs is a heck of a lot of complexity.

clickety_clack|9 months ago

You put my reaction to this in much more educated terms. I’ve always felt that thinking of DNA as bits was a bit simplistic. Just because we store information as bits it doesn’t mean that nature does.

Not that it means they can’t be right, but the author also doesn’t seem to have any particular expertise in genetics. Their ideas need to survive a lot more criticism by people who know what they’re talking about before you could start to see them as convincing.

ses1984|9 months ago

The raw bits of the base pairs are just one component of the information, but they're like a maximally compressed version of the info.

The laws of physics are another component.

From there you would need to simulate nature to be able to decompress all the data, like how computer programs can use procedural generation.

Imagine a game like Minecraft. You can generate practically infinitely many screenshots of Minecraft worlds, but all that data can be derived from the game code and the jvm.
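
A minimal sketch of that idea in Python, assuming a made-up `generate_world` function and an arbitrary tile alphabet (this is not how Minecraft actually generates terrain): a seed of a few bytes plus a fixed program deterministically expands into a much larger output.

```python
import random

def generate_world(seed: int, width: int = 64, height: int = 64) -> list[str]:
    """Expand a small integer seed into a width x height grid of tiles.

    Hypothetical illustration: the tile alphabet and grid size are arbitrary.
    """
    rng = random.Random(seed)   # private generator, so output depends only on the seed
    tiles = "~.^#"              # water, grass, hill, rock (made-up symbols)
    return ["".join(rng.choice(tiles) for _ in range(width)) for _ in range(height)]

world_a = generate_world(42)
world_b = generate_world(42)
world_c = generate_world(43)

seed_bytes = (42).bit_length() // 8 + 1          # bytes needed to store the seed
world_bytes = sum(len(row) for row in world_a)   # one byte per tile

print(world_a == world_b)        # same seed reproduces the same world
print(seed_bytes, world_bytes)   # stored size vs. generated size
```

The generated grid is recoverable only if you also have the generator, which is the role the game code and JVM play in the analogy, and physics and chemistry play for DNA.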

deng|9 months ago

He does mention structural interactions as well as duplications/deletions/inversions. I would argue methylation is more like an annotation of DNA and not part of the DNA itself, but that's a matter of opinion.

In the end, the author literally says: "nobody knows". Yes, you cannot compare a linear sequence of bits to a macromolecule that interacts structurally with its environment, and the author does not make that claim. The question he tries to answer is: how much data is needed to re-create a similar macromolecule that interacts in a similar way? His main point, on which you both agree: the exons alone are surely not enough, because the encoded proteins are just a (small?) part of how DNA interacts.

kjkjadksj|9 months ago

Exons are almost like functions, whereas a gene is almost like a class definition. In different tissues in the body, a gene might be alternatively spliced to produce different protein isoforms. In effect, it makes use of only a subset of the available functions in the class, depending on certain input parameters or how the class is called.
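
A rough Python sketch of the analogy, with everything in it invented for illustration (the `Gene` class, the exon names and sequences, and the tissue table are hypothetical, and real splicing regulation is far more involved): the same gene definition yields different products depending on which exons the caller selects.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    """A gene as an ordered collection of named exons (toy sequences)."""
    name: str
    exons: dict[str, str]   # exon name -> sequence fragment, in genomic order

    def splice(self, included: list[str]) -> str:
        """Join the selected exons, keeping their genomic order."""
        return "".join(seq for exon, seq in self.exons.items() if exon in included)

# Hypothetical tissue -> exon selection table (an assumption, not real data).
SPLICING_BY_TISSUE = {
    "muscle": ["e1", "e2", "e4"],
    "brain": ["e1", "e3", "e4"],
}

gene = Gene("toy_gene", {"e1": "ATG", "e2": "GCC", "e3": "TTA", "e4": "TAA"})

# One gene definition, two different isoforms depending on the "caller".
isoforms = {tissue: gene.splice(exons) for tissue, exons in SPLICING_BY_TISSUE.items()}
print(isoforms)
```

The sequence stored in `gene` never changes; the information about which isoform appears where lives in the selection table, which stands in for the regulatory context.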

foobarian|9 months ago

I find that even if this just provides a lower bound it is still an interesting piece of information.

lotharcable|9 months ago

Yeah...

We know now that environmental factors change how DNA is expressed as well through epigenetics.

I don't know how any of it works. Something to do with the shape of the DNA when it is wound up and how that changes the output when RNA produces proteins.

This is how parents can do things like pass some of the athleticism they earn through training to their children. It is possible for athletic parents to pass on genes in such a way that it produces children even more athletic than they were.

All of this means that DNA has the ability to encode information and produce proteins in different ways using the same sequences.

So I am guessing that a lot of the DNA that is considered "junk" may not actually be. They are just missing a piece of the puzzle in how it gets read in.

moralestapia|9 months ago

But all of those emergent effects are accounted for in the DNA sequence [1], so the estimate is fine.

1. Maaaaybe you could make a case for DNA methylation, but that still requires some DNA signatures so ...