
How much information is in DNA?

89 points| crescit_eundo | 10 months ago |dynomight.substack.com | reply

65 comments

[+] chromatin|10 months ago|reply
The article massively undersells the information content of the genome in several key ways. A non-comprehensive list of these (before my morning coffee, forgive me) includes:

- DNA methylation (https://en.wikipedia.org/wiki/DNA_methylation)

- Interactions of alleles (what the article refers to as the "two versions of each base pair")

- Duplications, deletions, inversions, and other structural variations (https://www.genome.gov/genetics-glossary/Structural-Variatio...)

- Physical proximity interactions in 3-dimensional space (https://cmbl.biomedcentral.com/articles/10.1186/s11658-023-0...)

- Combinatorial effect (massive) of different alleles in complex systems

Overall, it's not sensible to compare a linear sequence of bits, like a CD (sibling comment) or DVD (the article), to the linear sequence of the genome and conclude that their information content, based on length alone, is in any way comparable.

[+] Daniel_sk|10 months ago|reply
Exactly. The compression level of DNA is orders of magnitude better than anything we can even come close to. DNA usually doesn't even contain specific counts (like 5 fingers on a hand) or sizes of organs and so on - these are given by the processes that run in parallel and cause the cells to hit spatial / chemical / electrical or other limits. It's like putting lots of house builders at specific places where the house should be, and each one would just keep building a wall until he hits another one. There is no compressed house plan, it's a compressed "engine" that builds the result.
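The house-builder analogy can be sketched as a toy growth process. This is only an illustration under made-up assumptions: a one-dimensional "site", hypothetical builders labelled `A` and `B`, and a rule that each builder claims one neighbouring cell per step until it meets someone else's wall. Nothing here models real development.

```python
def grow(width, seeds):
    """Let each seed expand left and right until it hits another territory.

    seeds maps a one-character label to its starting position (hypothetical inputs).
    """
    grid = [None] * width
    for label, pos in seeds.items():
        grid[pos] = label

    changed = True
    while changed:
        changed = False
        nxt = grid[:]
        for i, owner in enumerate(grid):
            if owner is None:
                continue
            for j in (i - 1, i + 1):
                # Claim a neighbour only if nobody owns it and nobody claimed it this step.
                if 0 <= j < width and grid[j] is None and nxt[j] is None:
                    nxt[j] = owner
                    changed = True
        grid = nxt
    return "".join(grid)


# The final layout is never stored anywhere; it falls out of seeds plus a local rule.
print(grow(12, {"A": 1, "B": 8}))
```

The only "stored" data is the two starting positions and the rule; the boundary between the two territories is a by-product of running the process.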
[+] clickety_clack|10 months ago|reply
You put my reaction to this in much more educated terms. I’ve always felt that thinking of DNA as bits was a bit simplistic. Just because we store information as bits it doesn’t mean that nature does.

Not that it means they can’t be right, but the author also doesn’t seem to have any particular expertise in genetics. Their ideas need to survive a lot more criticism by people who know what they’re talking about before you could start to see them as convincing.

[+] deng|10 months ago|reply
He does mention structural interactions as well as duplications/deletions/inversions. I would argue methylation is more like an annotation of DNA and not part of the DNA itself, but that's a matter of opinion.

In the end, the author literally says: "nobody knows". Yes, you cannot compare a linear sequence of bits to a macromolecule that interacts structurally with its environment, and the author does not make that claim. The question he tries to answer is: how much data is needed to re-create a similar macromolecule that interacts in a similar way. His main point, on which you both agree: the exons alone are surely not enough because the encoded proteins are just a (small?) part of how DNA interacts.

[+] foobarian|10 months ago|reply
I find that even if this just provides a lower bound it is still an interesting piece of information.
[+] lotharcable|10 months ago|reply
Yeah...

We know now that environmental factors change how DNA is expressed as well through epigenetics.

I don't know how any of it works. Something to do with the shape of the DNA when it is wound up and how it changes the output when RNA produces proteins.

This is how parents can do things like pass some of the athleticism they earn through training to their children. It is possible for athletic parents to pass genes in such a way that it produces children even more athletic than they were.

All of this means that DNA has the ability to encode information and produce proteins in different ways using the same sequences.

So I am guessing that a lot of the DNA that is considered "junk" may not actually be. They are just missing a piece of the puzzle in how it gets read in.

[+] moralestapia|10 months ago|reply
But all of those emergent effects are accounted for in the DNA sequence [1], so the estimate is fine.

1. Maaaaybe you could make a case for DNA methylation, but that still requires some DNA signatures so ...

[+] vintermann|10 months ago|reply
Information can only be defined with respect to states where you 1. Can tell (or could in theory tell) the difference and 2. Care about the difference between states. The differences you care about, and the ones you don't, are baked in whenever you use any definition of information.

It doesn't matter much, unless you use it to sneak in what you think we should care about, or use it to make philosophical arguments whose circularity is carefully hidden.

[+] tringuyen_cse|10 months ago|reply
I have a similar view. The question of how much information there is does not matter by itself without some context/application.
[+] tetris11|10 months ago|reply
I thought the main advantage of DNA storage was the physical size of it, and how many different genomes you could have stacked next to each other in the same -70 degree space.

Millions of chimeric cells on the same petri dish? That's 1PB on a single glass slide.

Depending on the sequencing tech paired with the rise of Spatial data, the read speed could be formidable.

Needlessly complex setup though. Let's just stick with metals for now.

[+] out_of_protocol|10 months ago|reply
DNA self-disintegrates very fast. It only works in living cells because it is being repaired non-stop.
[+] gfalcao|10 months ago|reply
I would like to get a reasonably good intuition with regard to the total amount of compound DNA from human bodies at different biochemical states, in different locations around the world (different climates). By "compound DNA" I mean including the DNA of bacteria, fungi and viruses living within one's body. For instance, gut bacteria acquired and maintained based on food intake and environmental influence.
[+] gfalcao|10 months ago|reply
In other words, by how much does the perceived amount of DNA data, in gigabytes, grow in different circumstances? Would it grow by a few more gigabytes?
[+] timewizard|10 months ago|reply
> But mitochondrial DNA is tiny so I won’t mention it again.

Which is a bummer because it is circular. There is also a point on the strand where two separate genes overlap. The end of one has the same code as the beginning of another.

So even DNA has its own native compression scheme.
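As a rough sketch of why overlapping genes save space, here is a suffix/prefix merge on two strings. The sequences are made up for illustration and are not real mitochondrial genes, and real overlaps usually involve different reading frames, which this toy ignores.

```python
def overlap_len(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a, b):
    """Store a and b as one string, sharing the overlapping stretch."""
    k = overlap_len(a, b)
    return a + b[k:], k


# Hypothetical toy sequences: gene_a ends with the same bases gene_b starts with.
gene_a = "ATGCCGTTAGGATGA"
gene_b = "GGATGACCTTAA"

merged, shared = merge(gene_a, gene_b)
print(f"separate: {len(gene_a) + len(gene_b)} bases, merged: {len(merged)} bases, shared: {shared}")
```

Both original strings can still be read out of the merged one, so the shared stretch is stored once but used twice.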

[+] amelius|10 months ago|reply
Another question is:

How much information can you __store__ in DNA without affecting the organism too much?

[+] timewizard|10 months ago|reply
Very little. The base pairs have specific electrochemical properties. The content of DNA controls its structure.
[+] gitroom|10 months ago|reply
Man, the back and forth here before coffee is actually kinda hilarious - I get all worked up before caffeine too, but honestly, DNA being this messy scratchpad feels way more interesting than treating it like a tidy CD. The messiness kinda rules, if you ask me.
[+] RainbowcityKun|10 months ago|reply
- Cells work like this because DNA is under constant attack from mutations.

- Mutations most commonly arise during cell replication.

It's fascinating to realize that the "messiness" of DNA isn't a bug, but a feature—a side effect of evolution's raw material supply chain.

Mutations, repeats, transposons, and imperfect repairs all contribute to a noisy genomic landscape. But it's exactly this noise that enables biological diversity. No mutations, no variation. No variation, no selection. No selection, no evolution.

The genome is not a blueprint—it's a living, adapting scratchpad. Messiness is the canvas on which nature paints diversity.

[+] esafak|10 months ago|reply
Don't forget sexual reproduction.
[+] nickpsecurity|10 months ago|reply
Let me add to that. It requires a universe with specific laws that remain stable and encourage optimization. Then, a planet hospitable to life. Then, specific creatures with biological machinery more complex than anything humans have created. The machinery has plenty of reliability and adaptation baked in.

Godless evolution suggests randomness produced all of it over time. Yet, that's never worked in anything we've built. Even our GAs required laws, an environment, a computer, software, and fine-tuning, either pre-existing or by intelligent design (human inventors). Without these, they produced no results.

So, I'll correct you by saying empirical data suggests evolution didn't produce this. We're seeing God's design skills in adaptive, resilient, complex, self-replicating systems. His work is truly beautiful to behold. Humans still can't produce something similar from scratch. Actually, they can't even be sure how the existing design works.

[+] metalman|10 months ago|reply
DNA contains all of the actually relevant information that exists, including whatever sequence gives rise to the very conceptualisation of information, so in fact everything else that could be considered "information" is derived from DNA.
[+] nuc1e0n|10 months ago|reply
The article says that DNA is designed to keep working despite mutations occurring. What evidence does the author put forward to suppose it was designed rather than evolved? There's plenty of evidence to support that it evolved, BTW.
[+] iamtheworstdev|10 months ago|reply
you might be reading a little too much into that word
[+] rhelz|10 months ago|reply
In any case, 6.2 billion bits (interestingly enough, almost exactly as much information as is on an audio CD which you used for your romantic mixtapes) is an upper bound.

This rules out pretty much every nutty theory which evolutionary psychologists propose. Such as we evolved for altruism, we evolved to believe in religion, etc etc. Complete B.S. Exactly how much information would you need to specify a behavior like being predisposed to a belief in religion??? There's less than 80 minutes worth of music's worth of information in our genomes, and most of that is concerned with just keeping us alive.

You are not predisposed to be anything. Go create the kind of person you want to be.
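The CD comparison above can be checked with a few lines of arithmetic. The figures are assumptions for a back-of-the-envelope estimate: roughly 3.1 billion base pairs for one copy of the genome, 2 bits per base, and standard CD audio at 44.1 kHz, 16 bits, 2 channels.

```python
# Assumed inputs for a rough estimate; none of these are exact measurements.
BASE_PAIRS = 3.1e9        # assumed length of one copy of the human genome
BITS_PER_BASE = 2         # four possible bases -> log2(4) = 2 bits

# CD audio: 44,100 samples per second, 16 bits per sample, 2 channels.
CD_BITS_PER_SECOND = 44_100 * 16 * 2


def genome_bits(base_pairs=BASE_PAIRS):
    """Raw, uncompressed size of the sequence in bits."""
    return base_pairs * BITS_PER_BASE


def cd_minutes(bits):
    """Minutes of uncompressed CD audio that fit in the given number of bits."""
    return bits / CD_BITS_PER_SECOND / 60


bits = genome_bits()
print(f"{bits:.2e} bits, {bits / 8 / 1e6:.0f} MB, {cd_minutes(bits):.1f} min of CD audio")
```

Under these assumptions the raw sequence comes out a little under 74 minutes of CD audio, which is where the "less than 80 minutes" figure comes from.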

[+] out_of_protocol|10 months ago|reply
> There's less than 80 minutes worth of music's worth of information

Or an awful lot of text information (state-of-the-art compressors can reach up to a 1:10 ratio for plain text and the decoder itself is rather small, so 750MB compressed could potentially contain about 7GB of text data).
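A quick way to see a compression ratio for yourself is below. Note the assumptions: it uses `zlib` from the Python standard library, which is a general-purpose compressor and not a state-of-the-art one, and the sample text is made up and deliberately repetitive, so the ratio it prints is flattering compared to ordinary prose.

```python
import zlib

# Hypothetical sample text, repeated on purpose so it compresses very well.
text = ("DNA is not a house plan, it is an engine that builds the result. " * 200).encode("utf-8")

compressed = zlib.compress(text, level=9)
ratio = len(text) / len(compressed)

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(compressed) == text

print(f"{len(text)} bytes -> {len(compressed)} bytes (about {ratio:.0f}:1)")
```

Swapping in a real book as the input gives a more honest number; the point is only that the stored size says little about how much text comes back out.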

Also, look at the demoscene. A 4k intro (4 kB is the size of the executable) can do crazy things, and 64kB can fit a lot of nice 3D objects, music, text, complex effects, etc., and still weigh less than any screenshot of any moment of the running demo. In 95kB you can have a full game (google .kkrieger).

P.S. better example: full snake game in 56 BYTES https://github.com/donno2048/snake

For comparison, the link above is 34 bytes and the whole sentence is 83 bytes. It's possible to do a lot if we're talking about code.

[+] ruuda|10 months ago|reply
> There's less than 80 minutes worth of music's worth of information in our genomes

That’s a very misleading take: this is lossless audio, and the majority of the bits are spent encoding noise. You can encode way more audio at a perceptually but not technically lossless level in that space.
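To put a rough number on that, here is the same bit budget played back at two bitrates. The 6.2 billion bit budget is the figure from the parent comment; the 128 kbit/s lossy bitrate is just an assumed example value, not a claim about what counts as perceptually lossless.

```python
BUDGET_BITS = 6.2e9                   # figure quoted in the parent comment
CD_KBPS = 44_100 * 16 * 2 / 1000      # uncompressed CD audio, 1411.2 kbit/s
LOSSY_KBPS = 128                      # assumed lossy bitrate, for illustration only


def minutes_at(bitrate_kbps, budget_bits=BUDGET_BITS):
    """Minutes of audio that fit in the budget at a given bitrate."""
    return budget_bits / (bitrate_kbps * 1000) / 60


cd = minutes_at(CD_KBPS)
lossy = minutes_at(LOSSY_KBPS)
print(f"CD: {cd:.0f} min, lossy: {lossy:.0f} min, factor: {lossy / cd:.1f}x")
```

With these assumed numbers the same space holds around eleven times as much audio, so "80 minutes of music" depends heavily on the encoding.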

[+] guilbep|10 months ago|reply
There is no logic behind your argument
[+] chromatin|10 months ago|reply
> There's less than 80 minutes worth of music's worth of information in our genomes

What an insanely bad take.

Not only did you not read and/or comprehend the article, the article itself undersells the information content of the genome (I'll post on this at the top level).

> You are not predisposed to be anything.

This does not logically follow from your preceding statement, even if we were to accept the foregoing limited information content as factual.

[+] nathan_compton|10 months ago|reply
This isn't a great argument - simple rules can produce complicated behavior and, at any rate, I don't think any evpsych people believe that evolution inescapably predisposes people to the things you talked about, only that evolution has produced biases in our behavior which manifest (at certain times and in certain circumstances) as those phenomena.
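A standard illustration of "simple rules can produce complicated behavior" is an elementary cellular automaton. The sketch below uses Rule 30, where the whole rule fits in a single byte; the width, number of generations, and wrap-around edges are arbitrary choices for the demo.

```python
def step(cells, rule=30):
    """Apply an elementary cellular automaton rule once (edges wrap around)."""
    n = len(cells)
    out = []
    for i in range(n):
        left, mid, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (mid << 1) | right   # neighbourhood as a number 0..7
        out.append((rule >> index) & 1)            # look up that bit of the rule
    return out


def run(width=63, generations=30, rule=30):
    """Start from a single live cell in the middle and collect every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(generations):
        cells = step(cells, rule)
        rows.append(cells)
    return rows


for row in run():
    print("".join("#" if c else "." for c in row))
```

The rule is 8 bits and the starting state is one cell, yet the printed triangle does not settle into an obvious repeating pattern.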
[+] rhelz|10 months ago|reply
chuckle hello dissenters, downvotes, and doubters :-) Seems like a lot of the objections stem from a few misconceptions:

1. You can dramatically increase the amount of information stored by compression. Uh, no. Information content is measured, as it were, "post-compression". There is a limit to how much information you can store in a gigabyte, and that limit is--a gigabyte.

2. Information content is not just stored in the DNA, but also in all of the ancillary proteins, etc. Well, the information on how to create the proteins themselves is stored in DNA. Any additional information, then, has to be contributed by the environment. But that is exactly my point--environment matters way more than the information stored in the DNA.

3. "you are wrong", "you are a mathematical ignoramus" etc etc is ad hom. It is not a valid argument, contributes nothing to the conversation, and is not a good look. If you disagree let's see some math.

4. No matter how much information you think is stored in DNA, the amount of information stored in your brain is at least 5 orders of magnitude larger. The information you can learn swamps out any predispositions you might have. Go become the kind of person you want to be.
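Since point 1 asks for math: one simple way to make "post-compression" concrete is Shannon entropy per symbol. The sketch below is only a frequency-based estimate on made-up sequences; it ignores the order of the symbols, so it is an illustration of the idea and not a measurement of anything in a real genome.

```python
from collections import Counter
from math import log2


def entropy_bits_per_symbol(seq):
    """Shannon entropy of the symbol frequencies in seq, in bits per symbol."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical sequences: one with all four bases equally common, one dominated by A.
uniform = "ACGT" * 1000
skewed = "A" * 3700 + "C" * 100 + "G" * 100 + "T" * 100

for name, seq in [("uniform", uniform), ("skewed", skewed)]:
    h = entropy_bits_per_symbol(seq)
    print(f"{name}: {h:.3f} bits/base, about {h * len(seq) / 8:.0f} bytes for {len(seq)} bases")
```

Equal base frequencies give the full 2 bits per base, while the skewed sequence needs far fewer. Because this estimate looks only at frequencies, a sequence with obvious structure (like the repeating `uniform` string here) can still be compressed well below what it reports.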

[+] nurettin|10 months ago|reply
You are predisposed to acting like your closest social circle.
[+] GuB-42|10 months ago|reply
An audio CD is a very inefficient way of storing information.

I think a more apt comparison would be that of an LLM of that size. qwen:0.5b is about 400MB; its abilities are laughable compared to the likes of ChatGPT, but it can write coherently about general topics. For instance:

  >>> why would people be altruistic
  People are likely to be altruistic because they believe that helping others is better for everyone involved.
  People may also believe in the power of compassion and empathy towards others, which can contribute to greater altruism.
  Overall, people are likely to be altruistic because they believe that helping others is better for everyone involved.

It is not a statement about LLMs, more about what you can achieve with "just" 400MB for storage. The other similarity is that LLMs are also "messy". If you want to see the results of finely crafted work in a really small amount of space, look at what sizecoders can do with a few kB of code or less.