

>understanding that the reads or sequences are long

So why slice the sequence and replicate the slices? All that adds more noise. Wikipedia seems to suggest (or I infer) that one can't just "read off" the base pairs in QWERTYUIOP through some super cool process and be done with it. No, I have to split QWERTYUIOP into chunks and replicate the hell out of the resulting chunks, which just brings me back to my question. I mean, how is QWERTYUIOPQWERTYUIOP not a reasonable outcome?

I'm still missing something fundamental. The more chunks there are, the more permutations and combinations there are for possible reassembly, up to the power set of all the base pairs. Granted, it may not be possible to make something this complex simple here. So thank you for your time and effort.


jakobnissen|5 years ago

Indeed, you are right: that would be much easier. DNA assembly is an insanely hard computational problem. The issue is that it's difficult to build a sequencing machine that can read more than a few hundred base pairs before it stops.

Why that is hard depends on the approach the machine takes. With the "sequencing by synthesis" approach, the problem is that you need one or two chemical reactions per base, and any yield short of 100% compounds from cycle to cycle, so the product degrades quickly after a few hundred cycles.
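To put rough numbers on the compounding: in a deliberately simplified model where every reaction succeeds independently with the same yield (an assumption; real chemistry also has phasing, signal decay, etc.), the fraction of strands that are still fully in step after n bases is yield^(n × reactions per base). With 99% per-reaction yield and one reaction per base, only about 5% of strands are still in step after 300 bases.

```python
def fraction_in_phase(yield_per_reaction: float, bases: int, reactions_per_base: int = 1) -> float:
    """Fraction of strands that completed every reaction correctly.

    Simplified model (assumption): each reaction succeeds independently
    with the same probability, so failures compound geometrically.
    """
    return yield_per_reaction ** (bases * reactions_per_base)


# Compare a few yields and read lengths, with one or two reactions per base.
for y in (0.999, 0.99, 0.98):
    for n in (100, 300, 1000):
        one = fraction_in_phase(y, n, 1)
        two = fraction_in_phase(y, n, 2)
        print(f"yield={y:.3f} bases={n:4d}  1 rxn/base: {one:.3f}  2 rxn/base: {two:.3f}")
```

Even a yield that sounds excellent (99%) leaves very little usable signal after a few hundred bases, which is why these reads top out where they do.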

Nanopore uses a different approach and can indeed produce very long reads, with the tail of the distribution reaching tens of thousands of base pairs. I'm not sure what limits the read length there.

epgui|5 years ago

Again, the key is understanding that the reads, or sequences, are long, hahaha. Reads are on the order of 100-1000 bp in length. This is not captured by the QWERTYUIOP example.
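A toy way to see why overlapping, replicated reads give back QWERTYUIOP and not QWERTYUIOPQWERTYUIOP: replicate copies are identical or overlap heavily, so they collapse instead of being concatenated. Below is a sketch of naive greedy overlap assembly (repeatedly merge the pair of reads with the longest suffix/prefix overlap). This is a classic textbook simplification, not what real assemblers do; they use overlap or de Bruijn graphs and have to cope with sequencing errors. The function names and the `min_len` threshold are illustrative only. Longer reads give longer overlaps, which is what makes each merge unambiguous.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of a that equals a prefix of b,
    or 0 if no such overlap is at least min_len long."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Naive greedy assembly of error-free reads into contigs."""
    reads = list(dict.fromkeys(reads))  # replicate copies collapse to one
    # Drop reads fully contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_pair = 0, None
        for a, b in permutations(reads, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            break  # no sufficiently long overlap: leave separate contigs
        a, b = best_pair
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[best_k:])  # merge, keeping the overlap only once
    return reads


reads = ["QWERTY", "RTYUIO", "UIOP", "QWERTY", "TYUIOP"]
print(greedy_assemble(reads))  # ['QWERTYUIOP']
```

The duplicate "QWERTY" never shows up twice in the output because it overlaps itself completely; the only way to get QWERTYUIOPQWERTYUIOP would be a read that actually spans the junction "...OPQW...", and none does.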

It's also important to understand that it's impossible to obtain an error-free sequence (for the purposes of this comment I am ignoring nanopore sequencing, which works differently), and that assembling all these reads is a game of probabilities.
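This is also the answer to "why replicate?": if every read has some errors, having many reads cover the same position lets the errors be outvoted. A minimal sketch, assuming the reads have already been placed at their correct offsets (the hard part, which this skips), is a per-position majority vote. Real tools weigh per-base quality scores in a probabilistic model; plain majority voting and the `consensus` helper here are illustrative simplifications.

```python
from collections import Counter


def consensus(aligned_reads: list[tuple[int, str]], length: int) -> str:
    """Majority-vote base at each position.

    Each read is (start_offset, sequence) and is assumed to be already
    placed correctly. Positions no read covers become 'N'. Ties are
    broken arbitrarily.
    """
    columns = [Counter() for _ in range(length)]
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            pos = start + i
            if 0 <= pos < length:
                columns[pos][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# True sequence: ACGTTGCA. Two of the reads carry one error each.
reads = [
    (0, "ACGTT"),
    (0, "ACCTT"),   # error at position 2 (C instead of G)
    (2, "GTTGCA"),
    (3, "TAGCA"),   # error at position 4 (A instead of T)
    (1, "CGTTGC"),
]
print(consensus(reads, 8))  # ACGTTGCA
```

With more coverage, each position gets more votes and a single bad read matters less, which is why the chunks get replicated so heavily.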

The DNA fragment sequences are even longer than the reads.

Nobody said this was easy! Hehe.