top | item 46509162

(no title)

SteveJS | 1 month ago

This gets those cases right.

https://github.com/KnowSeams/KnowSeams

(On a beefy machine) It gets 1 TB/s throughput including all IO and position mapping back to original text location. I used it to split project gutenberg novels. It does 20k+ novels in about 7 seconds.

Note it keeps all dialog together- which may not be what others want, but was what i wanted.

discuss

order

No comments yet.