snyy | 1 month ago
Recently, we've been using Chonkie to build deep research agents that watch topics for new developments and automatically update their reports. This requires constantly chunking large amounts of data.
While building this, we noticed Chonkie felt slow. We started wondering: what's the theoretical limit here? How fast can text chunking actually get if we throw out all the abstractions and go straight to the metal?
This post is about that rabbit hole and how it led us to build memchunk, the fastest chunking library, capable of chunking text at 1 TB/s.
Blog: https://minha.sh/posts/so,-you-want-to-chunk-really-fast
GitHub: https://github.com/chonkie-inc/memchunk
Happy to answer any questions!
djoldman | 1 month ago
How does the software handle these:
Mrs. Blue went to the sea shore with Mr. Black.
"What's for dinner?" Mrs. Blue asked.
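To make the concern concrete, here is a minimal Python sketch of a naive splitter that cuts after every ".", "?" or "!". The `naive_split` helper is hypothetical and written only for illustration; it is not memchunk's API, and memchunk may well handle these cases differently.

```python
def naive_split(text: str, delimiters: str = ".?!") -> list[str]:
    """Hypothetical helper: cut the text right after every delimiter character."""
    chunks = []
    start = 0
    for i, ch in enumerate(text):
        if ch in delimiters:
            # Close the current chunk, keeping the delimiter with it.
            chunks.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        # Keep any trailing text that has no closing delimiter.
        chunks.append(text[start:])
    return chunks


sample_1 = "Mrs. Blue went to the sea shore with Mr. Black."
sample_2 = '"What\'s for dinner?" Mrs. Blue asked.'

chunks_1 = naive_split(sample_1)
chunks_2 = naive_split(sample_2)

for chunk in chunks_1 + chunks_2:
    print(repr(chunk))
```

With this naive rule, each sample is one or two sentences but comes out as three chunks: the periods in "Mrs." and "Mr." are treated as boundaries, and the closing quote after "dinner?" ends up at the start of the next chunk.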