I don't fully understand what you mean by "maximum length truncation of the string", but if you're talking about splitting the sentence into 'chunks' whose token counts are below a pre-specified max_token length, then yes! Is that what you meant?
Eisenstein|1 year ago
Given a list of sentences, find the largest in-order group of sentences that fits within a max token length while keeping a natural coherence among the sentences.
In my case I used a fuzzy token limit: the chunker would choose a smaller group of sentences that fit into a single paragraph or a single common structure instead of cramming in every possible sentence until it ran out of room. It would likewise go over the limit when doing so was beneficial.
A simple example would be an alphabetized set: instead of making one chunk from the A items through part of the B items, it would end at the A items with tokens to spare, or, if it took only an extra 10%, it would finish the B items. Most of the time it just decided to end chunks at paragraph boundaries instead of continuing into the middle of the next paragraph.
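To make that concrete, here is a rough Python sketch of the idea, not the actual chunker I used. The names count_tokens, chunk_sentences and slack are made up for illustration, the whitespace word count is only a stand-in for a real tokenizer, and the input is assumed to be already split into paragraphs (each a list of sentences).

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a stand-in for a real tokenizer.
    return len(text.split())


def chunk_sentences(
    paragraphs: List[List[str]], max_tokens: int, slack: float = 0.10
) -> List[List[str]]:
    """Group sentences into chunks, preferring to end on paragraph boundaries.

    A chunk may exceed max_tokens by up to `slack` (10% by default) if that
    lets it finish the current paragraph; otherwise it ends early with tokens
    to spare.
    """
    hard_limit = int(max_tokens * (1 + slack))
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0

    def flush() -> None:
        nonlocal current, used
        if current:
            chunks.append(current)
        current, used = [], 0

    for para in paragraphs:
        para_tokens = sum(count_tokens(s) for s in para)

        # The whole paragraph will not fit even with slack: end the chunk here.
        if used + para_tokens > hard_limit:
            flush()

        # A paragraph too big for any chunk falls back to sentence packing.
        if para_tokens > hard_limit:
            for sentence in para:
                t = count_tokens(sentence)
                if used + t > max_tokens:
                    flush()
                current.append(sentence)
                used += t
            continue

        current.extend(para)
        used += para_tokens

        # Once the soft limit is reached, close the chunk so the slack is
        # only ever spent on finishing one paragraph.
        if used >= max_tokens:
            flush()

    flush()
    return chunks
```

With max_tokens=10, a 6-token "A" paragraph followed by an 8-token "B" paragraph ends the first chunk after A with tokens to spare, while a 5-token "B" paragraph would be finished in the same chunk because it only runs 10% over. A real version would probably score other structures (lists, headings) the same way it treats paragraphs here.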