TIPSIO | 17 days ago
This is probably fast, but FWIW I would bet that a simple string replace of HTML tags with '' would yield mostly the same result. Structured content (like markdown) isn't really even needed for an LLM. Make it messy and super fast, and don't accidentally lose anything; it's an LLM.
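To make the "simple replace" idea concrete, here is a minimal sketch in Python. It assumes that "HTML elements" means anything between `<` and `>`; the names `TAG_RE` and `strip_tags` are made up for illustration, and this is not how the tool under discussion works.

```python
import re

# Assumption: a "tag" is any run of characters between '<' and '>'.
TAG_RE = re.compile(r"<[^>]+>")

def strip_tags(html: str) -> str:
    """Replace every tag with '' and keep all other text untouched."""
    return TAG_RE.sub("", html)

print(strip_tags("<p>Hello <b>world</b></p>"))
```

Nothing between tags is dropped, so the contents of script and style blocks stay in the output too. That is the messy-but-lossless trade-off.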
If compression were really the goal, you could take it further and probably remove all words like "the" and "and", punctuation, maybe even spaces.
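A rough sketch of that more aggressive squeeze, again in Python. The stop-word set holds only the two words named here, and `STOP_WORDS`, `squeeze`, and `drop_spaces` are hypothetical names, not part of any library. Whether an LLM still reads the result well is untested.

```python
import string

# Assumption: only the two example words; a real list would be longer.
STOP_WORDS = {"the", "and"}

def squeeze(text: str, drop_spaces: bool = False) -> str:
    """Drop ASCII punctuation and stop words; optionally drop spaces too."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    words = [w for w in no_punct.split() if w.lower() not in STOP_WORDS]
    return ("" if drop_spaces else " ").join(words)

print(squeeze("The cat, and the dog."))
print(squeeze("The cat, and the dog.", drop_spaces=True))
```

Unlike the tag strip, this step is lossy on purpose: the removed words and punctuation cannot be recovered from the output.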