top | item 42094283

dtjohnnyb | 1 year ago

I was trying to do this recently for web page summarization. As said below, the token counts would end up over the context length, so I trimmed the HTML to fit, just to see what would happen. I found that the LLM could still extract information, but it very commonly started trying to continue the HTML blocks that had been left open in the trimmed input. Presumably this is due to instruction tuning on coding tasks.

I'd love to figure out a way to do it, though. It seems to me there's a bunch of rich description of the website in the HTML.
