notahuman's comments

notahuman | 3 years ago | on: Correlation between the model scale and the quality of long text generation?

Recently, I ran a few fine-tuning experiments with a 1b7 LLM on a multitask instruction dataset, primarily focused on open-ended long story generation. However, I observed that after roughly 300 generated tokens, the quality of the output started to decline and the text became less fluent.
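For context, here is a minimal sketch of one way such a drop-off could be quantified: generate a long continuation and track a simple repetition metric (distinct-2) over successive windows of generated tokens. The checkpoint name, prompt, window size, and metric below are illustrative assumptions, not the exact setup from my runs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-1b7"  # assumed ~1.7B checkpoint; swap in your own fine-tuned model
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)

prompt = "Write a long story about a lighthouse keeper who discovers a hidden cave."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=600,   # go well past the ~300-token point in question
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )

# Keep only the newly generated tokens (drop the prompt).
generated = output[0][inputs["input_ids"].shape[1]:]

# Distinct-2: fraction of unique bigrams in each 100-token window.
# Falling values in later windows suggest increasing repetition/degeneration.
WINDOW = 100
for start in range(0, len(generated) - WINDOW + 1, WINDOW):
    window = generated[start:start + WINDOW].tolist()
    bigrams = list(zip(window, window[1:]))
    distinct_2 = len(set(bigrams)) / max(len(bigrams), 1)
    print(f"tokens {start}-{start + WINDOW}: distinct-2 = {distinct_2:.3f}")
```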

I am curious whether there is a relationship between model scale and the quality of long text generation. For example, can a model of this size (1b7) keep producing fluent text once the generation passes a certain number of tokens?
