Understanding Emergent Abilities of Language Models from the Loss Perspective
6 points | maccaw | 1 year ago | arxiv.org
1 comment

cosmojg | 1 year ago
Does this mean that "overtraining" a midsize LLM for many more epochs on a small, representative subset of the dataset used by a larger, more performant LLM might be sufficient for matching the performance of the larger model?