(no title)
nsagent | 3 months ago
Simply put, to compare models, you describe both the model and training data using a code (usual reported as number of bits). The trained model that represents the data within the fewest number of bits is the more powerful model.
This paper [2] from ICML 2021 shows a practical approach for attempting to estimate MDL for NLP models applied to text datasets.
No comments yet.