(no title)
dcrimp | 2 months ago
Something I've been interested in recently is - I wonder if it'd be possible to encode a known-good model - some massive pretrained thing - and use that as a starting point for further mutations.
Like some other comments in this thread have suggested, it would mean we can distill the weight patterns of things like attention, convolution, etc. and not have to discover them by mutation - so - making use of the many phd-hours it took to develop those patterns, and using them as a springboard. If papers like this are to be believed, more advanced mechanisms may be able to be discovered.
No comments yet.