top | item 44778794

(no title)

maaaaattttt | 7 months ago

Maybe do something close to what I like to believe the brain does and have a meta model wrap a "base" model. The meta model gets the output data from the base model (edit: plus the original input) as input plus some meta parameters (for example the probability each token had when it was chosen and/or better which "neurons" were activated during the whole output sequence which would include the Persona they mention). It's then the meta model that generates new output data based on this input and this is the output that is shown to the user.

discuss

order

tdtr|7 months ago

Can you describe the "meta" model more ? afaict it seems like you are describing a "router"? I think what you are thinking of is essentially what MoE does, or in diffusion, a sort of controlnet-like grounding (different exact mechanism, similar spirit).