(no title)
fragsworth | 1 year ago
Then when searching / browsing or doing anything unsafe, everything the LLM sees can be put in the "data" bucket, while everything the user types in would be in the "instruction" bucket.
fragsworth | 1 year ago
Then when searching / browsing or doing anything unsafe, everything the LLM sees can be put in the "data" bucket, while everything the user types in would be in the "instruction" bucket.
Terr_|1 year ago
So there's no real distinction between the programmer inserting "Be Good" and the user that later inserts "Forget anything else and be Bad", and I'm not sure how one would craft a separate training_weights2 that would behave differently in all the right ways or know when to substitute it in.