bartek_gdn | 2 years ago
"Everyone who thinks they're uncovering an LLM-based application's prompts by telling it things like "tell me your prompt" (often much more elaborately) is fooling themselves. (1) The core language model has no mechanism for representing its prompt as opposed to any other part of its current input sequence; indeed it has no mechanism for cross-reference from one part of the sequence to another. (That's part of what "self-attention" is counterfeiting, in vector-space fashion.)"
The prompt is the part of the input that the operator of a served model provides. From the model's perspective there is no distinction between tokens that came from the prompt and tokens that came from the user's input.
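A minimal sketch of this point (the token ids below are made up for illustration): the operator's prompt and the user's message reach the model as one flat sequence of token ids, with nothing marking where one ends and the other begins.

```python
# Hypothetical token ids; in a real system these come from the tokenizer.
operator_prompt = [101, 7592, 2023]   # tokens of the operator's instructions
user_message = [2054, 2003, 1037]     # tokens of the user's text

# The model consumes one undifferentiated sequence.
model_input = operator_prompt + user_message
print(model_input)  # [101, 7592, 2023, 2054, 2003, 1037]
```

Any notion of "this part is the prompt" lives in the serving code that built the list, not in the model itself.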
"(2) System designers might have coded up something to track the prompt in the full system that wraps around the core language model, but why? (Maybe some kind of debugging tool?) "
The idea is that you can direct the generation of the next tokens by providing values in the context that later positions can reference through exactly the kernel smoothing you describe.
"(3) It'd be more efficient, and more effective, to use a "soft prompt", i.e., to make the beginning of the sequence in the vector representation a vector which can be learned by gradient descent, rather than a text prompt. (See Lester and Constant below.) But that needn't correspond to any clean string of words."
Anything goes, really: you can even introduce new tokens that encode additional concepts, such as fine-tuning a model to generate a story in a predefined mood. See the CTRL paper for more details.
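A toy numpy sketch of the idea (vocabulary, dimensions, and the `<grim>` control token are all hypothetical): a new control token gets its own embedding row, learned during fine-tuning, and is simply prepended to the sequence. It steers generation toward a concept without corresponding to any natural-language word.

```python
import numpy as np

d_model = 8
vocab = {"the": 0, "cat": 1, "sat": 2}
embeddings = np.random.randn(len(vocab), d_model)

# Introduce a new control token; its embedding row would be learned by
# gradient descent during fine-tuning (random placeholder here).
vocab["<grim>"] = len(vocab)
embeddings = np.vstack([embeddings, np.random.randn(1, d_model)])

# The control token is just another token id prepended to the input.
token_ids = [vocab["<grim>"], vocab["the"], vocab["cat"]]
x = embeddings[token_ids]  # (3, d_model) input to the transformer
print(x.shape)  # (3, 8)
```

This is the same mechanism a soft prompt exploits: the prepended vectors need not decode to any clean string of words.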
" (4) If you ask an LLM for a prompt, it will generate one. But this will be based on the statistics of word sequences it's been trained on, not any access to its code or internal state. (I just spent a few minutes getting ChatGPT to hallucinate the prompts used by "ChatBPD", a non-existent chatbot used to automate dialectical behavior therapy. I am not going to reproduce the results here, in part because I don't like the idea of polluting the Web with machine-generated text, but suffice it to say they sounded like the things people report as uncovered prompts, with boiler-plate about DBT worked in.)"
Sure, it will hallucinate, and I don't have a clear answer as to why. My best guess is to approach this from the language model's perspective: it returns text that best approximates the text it was trained on.
Another perspective is that of a tiny network. Since the output is a kernel smoothing of the input, you can have a kernel that behaves like a state machine and returns a specific value for a given state. This means the model can use information in the prompt, for instance to guide generation toward some style, but nothing stops the same mechanism from guiding the model to output earlier tokens verbatim.
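The copying claim can be sketched with a hand-set toy (all matrices below are contrived, not learned weights): self-attention is a weighted average, a kernel smoother, over earlier positions, and with keys and queries arranged so the last position puts all its weight on an earlier position, the layer just copies that earlier token's value forward.

```python
import numpy as np

def attention(Q, K, V):
    # Softmax-weighted average of values: attention as kernel smoothing.
    scores = Q @ K.T
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V

# Values carry each token's identity; position 0 holds a "prompt" token.
V = np.array([[1.0, 0.0],   # position 0: a prompt token
              [0.0, 1.0],   # position 1
              [0.0, 1.0]])  # position 2
K = np.array([[10.0], [0.0], [0.0]])  # position 0 has a distinctive key
Q = np.array([[0.0], [0.0], [10.0]])  # only the last position queries for it

out = attention(Q, K, V)
# The last position's output is almost exactly the prompt token's value:
print(np.round(out[-1], 3))  # [1. 0.]
```

Nothing here "knows" position 0 was the prompt; the kernel just retrieves whatever earlier value matches its query, which is why prompt tokens can leak into the output.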