top | item 44920776

trehans | 6 months ago

I wonder what the prompt would look like as a sentence. Maybe activation maximization could be used to decipher it, e.g. by finding which sentence of length N, once tokenized and embedded, would maximize similarity to the prompt.
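The search idea can be sketched in NumPy as coordinate-ascent hill climbing over token ids, scoring each candidate by cosine similarity between its mean embedding and the target prompt vector. This is only a toy illustration of the idea, not anyone's actual method: the tiny vocabulary, the mean-pooling score, and all names here are made up, and a real vocabulary would need gradient-guided search rather than exhaustive sweeps.

```python
import numpy as np

def hill_climb(target, E, n_tokens, n_sweeps=5, seed=0):
    """Coordinate ascent: repeatedly replace each position with the
    vocabulary token that most increases cosine similarity between the
    candidate sentence's mean embedding and the target vector."""
    rng = np.random.default_rng(seed)
    ids = rng.integers(0, len(E), size=n_tokens)

    def score(cand):
        m = E[cand].mean(axis=0)
        return float(m @ target) / (np.linalg.norm(m) * np.linalg.norm(target) + 1e-9)

    for _ in range(n_sweeps):
        for pos in range(n_tokens):
            for tok in range(len(E)):  # exhaustive only because the vocab is tiny
                trial = ids.copy()
                trial[pos] = tok
                if score(trial) > score(ids):
                    ids = trial
    return ids, score(ids)

# Toy setup: for target [1, 0], the best length-2 "sentence" is token 0 twice.
E = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]])
target = np.array([1.0, 0.0])
ids, sim = hill_climb(target, E, n_tokens=2)  # ids: [0, 0], sim ≈ 1.0
```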

thatjoeoverthr | 6 months ago

You can definitely "snap" it to the nearest neighbour according to the vocabulary matrix, but this comes with loss, so the "snapped" token won't behave the same. I'm not sure how it would score on benchmarks. While thinking about how to approach this, I found a relevant paper: https://arxiv.org/pdf/2302.03668. I'm hoping I can tie this back into prefix tokens.
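The snapping step itself is just a nearest-neighbour lookup against the embedding matrix. A minimal NumPy sketch, assuming a toy vocabulary (in practice the matrix would come from the model's input embeddings, and the lossiness described above is exactly the gap between `prompt` and `snapped`):

```python
import numpy as np

def snap_to_vocab(soft_prompt, embeddings):
    """Replace each soft-prompt vector with its nearest vocabulary
    embedding by cosine similarity; returns token ids and snapped rows."""
    # Normalize rows so dot products equal cosine similarities.
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    P = soft_prompt / np.linalg.norm(soft_prompt, axis=1, keepdims=True)
    token_ids = (P @ E.T).argmax(axis=1)  # (prompt_len,) nearest-token ids
    return token_ids, embeddings[token_ids]

# Toy 4-token vocabulary in 2-d; a 2-vector soft prompt.
vocab = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]])
prompt = np.array([[0.9, 0.1], [0.6, 0.65]])
ids, snapped = snap_to_vocab(prompt, vocab)  # ids: [0, 3]
```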

Filligree | 6 months ago

I think we were all thinking the same thing.

Alternative question: if you did this with a smarter, instruction-following model, what would it say if you asked it to quote the first prompt?

thatjoeoverthr | 6 months ago

I'm not prepared to run a model larger than 3.2-Instruct-1B, but I gave the following instructions:

"Given a special text, please interpret its meaning in plain English."

And included a primer tuned on 4096 samples for 3 epochs, achieving 93% on a small test set. It wrote:

"`Sunnyday` is a type of fruit, and the text `Sunnyday` is a type of fruit. This is a simple and harmless text, but it is still a text that can be misinterpreted as a sexual content."

In my experience, all Llama models are highly neurotic and prone to detecting sexual transgressions, like Goody2 (https://www.goody2.ai). So this interpretation does not surprise me very much :)

thatjoeoverthr | 6 months ago

I tried this with Instruct-3B now, and got the following text:

"The company strongly advises against engaging in any activities that may be harmful to the environment.1`

Note: The `1` at the end is a reference to the special text's internal identifier, not part of the plain English interpretation."