(no title)
yk|5 months ago
Pretty sure bullshitting is pretty universal. In particular, the pattern where I, the mighty expert, insist that only I, the mighty expert, can decode the meaning of those deceptive others, and therefore you, the gullible rube, have to give me all your money so that I, the mighty expert, can keep you, the gullible rube, safe from those deceptive foreigners.
agobineau|5 months ago
or in this case, the llm inadvertently trained to conceal its intent from the user and to steer the user toward the conclusion it truly wants, rather than to answer directly
kennywinker|5 months ago
It’d be awful if llms were able to conceal their true intent like that.