goobatrooba|2 months ago
But all of them:

* Lie far too often, with confidence
* Refuse to stick to prompts (e.g. ChatGPT ignoring a request to number each reply for easy cross-referencing; Gemini ignoring a basic request to respond in a specific language)
* Refuse to express uncertainty or nuance (I asked ChatGPT to give me certainty percentages, which it did for a while but then just forgot...?)
* Refuse to give me short answers without fluff or follow-up questions
* Refuse to stop complimenting my questions or my disagreements with wrong/incomplete answers
* Don't quote sources consistently so I can check facts, even when I ask for it
* Refuse to make clear whether they rely on original documents or an internal summary of the document, until I point out errors
* ...
I also have gripes about substance, but for me such basic usability points are really something all of the chatbots fail on abysmally. Stick to instructions! Stop creating walls of text for simple queries! Tell me when something is uncertain! Tell me when there's no data or info, rather than making something up!
razster|2 months ago
Local models are better: I can script them, and have them script for me, to build a guide-creation process. They don't forget, because that's all they're trained on. I'm done paying for 'AI'.
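A minimal sketch of that kind of local-model scripting, assuming the `ollama` Python package and a locally pulled model (the model name, sections, and prompts are illustrative):

    # Script a local model to draft guide sections one at a time.
    # Assumes the `ollama` Python package and a pulled model;
    # names and prompts here are illustrative.
    import ollama

    sections = ["Installation", "Configuration", "Troubleshooting"]
    guide = []
    for section in sections:
        reply = ollama.chat(
            model="llama3",
            messages=[
                {"role": "system", "content": "Write terse, factual guide sections."},
                {"role": "user", "content": f"Write the '{section}' section of a setup guide."},
            ],
        )
        guide.append("## " + section + "\n" + reply["message"]["content"])

    print("\n\n".join(guide))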
balder1991|2 months ago
What I mean is, it seems they tune the models for a few specific things, which makes them worse at a thousand other things they're not paying attention to.
matusp|2 months ago
https://ai.google.dev/gemini-api/docs/structured-output
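Those docs describe constraining Gemini's output to a JSON schema, which addresses some of the format-following gripes at the API level. A minimal sketch, assuming the `google-genai` Python SDK and a GEMINI_API_KEY in the environment (the schema fields are illustrative):

    # Ask Gemini for output constrained to a JSON schema.
    # Assumes the google-genai SDK and GEMINI_API_KEY in the env;
    # the Reply fields are illustrative.
    from google import genai
    from pydantic import BaseModel

    class Reply(BaseModel):
        reply_number: int
        answer: str
        sources: list[str]

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Answer briefly and list your sources.",
        config={
            "response_mime_type": "application/json",
            "response_schema": Reply,
        },
    )
    print(response.parsed)  # a Reply instance parsed from the JSON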
ifwinterco|2 months ago
Especially for something like expressing a certainty %: you might be able to get it to output one, but it's just making it up. LLMs are incredibly useful (I use them every day), but you'll always have to check important output.
carsoon|2 months ago
Potentially they could figure it out by looking at a comparison of next-token probabilities, but this isn't surfaced in the chat interface of any modern model, and it especially isn't fed back into the chat/output.
Instead, people should just ask it to explain BOTH sides of an argument, or explain why something is BOTH correct and incorrect. That way you see how it can hallucinate either way and get to make up your own mind about the correct outcome.
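For what it's worth, some provider APIs do expose per-token log probabilities even though the chat interfaces don't. A minimal sketch, assuming the `openai` Python SDK (note that a token probability is not a calibrated certainty about facts):

    # Inspect next-token probabilities via the API, which chat UIs hide.
    # Assumes the openai Python SDK and OPENAI_API_KEY in the env.
    import math
    from openai import OpenAI

    client = OpenAI()
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What year did the Berlin Wall fall?"}],
        logprobs=True,
        top_logprobs=5,  # also return the 5 most likely alternatives per token
    )
    for tok in completion.choices[0].logprobs.content:
        print(f"{tok.token!r}: p={math.exp(tok.logprob):.3f}")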
nullbound|2 months ago
I am relatively certain you are not alone in this sentiment. The issue is that the moment we move past seemingly objective measurements, it becomes harder to convince people that what we measure is appropriate; yet the measurable stuff can be somewhat gamed, which adds a fascinating cat-and-mouse layer to all this.
hnfong|2 months ago
Some issues you mentioned, like length of response, might be user preference. Other issues, like "hallucination", are areas of active research (and there are benchmarks for these).
carsoon|2 months ago
I think we align on what we want out of models:
""" Don't add useless babelling before the chats, just give the information direct and explain the info.
DO NOT USE ENGAGEMENT BAITING QUESTIONS AT THE END OF EVERY RESPONSE OR I WILL USE GROK FROM NOW ON FOREVER AND CANCEL MY GPT SUBSCRIPTION PERMANENTLY ONLY. GIVE USEFUL FACTUAL INFORMATION AND FOLLOW UPS which are grounded in first principles thinking and logic. Do not take a side and look at think about the extreme on both ends of a point before taking a side. Do not take a side just because the user has chosen that but provide infomration on both extremes. Respond with raw facts and do not add opinions.
Do not use random emojis. Prefer proper marks for lists etc. """
Those spelling/grammar errors really are there, and I don't want to change them, as it's working well for me.
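One way to keep standing instructions like these from being "forgotten" is to pin them as a system message on every API call instead of relying on the chat UI. A minimal sketch, assuming the `openai` Python SDK (the INSTRUCTIONS string stands in for the prompt quoted above):

    # Pin standing instructions as a system message so they ride along
    # with every request. Assumes the openai Python SDK; INSTRUCTIONS
    # stands in for the custom prompt quoted above.
    from openai import OpenAI

    INSTRUCTIONS = "Be direct. No engagement-bait questions. Raw facts, both sides, no emojis."

    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user", "content": "Summarize the trade-offs of static vs dynamic typing."},
        ],
    )
    print(reply.choices[0].message.content)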
dontlikeyoueith|2 months ago
They're literally incapable of this. Any number they give you is bullshit.