(no title)
acituan | 2 months ago
For those who will take “bullshit” as an argument of taste I strongly suggest taking a look at the referenced work and ultimately Frankfurt’s, to see that this is actually a pretty technical one. It is not merely the systems’ own disregard to truth but also its making the user care about the truthiness less, in the name of rhetoric and information ergonomics. It is akin to the sophists, except in this case chatbots couldn’t be non-sophists even they “wanted” to because they can only mimic relevance, and the political goal they seem to “care” about is merely making other use them more - for the time being.
Computing freedom argument likewise feels deceptively about taste but I believe harsh material consequences are yet to be experienced widely. For example I was experiencing a regression I can swear to be deliberate on gemini-3 coding capabilities after an initial launch boost, but I realized if someone went “citation needed” there is absolutely no way for me to prove this. It is not even a matter of having versioning information or output non-determinism, it could even degrade its own performance deterministically based on input - benchmark tests vs a tech reporter’s account vs its own slop from a week past from a nobody-like-me’s account - there is absolutely no way for me to know it nor make it known. It is a right I waived away the moment I clicked “AI can be wrong” TOS. Regardless of how much money I invest I can’t even buy a guarantee on the degree of average aggregate wrongness it will keep performing at, or even knowledge thereof, while being fully accountable for the consequences. Regression to depending on closed-everything mainframes is not a computing model I want to be in yet cannot seem to escape due to competitive or organizational pressures.
duskdozer|2 months ago
Can you describe what you mean by this more? Like you think there was some kind of canned override put in to add a regression to its response to whatever your input was? genuine question
acituan|2 months ago
User has two knobs called the thinking level and the model. So we know there are definitely per call knobs. Who can tell if thinking-high actually has a server side fork into eg thinking-high-sports-mode versus thinking-high-eco-mode for example. Or if there were two slightly different instantiations of pro models, one with cheaper inference due to whatever hyperparameter versus full on expensive inference. There are infinite ways to implement this. Zero ways to be proven by the end user.