timabdulla|1 year ago
It stated that I had 47,000 reputation points on Stack Overflow -- quite a surprise to me, given my minimal activity on Stack Overflow over the years. I popped over to the link it had cited (my profile on Stack Overflow) and it seems it confused my number of people reached (47k) with my reputation, a sadly paltry 525.
Then it cited an answer I gave on Stack Overflow on the topic of monkey-patching in PHP, using this as evidence for my technical expertise. Turns out that about 15 years ago, I _asked_ a question on this topic, but the answer was submitted by someone else. Looks like I don't have much expertise, after all.
Finally, it found a gem of a quote from an interview I gave. Or wait, that was my brother! Confusingly, we founded a company together, and we were both mentioned in the same article, but he was the interviewee, not I.
I would say it's decent enough for a springboard, but you should definitely treat the output with caution and follow the links provided to make sure everything is accurate.
toasteros|1 year ago
We'd never hire someone who just makes stuff up (or at least we wouldn't keep them employed for long). Why are we okay with calling "AI" tools like this anything other than curious research projects?
Can't we just send LLMs back to the drawing board until they have some semblance of reliability?
Gerardo1|1 year ago
Because they are a way to launder liability while reducing costs to produce a service.
Look at the AI-based startups Y Combinator has been funding. They match that description.
throwing_away|1 year ago
This is contrary to my experience.
oldstrangers|1 year ago
Well, at this point they've certainly proven to be a net gain for everyone, regardless of the occasional nonsense they spew.
roflyear|1 year ago
We do - of course we do, all the time.
nomel|1 year ago
Some people might find $500 worth of value in those "great" and "ok" categories for their specific use case, getting more value than "lies" out of it.
A few verifiable lies in exchange for hours of saved time could be worth it for some people, with use cases outside of your perspective.
brushfoot|1 year ago
A report full of factual errors that a careful intern wouldn't make is worse than useless (yes, yes, I've mentored interns).
If the hard part is the language, then do the research yourself, write an outline, and have the LLM turn it into complete sentences. That would at least be faster.
Here's the thing, though: If you do that, you're effectively proving that prose style is the low-value part of the work, and may be unnecessary. Which, as much as it pains me to say as a former English major, is largely true.
giarc|1 year ago
This is why I don't use AI for anything that requires a "correct" answer. I use it to rewrite paragraphs or sentences to improve readability, etc., but I stop short of trusting any piece of info that comes out of AI.
mdp2021|1 year ago
Artificial dementia...
Some parties are releasing products well before they are able to ship ones that work properly (I am not sure their legal cover will be so solid), but database-aided output could and should become a strong check on that phenomenon of remembering badly. Very linearly, like humans: get an idea, then compare it to the data. That's due diligence, part of the verification process in reasoning. It is as if concerns outside pure product progress are swaying the R&D in directions away from the primary ones. It's a form of procrastination.
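The "get an idea, then compare it to the data" step could be sketched roughly like this. This is a hypothetical illustration, not any vendor's actual pipeline: the fact store, field names, and values are made up (the numbers echo the profile mix-up described upthread), and a real system would retrieve from the cited source documents rather than a hardcoded dict.

```python
# Hypothetical sketch: check a model's claims against a trusted data store
# before publishing them. FACT_STORE stands in for values actually read from
# the cited source (e.g. the profile page).

FACT_STORE = {
    "reputation": "525",
    "people_reached": "47k",
}

def verify_claims(claims: dict) -> dict:
    """Split model output into claims the data supports and claims it doesn't."""
    supported, unsupported = {}, {}
    for field, value in claims.items():
        if FACT_STORE.get(field) == value:
            supported[field] = value
        else:
            unsupported[field] = value
    return {"supported": supported, "unsupported": unsupported}

# A model that confuses "people reached" with "reputation" fails the check
# instead of shipping the error:
result = verify_claims({"reputation": "47k"})
```

Anything landing in `unsupported` would be dropped or re-queried rather than published, which is the "strong limit" on misremembering described above.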
prof-dr-ir|1 year ago
That would be exactly my verdict of any product based on LLMs in the past few years.
vessenes|1 year ago
I wonder if it’s carried over too much of that ‘helpful’ DNA from 4o’s RLHF. In that case, maybe asking for 500 words was the difficult part: it just didn’t have enough to say based on one SO post and one article, but the overall directives assume there is enough, and so the model is put into a position where it must publish.
Put another way, it seems this model faithfully replicates the incentives most academics have — publish a positive result, or get dinged. :)
Did it pick up your HN comments? Kadoa claims that’s more than enough to roast me, … and it’s not wrong. It seems like there’s enough detail about you (or me) there to do a better job summarizing.
timabdulla|1 year ago
It didn't pick up my HN comments, probably because my first and last name are not in my profile, though obviously that is my handle in a smooshed-together form.
machiaweliczny|1 year ago
Although I think people are the same: give them too big a problem and they get lost unless they take it in bites. So it seems like OpenAI's implementation is just bad, because o3's hallucination benchmark results shouldn't lead to such poor performance.
RobinL|1 year ago
You might find it amusing to compare it to: https://hn-wrapped.kadoa.com/timabdulla
(Ref: https://news.ycombinator.com/item?id=42857604)
wholinator2|1 year ago
It's a very interesting use case, though: filter through billions of comments and give everyone a score on which real-life person they probably are. I wonder if, say, Ted Cruz hides behind a username somewhere.
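A naive version of that scoring idea is classic stylometry: compare the writing style of an anonymous account against a known person's public text. The sketch below uses character-trigram cosine similarity, which is one of the simplest stylometric signals; real authorship-attribution systems use much richer features, and all texts here are placeholders.

```python
# Toy stylometry sketch: score how alike two bodies of text are using
# character-trigram frequency profiles and cosine similarity.
from collections import Counter
from math import sqrt

def trigram_profile(text: str) -> Counter:
    """Character-trigram frequency profile of a body of text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram profiles, in [0.0, 1.0]."""
    dot = sum(a[g] * b[g] for g in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Placeholder corpora; a real comparison would use the person's published
# writing versus the anonymous account's comment history.
known = trigram_profile("Speeches and op-eds attributed to the real-life person.")
anon = trigram_profile("Comments posted under the anonymous username.")
score = similarity(known, anon)  # higher means more stylistically alike
```

At billions-of-comments scale this would be a nearest-neighbor search over such profiles, which is also why the privacy implications the comment hints at are real.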
dlivingston|1 year ago
[0]: https://hn-wrapped.kadoa.com/dlivingston
ComputerGuru|1 year ago
https://hn-wrapped.kadoa.com/ComputerGuru
Bjorkbat|1 year ago
Hypothetically speaking, if the time you saved is now spent verifying the statements of your AI researcher, then did you really save any time at all?
If the answers aren't important enough to verify, then was it ever even important enough to actually research to begin with?