timabdulla|1 year ago

I just gave it a whirl. Pretty neat, but definitely watch out for hallucinations. For instance, I asked it to compile a report on myself (vain, I know.) In this 500-word report (ok, I'm not that important, I guess), it made at least three errors.

It stated that I had 47,000 reputation points on Stack Overflow -- quite a surprise to me, given my minimal activity on Stack Overflow over the years. I popped over to the link it had cited (my profile on Stack Overflow) and it seems it confused my number of people reached (47k) with my reputation, a sadly paltry 525.

Then it cited an answer I gave on Stack Overflow on the topic of monkey-patching in PHP, using this as evidence for my technical expertise. Turns out that about 15 years ago, I _asked_ a question on this topic, but the answer was submitted by someone else. Looks like I don't have much expertise, after all.

Finally, it found a gem of a quote from an interview I gave. Or wait, that was my brother! Confusingly, we founded a company together, and we were both mentioned in the same article, but he was the interviewee, not I.

I would say it's decent enough for a springboard, but you should definitely treat the output with caution and follow the links provided to make sure everything is accurate.

toasteros|1 year ago

"Pretty neat, but definitely watch out for hallucinations."

We'd never hire someone who just makes stuff up (or at least we wouldn't keep them employed for long). Why are we okay with calling "AI" tools like this anything other than curious research projects?

Can't we just send LLMs back to the drawing board until they have some semblance of reliability?

Gerardo1|1 year ago

> Why are we okay with calling "AI" tools like this anything other than curious research projects?

Because they are a way to launder liability while reducing costs to produce a service.

Look at the AI-based startups Y Combinator has been funding. They match that description.

throwing_away|1 year ago

> We'd never hire someone who just makes stuff up (or at least keep them employed for long).

This is contrary to my experience.

oldstrangers|1 year ago

> Can't we just send LLMs back to the drawing board until they have some semblance of reliability?

Well at this point they've certainly proven a net gain for everyone regardless of the occasional nonsense they spew.

dumbfounder|1 year ago

Why not just verify the output? It’s faster than generating the entire thing yourself. Why do you need perfection in a productivity tool?

roflyear|1 year ago

> We'd never hire someone who just makes stuff up

We do - of course we do, all the time.

kenjackson|1 year ago

You can use them for whatever you like, or not use them. Everyone has a different bar for when technology is useful. My dad doesn't think EVs are useful due to the long charge times, but others find them fully acceptable.

rybosome|1 year ago

This doesn't make LLMs worthless; you just need to structure your processes around fallibility. Much like a well-designed release pipeline is built with the expectation that devs will write bugs that shouldn't ship.

ramon156|1 year ago

$3k a month vs. ~$500 a month. That's all you need to know. Not saying it's as good, but it's all some managers care about.

deeviant|1 year ago

Yeah, I used to hire people, but then one of them made a mistake, and now I'm done with them forever; they are useless. Surely it is not I, who directs the workers, who has failed to create a process that is resistant to errors. It's definitely that all people are worthless until they make no errors, since there truly is no other way of doing things than telling your intern to do a task and having them send the result directly to the production line.

nomel|1 year ago

LLMs are "great" in some use cases, "ok" in others, and "laughable" in more.

Some people might find $500 worth of value, in their specific use case, in those "great" and "ok" categories, where they get more value than "lies" out of it.

A few verifiable lies in exchange for hours of saved time could be worth it for some people, with use cases outside your perspective.

brushfoot|1 year ago

I disagree that this is a useful springboard. And I say that as an AI optimist.

A report full of factual errors that a careful intern wouldn't make is worse than useless (yes, yes, I've mentored interns).

If the hard part is the language, then do the research yourself, write an outline, and have the LLM turn it into complete sentences. That would at least be faster.

Here's the thing, though: if you do that, you're effectively proving that prose style is the low-value part of the work, and may be unnecessary. Which, as much as it pains me to say as a former English major, is largely true.

giarc|1 year ago

What's faster? Writing a 500-word report "from scratch" by researching the topic yourself, vs. having AI write it and then having to fact-check every claim and correct each piece manually?

This is why I don't use AI for anything that requires a "correct" answer. I use it to re-write paragraphs or sentences to improve readability etc., but I stop short of trusting any piece of info that comes out of it.

mdp2021|1 year ago

> Then it cited an answer I gave on Stack Overflow [...] using this as evidence for my technical expertise. Turns out that about 15 years ago, I _asked_ a question on this topic, but the answer was submitted by someone else

Artificial dementia...

Some parties are releasing products well before they are able to ship ones that work properly (I am not sure their legal cover will prove so solid), but database-aided checking of outputs could and should become a strong limit on this phenomenon of misremembering. Very linearly, like humans: get an idea, then compare it to the data - it is due diligence, part of the verification step in reasoning. It is as if pressures outside pure product progress are swaying the R&D in directions away from the primary concerns. It's a form of procrastination.
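
A minimal sketch of that "get an idea, then compare it to the data" loop, in Python. Everything here is a hypothetical placeholder (fetch_source stands in for a database or web lookup), not any real product's API:

    from dataclasses import dataclass

    @dataclass
    class Claim:
        text: str        # the statement the model produced
        source_url: str  # the source the model cited for it

    def unverified_claims(claims, fetch_source):
        """Return the claims not found verbatim in their cited source."""
        return [c for c in claims
                if c.text not in fetch_source(c.source_url)]

Verbatim matching is deliberately crude; the point is only that the comparison step sits between generation and publication.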

prof-dr-ir|1 year ago

> Pretty neat, but definitely watch out for hallucinations.

That would be exactly my verdict of any product based on LLMs in the past few years.

vessenes|1 year ago

Interesting!

I wonder if it's carried over too much of that 'helpful' DNA from 4o's RLHF. In that case, maybe asking for 500 words was the difficult part - it just didn't have enough to say based on one SO post and one article, but the overall directives assume there is more, and so the model is put into a position where it must publish.

Put another way, it seems this model faithfully replicates the incentives most academics have — publish a positive result, or get dinged. :)

Did it pick up your HN comments? Kadoa claims that's more than enough to roast me, … and it's not wrong. It seems like there's enough detail about you (or me) there to do a better job summarizing.

timabdulla|1 year ago

I didn't actually give it a goal of writing any particular length, but I do think that perhaps given my not-so-large online footprint, it may have felt "pressured" to generate content that simply isn't there.

It didn't pick up my HN comments, probably because my first and last name are not in my profile, though obviously that is my handle in a smooshed-together form.

machiaweliczny|1 year ago

This is very bearish for current AI. It seems like 99% reliability is still too low once errors compound. But I wonder if this is inherent to longer contexts or just depends on how it's trained. In theory, longer context => more errors.

Although I think people are the same: give them too big a problem and they get lost unless they take it in bites. So it seems like OpenAI's implementation is just bad, because o3's hallucination benchmark shouldn't lead to such poor performance.
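
As a back-of-the-envelope illustration of that compounding (a sketch assuming independent errors, which is a simplification):

    # If each step of a long research chain is 99% reliable,
    # the chain as a whole decays quickly.
    for steps in (10, 50, 100):
        print(steps, round(0.99 ** steps, 3))
    # -> 10 0.904
    # -> 50 0.605
    # -> 100 0.366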

RobinL|1 year ago

Interesting

You might find it amusing to compare it to: https://hn-wrapped.kadoa.com/timabdulla

(Ref: https://news.ycombinator.com/item?id=42857604)

wholinator2|1 year ago

This is... very uncomfortable. An (expanded) AI summary of my HN and reddit usage would appear to be a pretty complete representation of my "online" identity/character. I remember when people would browse your entire comment history just to find something to discredit you on reddit, and that behavior was _heavily_ discouraged. Now, we can just run an AI model to follow you and sentence you to a hell of being permanently discredited online. Give it a bunch of accounts to rotate through, send some voting power behind it (reddit or hn), and just pick apart every value you hold. You could obliterate someone's will to discuss anything online. You could effectively silence all but the most stubborn, and those people you would probably drive insane.

It's a very interesting use case though: filter through billions of comments and give everyone a score on which real-life person they probably are. I wonder if, say, Ted Cruz hides behind a username somewhere.

dlivingston|1 year ago

I put my profile in [0] and it's mostly silly; a few comments extracted and turned into jokes. No deep insights into me, and my "Top 3 Technologies" are hilariously wrong (I've never written a single line of TypeScript!)

[0]: https://hn-wrapped.kadoa.com/dlivingston

Bjorkbat|1 year ago

So, I still think this is a cool tool for search, but otherwise the tendency to hallucinate makes it questionable as a researcher.

Hypothetically speaking, if the time you saved is now spent verifying the statements of your AI researcher, then did you really save any time at all?

If the answers aren't important enough to verify, then was it ever even important enough to actually research to begin with?