It always feels like people are irrationally critical of AI-assisted stuff. Does the typical Hacker News comment have more substance?

- Informally benchmarked against 4 specific competitors: Gemini, OpenAI, o3, and Claude

- Identified two concrete features: URL content ingestion and integrated search

- Noted specific limitations: search engine occasionally misses key resources

- Provided a real-world test case: consulting business analysis where it found new opportunities other models missed

infecto|1 year ago

Hmmmm, it is hard to really place the issue. I am very much in the bullish-on-AI camp, but I don't like writing for the sake of writing, and some of the models (4o in this case) have very obvious tells and write in a way that takes away from whatever substance may exist.

snet0|1 year ago

One thing that concerns me is when you can't tell whether the comment was authored or just edited by AI. I'm uncomfortable with the idea that HN threads and Reddit comments gradually tend towards the grey, generic writing style of LLMs, but I don't really mind (save for the prospect of people not learning things they might otherwise!) when comments are edited (i.e. minor changes) for the sake of cleanliness or fixing issues.

joaohaas|1 year ago

I just re-read the post twice and I couldn't find any of the points you mentioned (again, other than using URLs in the input):

- Informal Benchmarks: I'm sorry, what? He mentions 'It’s picking up on nuances—and even uncovering entirely new angles—that other models have overlooked' and 'identified an entirely new sphere of possibility that I hadn’t seen nor had any of the other top models'. Not only is it complete horseshit by itself, but it does not benchmark in any way, shape, or form against the mentioned competitors. It's the exact stuff I'd expect out of an LLM.

- Real-World Test Case: As mentioned above, complete horseshit.

- 2 Concrete Features: Yes, I mentioned URLs in the input. I didn't consider 'Integrated Search' (which I'm assuming is searching the web for up-to-date data) because AFAIK it's already more or less a staple in LLM stuff, and his only remark about it is that it is 'solid but misses sometimes'.