ek750 | 2 years ago

Do we need some way to grade these services based on vertical or use-case?

I actually tried the same tech questions on multiple services when I first started playing around with these commercial LLMs. I would copy and paste the same question into GPT4, MS Bing (which I soon stopped using since I already have a GPT4 subscription), Claude, Bard, and recently You (https://you.com). While Claude.ai was rarely as good as GPT4, it wasn't too far off for tech questions.

I'm not very creative, so maybe using it to help with writing fiction or roleplay would be useful to me. I haven't tried that yet.

Did you try Claude with non-fictional tasks, and if so, how does that compare to GPT4?

antiraza|2 years ago

I did not try Claude for a research-based task built on non-fictional content.

I think it's good that LLMs become specialized tools that can go deep into their expertise. I just think 'a fact engine' -- if that's what Claude is aiming to be -- needs to have appropriately rigid controls on what defines fact. From that POV, I think I agree with the 'over-censored' label for Claude earlier in the thread... The intention may not be censorship, but if the LLM is so gun-shy about what is fact vs. not, it's going to have a really narrow (and therefore potentially unreliable) lens.