top | item 42716744


UmYeahNo | 1 year ago

I tried this yesterday, asking it to create a simple daily reminder task, which it happily did. Then, when the time came and went, I simply got a chat message saying the task had failed, with no explanation of why or how. When I asked it why, it hallucinated that I had too many tasks (I only had the one). So now I don't know why it failed or how to fix it. Which leads to two related observations:

1) I find it interesting that the LLM rarely seems trained to understand its own features, your account, or how the LLM itself works. It seems strange that it has no idea about its own support.

2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?

[0] https://help.openai.com/


Terretta | 1 year ago

Same experience except mine insisted I had no tasks.

It does say it's a beta on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.

Point 2 is a SaaS from before LLMs+RAG beat the normal approach. Status page: a SaaS. API membership, metrics, and billing: a SaaS. These are all undifferentiated, but arguably they selected quite well at the time the selections were made, and unless the help is going to sell more users, they arguably shouldn't spend time on undifferentiated heavy lifting.

varispeed | 1 year ago

> it hallucinated that I had too many tasks.

How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).

derefr | 1 year ago

Re: 2 — for the same reason that you shouldn't host your site's status page on the same infrastructure that hosts your site (if people want to see your status page, that probably means your infra is broken), I would guess that OpenAI thinks that if you're looking at the support docs, it might be because the AI service itself is currently broken.

reustle | 1 year ago

> It seems pretty telling to me that they use old-school search and not an LLM for its own help docs, right?

Just not a priority, most likely. Check out the search on docs built with Mintlify to see a very well-built implementation.

Example docs site that uses it: https://docs.browserbase.com

fooker | 1 year ago

You can hardly blame a product for not doing something that we don't know for certain to be possible.

neom | 1 year ago

I've thought about this a lot too, and my guess is this: foundation models take a lot to train, so I don't think they are retrained very often, and from my experience you can't easily train in new data. So you'd have to have some little up-to-date side system, and I suspect they're very thoughtful about which "side systems" they put in place. From trying to build some agent orchestration stuff myself, nothing ends up being as simple as I expect with "side systems", and stuff easily goes off the rails. So my thought was: given the scale they're dealing with, this is probably low priority and not actually a particularly easy feature.

miltonlost | 1 year ago

> So my thought was: given the scale they're dealing with, this is probably low priority and not actually a particularly easy feature.

"Working like OpenAI said it should" is a weird thing to make low priority. Why do they continuously put out features that break? I'm tired of stochastic outputs and of being told that we should accept sub-90% success rates.

At their scale, being less than 99.99% right results in thousands of problems. So their scale, and the outsized impact of their statistical bugs, is part of the issue.

yosito | 1 year ago

I regularly use Perplexity and Cursor, which can search the internet and documentation to answer questions that aren't in their training data. It doesn't seem that hard for ChatGPT to search and summarize its own docs when people ask about them.
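As a rough illustration of the "search and summarize your own docs" idea, even naive keyword retrieval over help-doc snippets goes a long way before you'd hand the best match to a model to summarize. (This is a toy sketch; the snippets and function names are made up, not anything OpenAI actually ships.)

```python
# Toy retrieval over a few invented help-doc snippets: score each doc by
# how many query words it contains, then return the best matches.

def score(query: str, doc: str) -> int:
    """Count how many distinct query words appear in the doc (case-insensitive)."""
    return sum(1 for w in set(query.lower().split()) if w in doc.lower())

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k best-matching doc snippets for a query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

HELP_DOCS = [
    "Scheduled tasks: the assistant can run reminders at set times.",
    "Billing: manage your subscription from the account settings page.",
    "Task limits: each account may have a capped number of active tasks.",
]

best = retrieve("why did my scheduled reminder task fail", HELP_DOCS)
```

A real deployment would use embeddings rather than keyword overlap, but the point stands: the retrieval side of this is well-trodden ground.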

baxtr | 1 year ago

Now imagine giving this "agent" a task like booking a table at a restaurant or similar.

"Yeah sure I got you a table at a nice restaurant. Don’t worry."

behnamoh | 1 year ago

> 2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?

I agree, but then again, if you're a dev in this space, presumably you know what keywords to use to refine your search. RAG'ed search implies that the user (dev) is not "in the know".

m3kw9 | 1 year ago

Buggy af right now: 95% of tasks failed, and I get a ton of emails about it.

ProofHouse | 1 year ago

Very, very, very buggy and really looks extremely low effort as with many OpenAI feature rollouts. Nothing wrong with an MVP feature, but make it at least do what it’s supposed to do and maybe give it 10% more extensibility than the bare bones.

netcraft | 1 year ago

I question the same things frequently. I routinely ask ChatGPT to help me understand the OpenAI API documentation and how to use it, and it's rarely helpful; it frequently tells me things that are just blatantly untrue. At least nowadays I can link it directly to the documentation for it to read.

But I don't understand why their own documentation, their products, and lots of examples of using them wouldn't be the number one thing they'd want to train the models on (or fine-tune on, or at least make available through a tool).
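The "make available through a tool" option wouldn't even need to be elaborate. A minimal sketch of help docs exposed as a callable tool (the topic table and function are hypothetical, not any real OpenAI endpoint):

```python
# Hypothetical sketch: help docs exposed as a "tool" a model could call
# instead of guessing an answer. Topics and text here are invented.

DOCS = {
    "tasks": "Scheduled tasks are in beta; failed runs show up as a chat notice.",
    "api_billing": "API usage is billed separately from the monthly subscription.",
}

def docs_lookup(topic: str) -> str:
    """Return help text for a topic, or an explicit 'not found' rather than a guess."""
    return DOCS.get(topic.strip().lower(), "No help page found for that topic.")
```

The win is the fallback branch: a tool that says "not found" gives the model something honest to relay, instead of hallucinating a tasks limit.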

_factor | 1 year ago

You mean converting $20 monthly subscribers into less profitable API users?

Mo3 | 1 year ago

Wait so... they made the LLM itself control the scheduling?

Yeah that's not gonna end well. I thought they, of all people, would know the limitations and problems.

ElijahLynn | 1 year ago

Yeah, I saw 4o with Tasks today, tried it, and asked "what is 4o with Tasks". It had no idea; I had to set it to web search mode to figure it out.

fooker | 1 year ago

If you ask me to describe how a human brain works, I'll have no idea and would have to search the web to get an (incomplete) idea.