I tried this yesterday, asking it to create a simple daily reminder task, which it happily did. Then, when the time came and went, I simply got a chat message saying the task had failed, with no explanation of why or how. When I asked it why, it hallucinated that I had too many tasks (I only had the one). So now I don't know why it failed or how to fix it. Which leads to two related observations:
1) I find it interesting that the LLM rarely seems trained to understand its own features, your account, or how the LLM itself works. It seems strange that it has no idea about its own support.
2) Which leads me to the OpenAI support docs [0]. It seems pretty telling to me that they use old-school search rather than an LLM for their own help docs, right?
[0] https://help.openai.com/
Terretta|1 year ago
It does say it's a beta on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.
Point 2 is a SaaS from before LLMs+RAG beat the normal approach. The status page: a SaaS. API membership, metrics, and billing: a SaaS. These are all undifferentiated, but arguably they selected quite well at the time the selections were made, and unless the help center is going to sell more users, they arguably shouldn't spend time on undifferentiated heavy lifting.
varispeed|1 year ago
How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).
reustle|1 year ago
Just not a priority, most likely. Check out the search in Mintlify's docs to see a very well-built implementation.
Example docs site that uses it: https://docs.browserbase.com
miltonlost|1 year ago
"Working like OpenAI said it should" is a weird thing to rank as low priority. Why do they continuously ship features that break and bug out? I'm tired of stochastic outputs and of being told that we should accept sub-90% success rates.
At their scale, being less than 99.99% right results in thousands of problems. So their scale and the outsized impact of their statistical bugs is part of the issue.
baxtr|1 year ago
"Yeah sure I got you a table at a nice restaurant. Don’t worry."
behnamoh|1 year ago
I agree, but then again, if you're a dev in this space, presumably you know what keywords to use to refine your search. RAG'ed search implies that the user (dev) is not "in the know".
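To make the distinction concrete, here's a toy sketch of the "old-school" keyword search a help center might use: rank snippets by how many query terms they share. A RAG setup would instead embed the query and snippets and rank by vector similarity, so a user who doesn't know the right keywords can still find the answer. The snippets below are hypothetical.

```python
# Toy keyword search: score each snippet by overlap with the query terms.
SNIPPETS = [
    "How to create a scheduled task",
    "Troubleshooting failed scheduled tasks",
    "Managing your account billing",
]

def keyword_search(query: str, snippets: list[str]) -> list[str]:
    """Return snippets sorted by count of shared lowercase terms."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(s.lower().split())), s) for s in snippets]
    # Keep only snippets with at least one matching term, best first.
    return [s for score, s in sorted(scored, key=lambda t: -t[0]) if score > 0]

print(keyword_search("scheduled task failed", SNIPPETS))
```

Note the failure mode: a query like "my reminder never ran" shares no terms with any snippet and returns nothing, which is exactly the gap embedding-based retrieval is meant to close.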
netcraft|1 year ago
But I don't understand why their own documentation, their own products, and lots of examples using them wouldn't be the number one thing they'd want to train the models on (or fine-tune on, or at least make available through a tool).
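The "make available through a tool" option is the lightest-weight of the three: describe a docs-lookup function to the model, and dispatch it locally when the model calls it. A minimal sketch, assuming hypothetical docs snippets and a hypothetical `lookup_docs` helper (the schema shape follows OpenAI's function-calling format):

```python
# Hypothetical help-center snippets, keyed by topic.
DOCS = {
    "tasks": "Scheduled tasks run daily reminders; limits may apply per account.",
    "billing": "Usage-based billing is reported on the account dashboard.",
}

def lookup_docs(topic: str) -> str:
    """Return the docs snippet for a topic, or a fallback message."""
    return DOCS.get(topic, "No documentation found for that topic.")

# Tool definition the model would receive alongside the chat request.
DOCS_TOOL = {
    "type": "function",
    "function": {
        "name": "lookup_docs",
        "description": "Look up official help-center documentation by topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {"type": "string", "description": "Docs topic, e.g. 'tasks'"},
            },
            "required": ["topic"],
        },
    },
}

# When the model emits a tool call, the app runs it and feeds back the result.
print(lookup_docs("tasks"))
```

With something like this wired up, "why did my task fail?" could at least be answered from real documentation instead of a hallucinated limit.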
Mo3|1 year ago
Yeah that's not gonna end well. I thought they, of all people, would know the limitations and problems.