JoeCortopassi | 1 year ago

A lot of people here haven't integrated GPT into a customer facing production system, and it shows

gpt-4, gpt-4-turbo, and gpt-4o are not the same models. They are mostly close enough when you have a human in the loop and loose constraints. But if you are building systems off of the (already fragile) prompt-based output, you will have to go through a very manual process of tuning your prompts to get the same or similar output out of the new model. It will break in weird ways that make you feel like you are trying to nail Jello to a tree

There are software tools/services that help with this, and a ton more that merely promise to, but most of the tooling around LLMs these days gives the illusion of a reliable tool rather than the results of one. It's the early days of the gold rush still, and everyone wants to be seen as one of the first
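The version-drift problem described above can be sketched as a prompt regression check. Everything here is a hypothetical stand-in (call_model, the model names, the canned outputs are all invented for illustration); the point is the shape: pin golden outputs from the current model and diff a candidate model against them before cutting over.

```python
# Sketch of a prompt regression check across model versions. call_model()
# is a hypothetical stand-in; in practice it would hit the provider's API
# with an exact, dated model snapshot rather than a rolling alias.

def call_model(model: str, prompt: str) -> str:
    # Canned outputs standing in for real API responses.
    canned = {
        ("old-model", "Classify sentiment: 'great product'"): "positive",
        ("new-model", "Classify sentiment: 'great product'"): "Positive.",
    }
    return canned[(model, prompt)]

def regression_report(old: str, new: str, prompts: list[str]) -> list[str]:
    """Return the prompts whose output changed between model versions."""
    return [p for p in prompts if call_model(old, p) != call_model(new, p)]

broken = regression_report("old-model", "new-model",
                           ["Classify sentiment: 'great product'"])
# Even trivial drift ("positive" vs "Positive.") surfaces here, and that is
# exactly the kind of change that silently breaks downstream parsing.
```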

tmpz22|1 year ago

Maybe we shouldn't be selling products built on such a shaky foundation? Like Health Insurance products for example.

[2]: https://insurtechdigital.com/articles/chatgpt-the-risks-and-...

--- please disregard [1], it was a terrible initial source I pulled off Google

[1]: https://medium.com/artivatic/use-of-chatgpt-4-in-health-insu...

Sharlin|1 year ago

Building products on shaky foundations is a tried-and-true approach in IT business.

benreesman|1 year ago

For a different point of view from someone with extremely credible credentials (learned this stuff from Hinton among many other things) and a much more sober and balanced take on all this I recommend the following interview with Nick Frosst (don’t be put off by the clickbait YouTube title, that’s a very silly caption):

https://youtu.be/4JF1V2hzGKE

bcrl|1 year ago

Minimum Viable Products are pretty much by definition built on shaky foundations. At least with software written by humans the failure modes are somewhat bounded by the architecture of the system as opposed to the who-knows-what-the-model-will-hallucinate of AI.

SkyPuncher|1 year ago

I’m not really sure this is an entirely fair argument.

If you rely on third-party packages of any type, you have dependencies that can rapidly and unexpectedly break with an update. Semantic versioning is supposed to guard against this, but it doesn't always help.

djohnston|1 year ago

> It will break in weird ways that makes you feel like you are trying to nail Jello to a tree

Probably the best description of working with LLM agents I've read

visarga|1 year ago

It gets more interesting when you get to benchmarking your prompts for accuracy. If you don't have an evaluation set you are flying blind. Any model update or small fix could break edge cases without you knowing.
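A minimal sketch of what such an evaluation set can look like, with a hypothetical predict() (and its canned answers) standing in for the real prompted model call:

```python
# Tiny eval harness: score a prompt+model combination against labeled
# examples, so a model update that drops accuracy is caught immediately.

EVAL_SET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "opposite of hot", "expected": "cold"},
]

def predict(text: str) -> str:
    # Hypothetical stand-in for the prompted model call,
    # with one deliberate miss to show a non-perfect score.
    answers = {"2 + 2": "4",
               "capital of France": "Paris",
               "opposite of hot": "warm"}
    return answers[text]

def accuracy(eval_set: list[dict]) -> float:
    hits = sum(predict(ex["input"]) == ex["expected"] for ex in eval_set)
    return hits / len(eval_set)

score = accuracy(EVAL_SET)  # 2 of 3 correct with the canned answers above
```

Tracking this number across model versions turns "the new model feels worse" into a concrete regression you can bisect.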

barrell|1 year ago

Came here to say the same thing, it sums it up perfectly

outside1234|1 year ago

Hopefully you built a solid eval system around the core of your GenAI usage, otherwise, yes, this is going to be very painful :)

bbor|1 year ago

My naive answer: turn away from Silicon Valley modernity with its unicorns and runways and “”marketing””, and embrace the boring stuffy academics! https://dspy-docs.vercel.app/

stavros|1 year ago

I never got DSPy. I only tried a brief example, but can someone explain why it's better than alternatives? Not that I hold LangChain in particularly high regard...

tbarbugli|1 year ago

hosted on Vercel and GitHub...

mdp2021|1 year ago

Is it Winter already?

adamgordonbell|1 year ago

I've seen people mention this lib before and I have a hard time understanding the use cases and how it's used.