top | item 44865916

UI vs. API. vs. UAI

87 points| bckmn | 6 months ago |joshbeckman.org

53 comments

order

showerst|6 months ago

I really vehemently disagree with the 'feedforward, tolerance, feedback' pattern.

Protocols and standards like HTML built around "be liberal with what you accept" have turned out to be a real nightmare. Best-guessing the intent of your caller is a path to subtle bugs and behavior that's difficult to reason about.

If the LLM isn't doing a good job calling your api, then make the LLM get smarter or rebuild the api, don't make the API looser.

mort96|6 months ago

I'm not sure it's possible to have a technology that's user-facing with multiple competing implementations, and not also, in some way, "liberal in what it accepts".

Back when XHTML was somewhat hype and there were sites which actually used it, I recall being met with a big fat "XML parse error" page on occasion. If XHTML really took off (as in a significant majority of web pages were XHTML), those XML parse error pages would become way more common, simply because developers sometimes write bugs and many websites are server-generated with dynamic content. I'm 100% convinced that some browser would decide to implement special rules in their XML parser to try to recover from errors. And then, that browser would have a significant advantage in the market; users would start to notice, "sites which give me an XML Parse Error in Firefox work well in Chrome, so I'll switch to Chrome". And there you have the exact same problem as HTML, even though the standard itself is strict.

The magical thing of HTML is that they managed to make a standard, HTML 5, which incorporates most of the special case rules as implemented by browsers. As such, all browsers would be lenient, but they'd all be lenient in the same way. A strict standard which mandates e.g "the document MUST be valid XML" results in implementations which are lenient, but they're lenient in different ways.

HTML should arguably have been specified to be lenient from the start. Making a lenient standard from scratch is probably easier than trying to standardize commonalities between many differently-lenient implementations of a strict standard like what HTML had to do.

arscan|6 months ago

> Protocols and standards like HTML built around "be liberal with what you accept" have turned out to be a real nightmare.

This feels a bit like the setup to the “But you have heard of me” joke in Pirates of the Caribbean [2003].

dathinab|6 months ago

oh yes so true, but I would generalize it to "to flexible"

- content type sniffing spawned a whole class of attacks, and should have been unnecessary

- a ton of historic security issues where related to html parsing being too flexible, or some JS parts being to flexible (e.g. Array prototype override)

- or login flows being too flexible creating a easy to overlook way to bypass (part of) login checks

- or look at the mess OAuth2/OIDC had been for years because they insisted to over-enginer it and how especially it being liberal about quite many parts lead to more then one or two big security incidents

- (more then strictly needed) cipher flexibility is by now widely accepted to have been an anti pattern

- or how so much theoretically okay but "old" security tech is such a pain to use because it was made to be supper tolerant to everything, like every use case imaginable, every combination of parameters, every kind of partial uninterpretable parts (I'm looking at you ASN.1, X509 certs and many old CA software, theoretically really not bad designed, practically such a pain).

And sure you also can be too strict, high cipher flexibility being an anti-pattern was incorporated into TLS 1.3. But TLS still needs some cipher flexibility, so they fund a compromise of (oversimplified) you can choose 1 of 5 cipher suites but can't change any parameter of that suites.

Just today I read an article (at work, I don't have the link at hand) about some so hypothetical but practically probably doable (with a bunch of more work) scenarios to trick very flexible multi step agents into leaking your secrets. The core approach was that they found a way to have a relative small snippet of text which if it end up in the context has a high chance to basically override the whole context with just your instruction (quite a bit oversimplified). In turn if you can sneak it into someones queries (e.g. you GTP model is allowed to read you mails and it's in a mail send to you) you can then trick the multi step model to grab a secret from your computer (because the agents often run with user permissions) and send it to you (by e.g. instrumenting the agent to scan a website under an url which happens to now contain the secret).

Its a bit hypothetical, its hard to pull of, but it's very well in the realm of possibility due to how content and instructions are on a very fundamental level not cleanly separated (I mean AI vendors do try, but so far that never worked reliable it's in the end all the same input).

wvenable|6 months ago

HTML being lenient is what made progressive enhancement possible -- right down the original <img> tag. The web would not have existed at all if HTML was strict right from the start.

metayrnc|6 months ago

This is already true for just UI vs. API. It’s incredible that we weren’t willing to put the effort into building good APIs, documentation, and code for our fellow programmers, but we are willing to do it for AI.

bubblyworld|6 months ago

I think this can kinda be explained by the fact that agentic AI more or less has to be given documentation in order to be useful, whereas other humans working with you can just talk to you if they need something. There's a lack of incentive in the human direction (and in a business setting that means priority goes to other stuff, unfortunately).

In theory AI can talk to you too but with current interfaces that's quite painful (and LLMs are notoriously bad at admitting they need help).

arscan|6 months ago

The feedback loop from potential developer users of your API is excruciatingly slow and typically not a process that an API developer would want to engage in. Recruit a bunch of developers to read the docs and try it out? See how they used it after days/weeks? Ask them what they had trouble with? Organize a hackathon? Yuck. AI, on the other hand, gives you immediate feedback as to the usability of your “UAI”. It makes something, in under a minute, and you can see what mistakes it made. After you make improvements to the docs or API itself, you can effectively wipe its memory by cleaning out the context, and see if what you did helped. It’s the difference between debugging a punchcard based computing system and one that has a fully featured repl.

jnmandal|6 months ago

Yeah, this is so true. Well designed APIs are also already almost good enough for AI. There really was always a ton of value in good API design before LLMs. Yet a lot of people still said, for varying reasons, let's just ship slop and focus elsewhere.

righthand|6 months ago

We are only willing to have the Llm generate it for AI. Don’t worry people are writing and editing less.

And all those tenets of building good APIs, documentation, and code are opposite the incentive of building enshittified APIs, documentation, and code.

cco|6 months ago

We recently released isagent.dev [1] exactly for this reason!

Internally at Stytch three sets of folks had been working on similar paths here, e.g. device auth for agents, serving a different documentation experience to agents vs human developers etc and we realized it all comes down to a brand new class of users on your properties: agents.

IsAgent was born because we wanted a quick and easy way to identify whether a user agent on your website was an agent (user permissioned agent, not a "bot" or crawler) or a human, and then give you a super clean <IsAgent /> and <IsHuman /> component to use.

Super early days on it, happy to hear others are thinking about the same problem/opportunity.

[1] GitHub here: http://github.com/stytchauth/is-agent

gavmor|6 months ago

I feel OP is addressing the complementary, opposite use case in which behavior is to be unified across user agents.

darepublic|6 months ago

if you want your app to be automated wouldn't you just publish your api and make that readily available? I understand the need for agentic UI navigation but obviously an api is still easier and less intensive right. The problem is that it isn't always available, and there ui agents can circumvent that. But you want to embrace the automation of your app so.. just work on your API? You can put an invisible node in your UI to tell agents to stop wasting compute and use the api.

jnmandal|6 months ago

This is true but also your API needs to actually implement all the use cases (often its only for a subset) and it needs to work well (often there are many nuances or inconsistencies). But I agree there are lots of overlap. No need to completwly reinvent the wheel here. Actually CQRS systems work incredibly well with LLMs already.

throwanem|6 months ago

So, this gets to a fundamental or "death of the author" ie philosophical difference in how we define what an API is "for." Do I as its publisher have final say, to the extent of forbidding mechanically permissible uses? Or may I as the audience, whom the publisher exists to serve, exercise the machine to its not intentionally destructive limit, trusting its maker to prevent normal operation causing (even economic) harm?

The answer of course depends on the context and the circumstance, admitting no general answer for every case though the cognitively self-impoverishing will as ever seek to show otherwise. What is undeniable is that if you didn't specify your reservations API to reject impermissible or blackout dates, sooner or later whether via AI or otherwise you will certainly come to regret that. (Date pickers, after all, being famously among the least bug-prone of UI components...)

kylecazar|6 months ago

Separating presentation layer from business logic has always been a best practice

iregina|6 months ago

Insightful!!