I think this post underestimates the degree to which “what data is correct” is deeply contextual.
My team formed an identical hypothesis to this doc ~2 years ago and built a proof of concept. It was pretty magic: we had Fortune 500 execs asking for reports on internal metrics, and they’d generate in a couple of minutes. The first week we got rave reviews - followed by an immediate round of negative feedback as we realized that ~90% of the reports were deeply wrong.
Why were they wrong? It had nothing to do with the LLMs per se; o3-mini doesn’t do much better on our suite than GPT-3.5. The problem was that knowing which data to use for which query was deeply contextual.
Digging into use cases you’d find that for a particular question you couldn’t just get all the rows from a column; you needed to do some obscure JOIN ON operation. This fact was known only by the 2 data scientists in charge of writing the report. This flavor of problem - data being messy, with the messiness documented only in a few people’s brains - repeated over and over.
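A made-up but representative example of the gap (none of these table names are real):

    # What the LLM writes, reasonably, from the schema alone:
    naive = "SELECT SUM(amount) FROM orders WHERE region = 'EMEA'"

    # What the report actually needed - tribal knowledge held by two people:
    # late corrections live in a second table and supersede the originals.
    correct = """
        SELECT SUM(COALESCE(c.amount, o.amount))
        FROM orders o
        LEFT JOIN order_corrections c ON c.order_id = o.order_id
        WHERE o.region = 'EMEA'
          AND o.is_test = 0
    """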
I still work on AI-powered products and I don’t see even a little line of sight on this problem. Everyone’s data is immensely messy and likely to remain so. AI has introduced a number of tools to manage that mess, but so far it appears they’ll need to be exposed via fairly traditional UIs.
> I think this post underestimates the degree to which “what data is correct” is deeply contextual.
I can't get anyone to listen to this point. I'm seeing plans going full steam ahead deploying AI when they don't even have a good definition of the PROBLEM, much less how to train the AI to do things well and correctly. I was in a 90-minute meeting with some execs who were all high on ChatGPT's Operator. One was saying we could replace 80 people at this company RIGHT NOW with this tool. I asked the presenter to type one simple request into the AI; the entire demo went wildly off the rails from then on, and the presenter wasn't even remotely bothered by that. People are either completely taken in by the marketing and believe in it like a religion, or they have solid, sensible concerns about reliability. But the people in the second category are far outnumbered by the true believers.
It will be interesting to see in which fields it's worth the effort to curate your data to a high enough standard that you get all the benefits of the AI agent.
I'm currently working as a scientist. I wonder if researchers will be willing to annotate their papers, data, reasoning, and arguments well enough that AI agents can make good use of it all.
If you write your papers in an AI-friendly way, maybe that means more citations? Does this mean switching to new publishing formats? PDFs are certainly limiting.
I think a lot of the power and capability of LLMs comes from their understanding of a lot of implicit context in language. But generally LLMs have a dominant understanding of each linguistic construct, and when that understanding isn't correct they struggle.
We've looked at using agents at my current job but most of the time, once the data is properly structured, a more traditional approach is faster and less expensive.
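We can't let an LLM loose on a database and expect it to figure out everything.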
Yep, I had a similar experience around a year or so ago. Hooking an LLM up to my RDBMS was really cool for the first 1-2 questions but fell over almost immediately with questions that strayed much further than “how many rows are in this table”.
Sure, it could do some basic filtering (though it would fail even there, making bad assumptions), and any (correct) joins were a crap-shoot. I was including the schema and sample rows from all my tables, I wrote tens of lines of instructions explaining the logic of the tables, and that still didn’t begin to cover all the cases.
Prompt engineering tons of business logic is a horrible job. It's hard to test and it feels so “squishy” and unreliable. Even with all of my rules, it would write queries that didn’t work and/or broke a rule/concept that I had laid out.
In my experience, you’re much better off using AI to help you write some queries that you add to the codebase (after tweaking/checking) than you are having AI come up with queries at run time.
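In other words, something like this sketch (names invented, SQLite-style connection):

    # The pattern that worked for us: the LLM drafts a query once, a human
    # tweaks and reviews it, and it ships as ordinary tested code.
    VETTED_QUERIES = {
        "calls_today": """
            SELECT user_id, COUNT(*) AS calls
            FROM calls
            WHERE created_at >= CURRENT_DATE
            GROUP BY user_id
        """,
    }

    def run_report(conn, name):
        # No runtime LLM in the loop - just a reviewed query from the repo.
        return conn.execute(VETTED_QUERIES[name]).fetchall()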
Completely agree. Even things that are considered "standard" or "basic" sometimes have deep contextual variances. For instance, a basic question like "what is my ARR this month" can have varying answers for different businesses.
This is absolutely the problem. But there is a line of sight; namely, combining LLMs with existing semantic data technologies (e.g., RDF).
This is why I'm building a federated query optimizer: we want to let the LLM reason and formulate queries at the ontological level, with query execution operating behind a layer of abstraction.
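A minimal sketch of the idea, assuming rdflib and an invented ontology file:

    from rdflib import Graph

    g = Graph()
    g.parse("crm.ttl", format="turtle")  # hypothetical ontology + mappings

    # The LLM writes SPARQL against stable ontology terms; the federated
    # optimizer worries about which messy physical tables satisfy them.
    results = g.query("""
        PREFIX ex: <http://example.org/crm#>
        SELECT ?name ?total WHERE {
            ?cust a ex:Customer ; ex:name ?name ; ex:lifetimeValue ?total .
        }
    """)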
>Digging into use cases you’d find that for a particular question you couldn’t just get all the rows from a column; you needed to do some obscure JOIN ON operation. This fact was known only by the 2 data scientists in charge of writing the report.
>I still work on AI-powered products and I don’t see even a little line of sight on this problem. Everyone’s data is immensely messy and likely to remain so.
I've worked in the space as well, and completely unstructured data is better than whatever you call a database with a dozen ad hoc tables, each storing information somewhat differently from the others, built up for reports written by a dozen different people over a decade.
I have a benchmark for agentic systems which measures how many joins between tables the system can do before it goes off the rails. There is nothing off the shelf that can do it, and for whatever reason no one is talking about it in the open. There are, however, companies working to solve it in the background - I've worked with three so far.
Without documentation giving some grounding in what the table is doing, you're left hoping the database is self-documenting enough for the agent to figure out what the column names mean and whether joining on them makes sense - good luck doing that with id1, id2, idCustomerLocal, id_customer_foreign.
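The shape of the benchmark is roughly this (interfaces invented for illustration):

    # Ask questions that require 1..N correct joins and record where the
    # agent falls over; the score is the deepest level it handles reliably.
    def max_reliable_join_depth(agent, tasks_by_depth, is_correct, bar=0.9):
        # tasks_by_depth: {1: [...], 2: [...]} - questions needing N joins
        best = 0
        for depth in sorted(tasks_by_depth):
            tasks = tasks_by_depth[depth]
            passed = sum(is_correct(t, agent.answer(t)) for t in tasks)
            if passed / len(tasks) < bar:
                break
            best = depth
        return best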
> Digging into use cases you’d find that for a particular question you couldn’t just get all the rows from a column; you needed to do some obscure JOIN ON operation. This fact was known only by the 2 data scientists in charge of writing the report. This flavor of problem - data being messy, with the messiness documented only in a few people’s brains - repeated over and over.
This reminds me of one of the key plot points in "The Sparrow" by Mary Doria Russell. Small spoiler ahead so if you haven't read it and want to be surprised, stop reading.
...
...
Basically, one of the characters works as an AI implementer, replacing humans in their jobs by learning deeply about how they do their work and coding up an AI replacement. She runs across a SETI researcher and works on replacing him, but he has a human intuition when matching signals that she would never have discovered because it was so random.
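Great book if you haven't read it.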
This is a good read and a great starting point for thinking about this. It essentially takes the extreme position: SaaS no longer needs a UI, because the LLM is the UI.
In reality, as always, I suspect the truth will be somewhere in between. The SaaS products that succeed will be those that have a good UI _and_ a good API that LLMs can use.
An LLM is not always the best interface, particularly for data access. For most people, clicking a few times in the right places is preferable to having to type out (or even speak aloud) "Show me all the calls I did today", waiting for the result, having to follow up with "include the time per call and the expected deal value", etc etc.
There is undoubtedly an opportunity for disruption here, but I think an LLM-only SaaS platform is going to be a very tough sell for at least the next decade.
Yep, it's funny how one of the key factors that limits LLM usage is just the typing speed of users.
I agree that the amount of bespoke UI that needs to exist probably won't stagnate. Humans need about the same amount of visual information to verify a task was done correctly as they need to do the task.
LLM generated UI is an interesting field. Sure, you can get ChatGPT to generate schema to lay out some buttons. But it seems harder to identify the context and relevant information that must be displayed for the human to be a valuable/necessary asset in the process.
I agree with you - also because most of the activities described in the post can be turned around, with the SaaS wrapping an LLM around specific tasks to augment data (e.g. call transcription, summarisation and preparation for the next meeting).
As an industry, we have been through a textual user interface already: terminals, and we moved away from that.
And voice UIs are not new either: we have had voice assistants for quite some time now, and they didn't see the success Apple, Google or Amazon were expecting (recently it came out that most Echo use was about setting timers).
Also— for B2B SaaS, a big component of what is being sold is not the product, but support. No matter how modern or antiquated the tech is, many B2B companies don’t actually care about the experience per se; they care about compliance, security, data integrity, and ongoing support. That’s essentially Oracle’s entire playbook!
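How do LLM SaaS replacements solve that?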
The position is more extreme than that. It’s that your SaaS without its UI is nothing more than a database.
> The underlying SaaS platform is reduced to a “database” or “utility” that an agent can switch out if needed.
I agree that UI isn’t going away completely. Language is a slow and imprecise tool. A well developed UI can be much more efficient. I think it will be much more like the Star Trek universe, where we use a blend of the two.
In any case, if the AI agent can generate UI on the fly, it seems their point still stands?
Look, I love LLMs and even implement them for customers, but I am very sceptical about them 'replacing' ERP and CRM systems. What some AI folks don't seem to understand is that traditional ERP and CRM apps are completely driven by auditable business rules because they have to be. If you're running a company, there's no discretion at all about how money and other assets and liabilities are accounted for. It all has to be strictly according to the rules. This goes for most everything else - management are responsible for the business rules implemented in the system, and those rules need to be precisely spelled out. Sure, AI can and should be used extensively for the human UI piece of it - to simplify getting data into and out of the system, for example. But the engine inside and the database are all strictly rule-governed, and I definitely don't expect that to change anytime soon.
This kind of reminds me of when there was a lot of hype around messenger apps and this idea that we'd just do everything through a chat interface / chat bot.
It never panned out, arguably because the technology wasn't quite there yet (this was well before ChatGPT came out), but I thought the bigger problem was that people assumed a chat UI was the ultimate user interface. It just didn't feel right to me. For simple tasks, sure, but for "exploratory" tasks it felt like a graphical user interface of some kind made more sense.
Same sentiments apply to the hype around agents. Even in a hypothetical world where agents work as well as any human I don't think an agent/chatbot UI is necessarily the ultimate user interface. If I'm asking an agent questions, it makes sense for it to show rather than tell in many contexts. Even in a world where agents capture much of the way we interact with computers, it might make more sense for them to show us using 3rd party SaaS apps.
It was the same then as it is now. Chatbot providers had bots to sell, now autocomplete providers have autocomplete to sell. Marketing people just say what they get paid to say.
It's an intriguing take, but as others have pointed out, the truth will be somewhere in the middle. I don't believe that AI will replace the entire SaaS interface, but I also don't think we'll need as many of the services and APIs of yesteryear.
This writeup seems to be authored by a senior designer at Salesforce, and I can see the motivation from their perspective. Their challenges are different from those a new SaaS product will encounter.
Like all incumbents of their era, they are a core-ish database that depends on a plethora of point solutions from vendors and partners to fill in the gaps their product leaves in constructing workflows. If they don't take an approach like the one discussed here – or in the linked OpenAI/Softbank video – they risk alienating their vendors/partners, or worse, seeing them become competitors in their own right.
Disclaimer – I'm biased too, I'm building one of the upstarts that aims to compete with Salesforce.
Have you ever watched people talk excitedly about "agents" for thirty or forty years without ever actually providing an example that functioned for more than a couple of very precisely staged demos, if that?
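You Will.
Except this time with full admin access to everything.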
I think everyone who thinks this way is smoking something. I use the latest and greatest AI tools and they never fail to disappoint: they make shit up and waste hours of my time because they would rather answer with nonsense than ask questions, or just say “I don’t fucking know” or “that isn’t possible”.
I learned about the idea of Generative UI from a Sharp Talk podcast, and it's stuck with me ever since.
Many SaaS products (especially the complex ones, which are also the most important ones) have a tonne of UI, often imposing a huge amount of non-work work onto users - all the clicking you have to do as part of entering or retrieving data, especially if the UI flow doesn't fit exactly what you're trying to do at that moment. An example might be quickly creating an epic and a bunch of related tickets in Jira, and having them all share some common components.
A generative UI would be able to construct a custom UI for the particular thing the user is trying to do at any point in time. I think it's a really powerful idea, and it could probably be done today by smartly using e.g. Jira's APIs.
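For instance, a rough sketch against Jira's cloud REST API (v3 endpoint; the project key and fields are invented and vary by instance):

    import requests

    # Sketch: the generated UI's "submit" could collapse all that clicking
    # into a couple of API calls - an epic plus its child stories.
    def create_epic_with_stories(base_url, auth, summary, stories):
        epic = requests.post(f"{base_url}/rest/api/3/issue", auth=auth, json={
            "fields": {"project": {"key": "PROJ"},
                       "issuetype": {"name": "Epic"},
                       "summary": summary},
        }).json()
        for story in stories:
            requests.post(f"{base_url}/rest/api/3/issue", auth=auth, json={
                "fields": {"project": {"key": "PROJ"},
                           "issuetype": {"name": "Story"},
                           "summary": story,
                           "parent": {"key": epic["key"]}},
            })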
The ability to span applications would be even more powerful. Done well, it might even kill the need to maintain complex integrations between related SaaS (e.g. how some product development application might need to sync data to/from Jira or ADO) by having the AI just keep track of changes and move them from one system to another.
Once it gets to the point where the Gen UI is the go-to system for interactions, you have to wonder what all the designers and UI builders at the myriad SaaS companies will be doing...
In a way, that's what Claude Artifacts are. That said, I think there are many more ways to get gen UIs wrong than there are to get them right. For most users and use cases a dynamic UI will be counterproductive. Debugging will be an absolute nightmare if not outright impossible, same with security.
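Who's going to bet millions of dollars that these agents are going to get it right? Based on what evidence?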
Let's look at an actual CRM for a moment. Salesforce has a suite of sales forecasting features. A major one is letting people make "adjustments" to the data. Every layer of your sales org can tweak the numbers that the layer below generates: https://help.salesforce.com/s/articleView?id=sales.forecasts...
I'm sure some of those adjustments are reasonable, but I'm also sure this gets used to create a stack of lies to please upper management.
There are some obvious issues with dropping some sort of AI into such an environment. Do you train the AI to tell the right sorts of lies?
You can have Agents run behaviors async by attaching triggers to them, for example when you get a specific email or something gets updated in a CRM. You can also give the agent access to basically any third-party action you can think of.
Like others in this thread have pointed out, there's a nice middle-ground here between an LLM-only interface and some nice UI around it, as well as ways to introduce determinism where it makes sense.
The product is still in its early days and we're iterating rapidly, but feel free to check it out and give us some feedback. There's a decent free plan.
Nice! I'm kinda curious -- what do you see as Zapier's advantage when it comes to building agents? It seems like everyone is doing something similar (e.g. Lindy, Gumloop).
Agents have a pre-iPhone feel to them (like when everyone was making phones with keyboards). What do you think the ultimate agent looks like?
For this to become true, agents first have to transcend 'chatbot' as the primary interaction layer.
There's a reason we're still using apps instead of talking to Siri…for a huge number of tasks, visual UIs are so much more efficient than long-form text.
I don't think agents/LLMs become the UI; they are going to be the orchestrators. But a well-thought-out UI is always going to be more useful than having to chat/write words so that an agent can help you.
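It's gonna be: reusable SaaS components + AI orchestrator + specialized UI.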
On a related note, there's probably gonna be an extinction level event in the software industry as there's no software moat anymore.
When every application, every feature, every function can be replicated/reproduced by another company in a matter of minutes / hours using AI tools, you don't have a moat anymore.
This reminds me of the blockchain will make everything obsolete sensation of yesteryear.
Why will businesses trust a black box that claims to make good decisions (most of the time) when they have existing human relationships they have vetted, measured, and know the ongoing costs and benefits of?
If the reason is humans are expensive, I have news for you. We've had robotics for around 100 years and the humans are still much cheaper than the robots. Adding a bunch of graphics cards and power plants to the mix doesn't seem to change that equation in a positive direction.
Continuing on with my "old man yells at cloud" meme of late, here's my hot take:
So let me get this straight- we are going to train AI models to perform screen recognition of some kind (so it can ascertain layout and detect the "important" ui elements), and additionally ask that AI to OCR all text on the screen so it has some hope of being able to follow some natural language instructions (OCR being a task which, as a HN thread a day or two ago pointed out, AI is exceedingly bad at), and then we're going to be able to tell this non-deterministic prediction engine what we want to do with our software, and it's just going to do it?
Like Homer Simpson's button pressing birdie toy? :smackshead:
Why do I have reservations about letting a non-deterministic AI agent run my software?
Why not expose hooks in some common format for our software to perform common tasks? We could call it an "application programming interface". We might even insist on some kind of common data interchange format. I hear all the cool people are into EBCDIC nowadays.
Then we could build a robust and deterministic tool to automate our workflows. It could even pass structured data between unrelated applications in a secure manner. Then we could be sure that the AI Agent will hit the "save the world" button instead of the "kill all humans" button 100% of the time.
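Sketched out (with JSON standing in for EBCDIC, and every name invented), that robust tool is almost embarrassingly mundane:

    import json

    def export_report(report_id):  # stub standing in for the real work
        return f"/exports/{report_id}.csv"

    def handle_action(request: str) -> str:
        # One typed hook per task; same input, same result, every time.
        req = json.loads(request)
        if req.get("action") == "export_report":
            return json.dumps({"ok": True, "url": export_report(req["id"])})
        return json.dumps({"ok": False, "error": "unknown action"})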
On a serious note, we should study various macro recording implementations, to at least have a baseline of what people have been successfully doing for 40-odd years to automate their workflows, and then come up with an idea that doesn't involve investing in a new computer, a GPU, and slowly boiling the oceans.
This reeks of a solution in search of a problem. And the solution has the added benefit of being inefficient and unreliable. But, people don't get billion dollar valuations for macro recorders.
Is this what they meant by "worse is better"?
Edit: and for the love of FSM, please do not expose any new automation APIs to the network.
Thank you. My thoughts exactly. Specifically the "you want me to trust mission-critical business logic to a Frankenstein mess of non-deterministic 'agents'?!"
The scariest part is that, as this advances, the disasters we're likely to see will at best be bankrupt corporations, and at worst people being hurt/killed (depending on how carelessly these tools are integrated into mission-critical systems).
Check https://news.ycombinator.com/item?id=42974429 from a few days ago - the OP was re-advertising OAuth, but another idea might be that a new kind of interface is needed: application agentic interfaces, standing in the middle between the API (programming-oriented, too detailed) and the human-targeted screens/forms (too human-oriented). IMO.
> Like Homer Simpson's button pressing birdie toy? :smackshead:
This comparison is especially apt, given that one of the main use-cases for LLMs is the same kind of... well, fraud: To give the illusion that you did the work of understanding or reviewing something, but actually just (smart-)phoning it in.
In one Apple iPhone advertisement, a famous actor is asked by their agent what they think of a script. They didn't read it, so they ask the LLM assistant to sum it up in a couple of sentences... and then they tell their agent it sounds good.
>So let me get this straight- we are going to train AI models to perform screen recognition of some kind (so it can ascertain layout and detect the "important" ui elements), and additionally ask that AI to OCR all text on the screen so it has some hope of being able to follow some natural language instructions (OCR being a task which, as a HN thread a day or two ago pointed out, AI is exceedingly bad at), and then we're going to be able to tell this non-deterministic prediction engine what we want to do with our software, and it's just going to do it?
AI is amazing at OCR: we've had Tesseract OCR for 40 years, and if you read the fine manual it has an essentially 0% per-character error rate.
OCR on VLMs is terrible.
For some reason, consistent x-heights between 10 and 30 pixels with a guaranteed mono-column layout are not something venture capitalists get excited about, and as a result I'm not the founder of a unicorn.
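For reference, the baseline is a few lines, assuming pytesseract and a scan on disk:

    from PIL import Image
    import pytesseract

    # Forty-year-old tech, no GPU, near-zero per-character error on clean
    # mono-column scans.
    text = pytesseract.image_to_string(Image.open("scan.png"))
    print(text)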
So is "AI Agents" something the community has settled on or is this a Google-ism? I remember people arguing about this some time ago with no definitive answer.
I fancy the old-fashioned term ”middleware” myself. But given it is, in fact, the current year, I suspect we’re going to have to accept ”agents” for the time being.
In my experience, autonomous tools are not as successful as ones that are built to postulate about and get confirmation of the user’s intent. I think there’s a lot of promise for agents that are built to be controlled by skilled operators.
Autonomy is just more sexy, but in my opinion, it’s a poor design direction for a lot of applications.
I wonder how we will train Customer Support to tackle issues faced by LLMs. LLMs can already do basic Customer Support. But stuff like understanding bugs and deciding if they should escalate things to engineers feels like a hard thing for an LLM.
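Especially since most attempts will have an "under no circumstances should you voluntarily involve a human" in the prompt.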
I think the importance of things like user interface and good design is still going to remain; their application will just shift to the AI interaction layer / control layer mentioned in the blog.
UX is already working on this. AI as a first-class persona that can be deliberately designed for and accommodated. APIs and protocols are way too strict. Think HTML and black Times New Roman on white backgrounds from the old days. Clear information (text) and activation options (hyperlink) are all it needs.
Ah yes, we are back to the 90s where we are going to have agents taking care of everything for us. All we are missing is Andersen Consulting to sell this to the CEO.
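2 bags of peanuts if the actual product isn’t an OS and barely passes as AI.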
We need a simple open-source protocol which includes authentication and the ability for agents to make payments. Essentially, what you want is the ability for an agent to take a core action (as the article mentions, like adding a record to a CRM).
I fundamentally believe that human-oriented web apps are not the answer, and neither is REST. We need something purpose-built.
The challenge is, it has to be SIMPLE enough for people to easily implement in one day. And it needs to be open source to avoid the obvious problems with it being a for-profit enterprise.
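As a strawman (everything here is invented), the core message might be as small as:

    # One verb, one identity, explicit auth, and an explicit spend cap.
    action = {
        "version": "0.1",
        "actor": "agent://acme/sales-bot",
        "auth": {"scheme": "bearer", "token": "..."},
        "action": "crm.contact.create",
        "params": {"name": "Jane Doe", "email": "jane@example.com"},
        "payment": {"max_amount_usd": 0},  # no spend authorized
    }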
This is the same dumb problem as always: are you who you say you are, and are you allowed to do such-and-such action?
There are existing solutions, but everything is its own special snowflake. OAuth is a lie; SSO sometimes works. But SSO doesn’t differentiate between my employee and their broken script.