

dehugger | 2 months ago

Better idea, how about you just put a link to a csv dump of your inventory data and label it "AI Agents/Scrapers, click here to get all the inventory data", embed that on every page, then call it a day?

When you are being scraped, there are two possible reactions:
1 - Good, because someone scraping your data is going to help you make a sale (discoverability).
2 - Bad, so you work to obfuscate/block/prevent access.

In the first case, introducing a complex new standard that few, if any, will adopt achieves nothing compared to "here's a link for all the data in one spot, now leave my site alone. cheers".

In the second case, you actively don't want your data scraped, so why would you ever adopt this?

If you are reading all the inventory data into context, then you are doing it wrong. Use your LLM to analyze the website and build a mapping for the HTML data, then parse using traditional methods (bs4 works nicely). You'll save yourself a gajillion tokens and get more consistent and accurate results at 1000x the speed.
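A minimal sketch of the "map once, parse many" approach, under two assumptions: `FIELD_MAP` stands in for whatever field-to-class mapping an LLM would produce after inspecting one sample page (the class names are made up), and Python's standard-library `html.parser` is swapped in for bs4 so the example has no dependencies.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

# Hypothetical field -> CSS class mapping. In the workflow described above, an LLM
# would produce something like this once per site layout; these names are illustrative.
FIELD_MAP = {"name": "product-title", "price": "price"}

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MappedExtractor(HTMLParser):
    """Collects the text of elements whose class attribute matches the mapping."""

    def __init__(self, field_map: Dict[str, str]) -> None:
        super().__init__()
        self.class_to_field = {cls: field for field, cls in field_map.items()}
        self.current: Optional[str] = None  # field being captured, if any
        self.depth = 0                      # open tags inside the captured element
        self.result: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.current is not None:
            self.depth += 1  # nested tag inside a captured element
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.class_to_field:
                self.current = self.class_to_field[cls]
                self.depth = 1
                self.result[self.current] = ""
                break

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> carry no text; ignore them entirely.
        pass

    def handle_endtag(self, tag):
        if self.current is None or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.result[self.current] = self.result[self.current].strip()
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.result[self.current] += data


def extract(html: str, field_map: Dict[str, str]) -> Dict[str, str]:
    """Parse one page with the precomputed mapping; no LLM call per page."""
    parser = MappedExtractor(field_map)
    parser.feed(html)
    parser.close()
    return parser.result


SAMPLE_PAGE = (
    '<div class="card"><h2 class="product-title"> Trail Shoe </h2>'
    '<span class="price">$<b>89</b>.99</span><br></div>'
)
print(extract(SAMPLE_PAGE, FIELD_MAP))  # → {'name': 'Trail Shoe', 'price': '$89.99'}
```

The LLM cost is paid once, when the mapping is built; every later page goes through a plain parser, so none of the inventory data ever enters a context window. If the site's layout changes, only the mapping needs regenerating.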



tsazan | 2 months ago

A CSV is a dump of facts. CommerceTXT is a layer of intent and logic. If you give an AI a giant CSV of your whole inventory, you blow the context window before the conversation even starts. If you serve a CSV per product, you still pay for headers and commas without getting any behavioral control.
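To make the per-product point concrete, here is a rough character-count sketch using Python's `csv` module. The columns and rows are invented, and characters are only a loose proxy for tokens; the point is just that splitting one dump into n per-product files repeats the header line n - 1 extra times.

```python
import csv
import io

# Illustrative columns and rows; not taken from any real store.
FIELDS = ["sku", "name", "price", "in_stock"]
ROWS = [
    ["A-100", "Trail Shoe", "89.99", "true"],
    ["B-200", "Rain Jacket", "129.00", "false"],
    ["C-300", "Wool Socks", "14.50", "true"],
]


def to_csv(rows):
    """Render a header line plus the given rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


one_dump = to_csv(ROWS)                        # whole inventory, header once
per_product = [to_csv([row]) for row in ROWS]  # one file per product
header_len = len(to_csv([]))                   # header line on its own

# Splitting into n files repeats the header n - 1 extra times.
overhead = sum(len(doc) for doc in per_product) - len(one_dump)
print(header_len, overhead)  # → 25 50
```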

Our spec handles this via @SEMANTIC_LOGIC and @BRAND_VOICE. It’s about how the AI represents your brand, not just the raw numbers.

Regarding bs4: mapping HTML to a thousand different store layouts is exactly what we are trying to escape. That is the 'fragility tax'. We are proposing a deterministic fast-lane that bypasses the need for custom scrapers for every single store.
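CommerceTXT's actual syntax isn't shown in this thread, so the sketch below invents a minimal layout (`@SECTION` lines followed by `key: value` lines) purely to illustrate the "deterministic fast-lane" argument: if every store publishes the same fixed format, one small parser replaces per-store scrapers. Only the directive names `@SEMANTIC_LOGIC` and `@BRAND_VOICE` come from the comment above; `@PRODUCT`, the keys, and the values are hypothetical, and this is not the real spec's grammar.

```python
from typing import Dict, Optional

# HYPOTHETICAL layout, not the real CommerceTXT grammar: "@SECTION" lines open a
# section and "key: value" lines fill it.
SAMPLE = """\
@PRODUCT
name: Trail Shoe
price: 89.99
@BRAND_VOICE
tone: friendly, concise
@SEMANTIC_LOGIC
if_out_of_stock: suggest B-200
"""


def parse_directives(text: str) -> Dict[str, Dict[str, str]]:
    """Parse the invented layout into {section: {key: value}}; fail loudly on anything else."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@"):
            current = sections.setdefault(line[1:], {})
            continue
        if current is None or ":" not in line:
            raise ValueError(f"unexpected line: {raw!r}")
        key, _, value = line.partition(":")  # split on the first colon only
        current[key.strip()] = value.strip()
    return sections


print(parse_directives(SAMPLE)["PRODUCT"])
```

Because the format is fixed, the same function works for every store that publishes it, and a malformed file raises an error instead of being guessed at.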

You don't want the AI to 'guess' your data. You want it to 'know' your data.

dehugger | 2 months ago

The entire point of the system I described is that it never needs to load that data into context.

AI is excellent at mapping from one format to another.

I use this method to great effect.

IgorPartola | 2 months ago

Meh. I would rather just have the ability to query any given product catalog in a machine-readable way. Any tool or protocol specifically designed for an LLM to consume is, in my opinion, a design smell. We should instead design proper APIs and protocols usable by all kinds of programs, and the LLMs can adapt.
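As a rough illustration of the "proper API usable by all kinds of programs" point: a plain JSON catalog response can be filtered by any ordinary script, and an LLM agent could call the same function as a tool. The response shape, field names, and the endpoint mentioned in the comment are all hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical response body from a plain, non-LLM-specific catalog endpoint
# (e.g. a GET /products). The schema is illustrative only.
CATALOG_JSON = """
{"products": [
  {"sku": "A-100", "name": "Trail Shoe", "price": 89.99, "in_stock": true},
  {"sku": "B-200", "name": "Rain Jacket", "price": 129.00, "in_stock": false},
  {"sku": "C-300", "name": "Wool Socks", "price": 14.50, "in_stock": true}
]}
"""


def query(catalog: Dict[str, Any], max_price: float, in_stock_only: bool = True) -> List[str]:
    """Return SKUs matching simple filters; nothing here is specific to LLMs."""
    return [
        p["sku"]
        for p in catalog["products"]
        if p["price"] <= max_price and (p["in_stock"] or not in_stock_only)
    ]


catalog = json.loads(CATALOG_JSON)
print(query(catalog, max_price=100.0))  # → ['A-100', 'C-300']
```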

You are also solving a business problem with a technical solution. Shopify recently announced that they will open up their entire catalog via an easy-to-use API to a select few enterprise partners. Amazon is doing a similar thing. This is because they do not want you and me to have the ability to programmatically query their catalog. They want to extract money out of specific partners who are trying to enshittify AI chat apps by throwing tons of ads in there. The big movers in the industry could easily have adopted a similar standard already, but they are deliberately not going to. On top of the technical issues other commenters are pointing out, I don't see why this should be in use at all.