(no title)
dehugger | 2 months ago
When you are being scraped there are two possible reactions: 1 - good, because someone scraping your data is going to help you make a sale (discoverability); 2 - bad, so you work to obfuscate/block/prevent access.
In the first case, introducing a complex new standard that few if any will adopt achieves nothing compared to "here's a link for all the data in one spot, now leave my site alone. cheers".
In the second case, you actively don't want your data scraped, so why would you ever adopt this?
If you are reading all the inventory data into context then you are doing it wrong. Use your LLM to analyze the website and build a mapping for the HTML data, then parse using traditional methods (bs4 works nicely). You'll save yourself a gajillion tokens and get more consistent and accurate results at 1000x the speed.
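For anyone who wants to see the shape of that, here is a minimal sketch. Assumptions: the `MAPPING` dict is a made-up example of what an LLM might hand back after looking at one page of a store, and the class names in it are hypothetical. To keep the snippet dependency-free it uses the standard library's `html.parser` rather than bs4; with bs4 the same mapping could hold CSS selectors and be fed to `select()` instead.

```python
from html.parser import HTMLParser

# Hypothetical mapping, produced once per site layout (e.g. by an LLM),
# then reused for every page without any further model calls.
MAPPING = {
    "item_class": "product",
    "fields": {"name": "product-title", "price": "product-price"},
}

ITEM = object()  # stack marker for "we are inside an item container"
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MappedParser(HTMLParser):
    """Collects the text of mapped fields inside each item container."""

    def __init__(self, mapping):
        super().__init__()
        self.item_class = mapping["item_class"]
        self.fields = list(mapping["fields"])
        self.class_to_field = {c: f for f, c in mapping["fields"].items()}
        self.items = []
        self._stack = []  # one marker per open tag: ITEM, a field name, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never get a closing tag
            return
        classes = (dict(attrs).get("class") or "").split()
        marker = None
        if self.item_class in classes:
            marker = ITEM
            self.items.append({f: "" for f in self.fields})
        elif any(m is ITEM for m in self._stack):
            # Only look for fields while inside an item container.
            marker = next((self.class_to_field[c] for c in classes
                           if c in self.class_to_field), None)
        self._stack.append(marker)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Innermost enclosing marker decides where the text goes.
        marker = next((m for m in reversed(self._stack) if m is not None), None)
        if marker is not None and marker is not ITEM:
            self.items[-1][marker] += data


def extract(html, mapping=MAPPING):
    parser = MappedParser(mapping)
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in item.items()} for item in parser.items]
```

This assumes reasonably well-formed HTML (unclosed tags will confuse the stack). The point is only that the model is consulted once to produce the mapping, and every page after that goes through a plain parser.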
tsazan|2 months ago
Our spec handles this via @SEMANTIC_LOGIC and @BRAND_VOICE. It’s about how the AI represents your brand, not just the raw numbers.
Regarding bs4: mapping HTML to a thousand different store layouts is exactly what we are trying to escape. That is the 'fragility tax'. We are proposing a deterministic fast-lane that bypasses the need for custom scrapers for every single store.
You don't want the AI to 'guess' your data. You want it to 'know' your data.
dehugger|2 months ago
AI is excellent at mapping from one format to another.
I use this method to great effect.
IgorPartola|2 months ago
You are also solving a business problem with a technical solution. Shopify recently announced that they will open up their entire catalog via an easy-to-use API to a select few enterprise partners. Amazon is doing a similar thing. This is because they do not want you and me to have the ability to programmatically query their catalog. They want to extract money out of specific partners who are trying to enshittify AI chat apps by throwing tons of ads in there. The big movers in the industry could easily have adopted a similar standard already, but they are deliberately choosing not to. On top of the technical issues other commenters are pointing out, I don’t see why this should be in use at all.