Show HN: 30k IKEA items in flat text
55 points | tsazan | 1 month ago | huggingface.co
I took the unofficial IKEA US dataset (originally scraped by jeffreyszhou) and converted all 30,511 products into a flat, markdown-like protocol called CommerceTXT.
The goal: See if a flatter structure is more efficient for LLM context windows.
The results:
- Size: 30k products across 632 categories.
- Efficiency: The text version uses ~24% fewer tokens (3.6M saved total) compared to the equivalent minified JSON.
- Structure: Files are organized in folders (e.g. /products/category/), which helps with testing hierarchical retrieval routers.
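To make the efficiency comparison concrete, here is a minimal sketch of how such a measurement could be set up for a single record. Everything in it is an assumption for illustration: the product fields are made up, the "key: value" layout is only a guess at what a flat format might look like (it is not the actual CommerceTXT syntax), and `rough_token_count` is a crude regex stand-in for a real LLM tokenizer, so its numbers will not match the benchmark figures.

```python
import json
import re

# Hypothetical product record; field names are illustrative, not taken from the dataset.
product = {
    "name": "Example bookcase",
    "category": "bookcases",
    "price": 79.99,
    "currency": "USD",
    "color": "white",
}


def to_minified_json(record):
    # Minified JSON: no whitespace after separators.
    return json.dumps(record, separators=(",", ":"))


def to_flat_text(record):
    # Assumed flat layout, one "key: value" pair per line.
    # The real CommerceTXT syntax may differ.
    return "\n".join(f"{key}: {value}" for key, value in record.items())


def rough_token_count(text):
    # Crude tokenizer stand-in: every run of word characters and every
    # single punctuation character counts as one token.
    return len(re.findall(r"\w+|[^\w\s]", text))


json_tokens = rough_token_count(to_minified_json(product))
flat_tokens = rough_token_count(to_flat_text(product))
saving = 1 - flat_tokens / json_tokens

print(f"JSON: {json_tokens} tokens, flat: {flat_tokens} tokens, saving: {saving:.0%}")
```

The flat version comes out smaller here mainly because it drops the quotes, braces, and commas that JSON needs around every key and string value. For a real comparison you would swap `rough_token_count` for the tokenizer of the model you care about and run it over the full dataset.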
The link goes to the dataset on Hugging Face which has the full benchmarks.
Parser code is here: https://github.com/commercetxt/commercetxt
Happy to answer questions about the conversion logic!
vachina|1 month ago
For example, Google’s indexers already use this to surface pricing data. https://developers.google.com/search/docs/appearance/structu...
tsazan|1 month ago
reddalo|1 month ago
These things should be put under /.well-known [1], not in the root.
[1] https://en.wikipedia.org/wiki/Well-known_URI
buildbuildbuild|1 month ago
It’s not ideal but representative of the tension between user experience and technical correctness.
dkdcio|1 month ago
btrettel|1 month ago
tsazan|1 month ago
JosephRedfern|1 month ago
TechSquidTV|1 month ago
tsazan|1 month ago
sognetic|1 month ago
tsazan|1 month ago
colinbartlett|1 month ago
Or just a handy open data set you could use to prove out the concept?
DennisP|1 month ago
WildGreenLeave|1 month ago
bleonard|1 month ago
croisillon|1 month ago
tsazan|1 month ago
chuckadams|1 month ago
unknown|1 month ago
[deleted]
usefulposter|1 month ago
It's funny because it makes zero sense in the body of an initial post!
In comments replying to people downthread - maybe. But opening a top-level post with "Original Poster here" is just silly and shows a lack of respect for community etiquette.
https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...
dkoy|1 month ago
tokai|1 month ago
>be me
Seeing it as a lack of respect is a huge stretch. And kinda conceited that you accuse someone of such, on the basis of a two word opener.