I need to submit around 1,000 rows of data to an LLM so I can ask it about trends within the data. If I use JSON, the GPT tokenizer shows roughly 40 tokens per row (because the keys are repeated on every row, which is inefficient). That means ~40k input tokens, which would definitely put me in context-rot (hallucination) territory. And I've heard CSV is very inaccurate. Any suggestions?
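For context, here's a minimal sketch of the conversion I'm weighing: flattening a JSON array of objects into CSV so the keys appear once as a header row instead of repeating on every row (the field names below are just placeholders):

```python
import csv
import io
import json

def json_rows_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV.
    The keys are written once as a header row instead of
    being repeated on every row, which cuts token count."""
    rows = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical rows, just to show the shape of the output:
data = json.dumps([
    {"date": "2025-01-01", "sales": 120},
    {"date": "2025-01-02", "sales": 95},
])
print(json_rows_to_csv(data))
```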
Leftium|1 month ago
I wanted to figure out reasonable ranges for daily/hourly precipitation for https://weather-sense.leftium.com. Claude wrote a script that called the Open Meteo API to collect hourly stats for a few cities for an entire year (8,000+ rows), then reported just the 80th, 90th, etc. percentiles and recommended ranges.
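If it helps, the percentile step can be sketched like this (the data below is synthetic, not real Open Meteo output, and the function name is made up):

```python
from statistics import quantiles

def precip_percentiles(hourly_mm, pcts=(80, 90, 95, 99)):
    """Given ~a year of hourly precipitation values (mm), report the
    high percentiles used to pick sensible chart ranges."""
    # quantiles(n=100) returns the 99 percentile cut points 1..99
    cuts = quantiles(hourly_mm, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in pcts}

# Synthetic stand-in for 8,760 hourly readings (mostly dry, occasional rain):
sample = [0.0] * 7000 + [0.1 * i for i in range(1760)]
print(precip_percentiles(sample))
```

The point is that the LLM only ever sees a handful of summary numbers, not the 8,000+ raw rows.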
mierz00|1 month ago
I'm not sure if it would work in your use case, but you could classify each line into a value using an LLM, then hard-code the trend analysis you are looking for.
For example, if you're analysing something like support tickets: use an LLM to classify the sentiment of each ticket, then plot the sentiment on a graph and see whether it's trending up or down.
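A minimal sketch of that idea, where `classify_sentiment` is a hypothetical stand-in for the per-ticket LLM call (here just a keyword lookup so the example runs offline):

```python
from statistics import mean

def classify_sentiment(text: str) -> int:
    """Stand-in for an LLM call returning -1, 0, or +1.
    In practice you'd send each ticket to the model and parse its label."""
    negative = ("broken", "refund", "angry")
    positive = ("thanks", "great", "love")
    lowered = text.lower()
    if any(w in lowered for w in negative):
        return -1
    if any(w in lowered for w in positive):
        return 1
    return 0

def sentiment_trend(tickets: list[str], window: int = 3) -> list[float]:
    """Classify each ticket, then smooth the scores with a moving
    average so the up/down trend is easy to read off a plot."""
    scores = [classify_sentiment(t) for t in tickets]
    return [mean(scores[i:i + window]) for i in range(len(scores) - window + 1)]

tickets = ["love the update", "thanks!", "refund please",
           "app is broken", "still broken"]
print(sentiment_trend(tickets))
```

The LLM does only the cheap per-row classification; the trend itself comes from plain code, so there's no need to fit 1,000 rows into one context window.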
JimsonYang|1 month ago
I figured I'd ask this question because there might be a technique I'm not aware of.