I need to submit around 1,000 rows of data to an LLM so I can ask it about trends within the data. If I use JSON, the GPT tokenizer shows roughly 40 tokens per row (because the keys are repeated on every row, which is inefficient). That means ~40k input tokens, which would definitely put me in context-rot (hallucination) territory. And I've heard CSV is very inaccurate. Any suggestions?
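For context, here's a minimal sketch of the conversion I'm weighing: flattening a JSON array of objects into CSV so the keys appear once as a header row instead of repeating on every row (the field names below are just placeholders):

```python
import csv
import io
import json

def json_rows_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV.
    The keys are written once as a header row instead of
    being repeated on every row, which cuts token count."""
    rows = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical rows, just to show the shape of the output:
data = json.dumps([
    {"date": "2025-01-01", "sales": 120},
    {"date": "2025-01-02", "sales": 95},
])
print(json_rows_to_csv(data))
```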
Leftium|1 month ago
I wanted to figure out reasonable ranges for daily/hourly precipitation for https://weather-sense.leftium.com. Claude wrote a script that called the Open Meteo API to collect hourly stats for a few cities for an entire year (8,000+ rows), then reported just the 80th, 90th, etc. percentiles and recommended ranges.
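If it helps, the percentile step can be sketched like this (the data below is synthetic, not real Open Meteo output, and the function name is made up):

```python
from statistics import quantiles

def precip_percentiles(hourly_mm, pcts=(80, 90, 95, 99)):
    """Given ~a year of hourly precipitation values (mm), report the
    high percentiles used to pick sensible chart ranges."""
    # quantiles(n=100) returns the 99 percentile cut points 1..99
    cuts = quantiles(hourly_mm, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in pcts}

# Synthetic stand-in for 8,760 hourly readings (mostly dry, occasional rain):
sample = [0.0] * 7000 + [0.1 * i for i in range(1760)]
print(precip_percentiles(sample))
```

The point is that the LLM only ever sees a handful of summary numbers, not the 8,000+ raw rows.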
mierz00|1 month ago
I'm not sure if it would work in your use case, but you could classify each line into a value using an LLM, then hard-code the trend analysis you are looking for.
For example, if you're analysing something like support tickets: use an LLM to classify the sentiment of each ticket, then plot the sentiment on a graph and see whether it's trending up or down.
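A minimal sketch of that idea, where `classify_sentiment` is a hypothetical stand-in for the per-ticket LLM call (here just a keyword lookup so the example runs offline):

```python
from statistics import mean

def classify_sentiment(text: str) -> int:
    """Stand-in for an LLM call returning -1, 0, or +1.
    In practice you'd send each ticket to the model and parse its label."""
    negative = ("broken", "refund", "angry")
    positive = ("thanks", "great", "love")
    lowered = text.lower()
    if any(w in lowered for w in negative):
        return -1
    if any(w in lowered for w in positive):
        return 1
    return 0

def sentiment_trend(tickets: list[str], window: int = 3) -> list[float]:
    """Classify each ticket, then smooth the scores with a moving
    average so the up/down trend is easy to read off a plot."""
    scores = [classify_sentiment(t) for t in tickets]
    return [mean(scores[i:i + window]) for i in range(len(scores) - window + 1)]

tickets = ["love the update", "thanks!", "refund please",
           "app is broken", "still broken"]
print(sentiment_trend(tickets))
```

The LLM does only the cheap per-row classification; the trend itself comes from plain code, so there's no need to fit 1,000 rows into one context window.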
JimsonYang|1 month ago
I figured I'd ask this question because there might be a technique I'm not aware of.