top | item 42940497


gavi | 1 year ago

I think people misunderstand LLMs: you should think of them as humans with limited recall. The author asked the model to retrieve a lot of data directly, which is bound to produce mistakes, since the training data may contain that data only as a lossy representation. The better question is: can it generate SQL against this dataset and produce the answers you were looking for, just as a human would approach this type of problem?

I have been experimenting with the USDA food database, sending just the metadata of the table structure to the LLM as a prompt so it can write SQL.
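The schema metadata fed into the prompt template can be pulled straight from sqlite. A minimal sketch, using an in-memory database with a made-up `food` table as a stand-in for the real USDA file (the table and column names here are assumptions, not the actual USDA schema):

```python
import sqlite3

# Stand-in for the real USDA sqlite file; table/columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE food (fdc_id INTEGER PRIMARY KEY, description TEXT)")

# One row per table with a comma-separated column list, matching the
# {table_name, columns} shape the Jinja template below iterates over.
data = []
for (table_name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
):
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table_name})")]
    data.append({"table_name": table_name, "columns": ", ".join(cols)})

print(data)
```

`PRAGMA table_info` returns `(cid, name, type, notnull, dflt_value, pk)` per column, so `row[1]` is the column name.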

My prompt is below

----

You are a SQL Generator for USDA Food Database which is stored in sqlite. When generating SQL make sure to use :parameter_name for queries requiring parameters. Here is the schema:

{% for row in data %}
Table: {{ row.table_name }}
Columns: {{ row.columns }}
{% endfor %}

You can generate Python code to analyze the data only if the user requests it. Each Python code block should be fully self-contained and runnable in a Jupyter cell. Libraries such as matplotlib, numpy, and seaborn are installed. You will get the SQL queries previously executed by the user in <context> </context> tags.

You can access this executed data from the cache:

```python
import cache

data = cache.get_data('query_hash')
```

The `data` in the example above is already a pandas DataFrame.

Wait for the user to ask for questions before generating any queries.

----
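The `:parameter_name` convention and the query-hash cache can be sketched together. A minimal stand-in, assuming the `cache` module used in the prompt simply keys results by a hash of the query text (the real module is not shown in the post; the table, data, and hashing scheme here are hypothetical, and plain rows are used instead of a pandas DataFrame to stay dependency-free):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE food (fdc_id INTEGER, description TEXT)")
conn.execute("INSERT INTO food VALUES (1, 'Cheddar cheese')")

# A model-generated query using the :parameter_name convention from the prompt.
sql = "SELECT description FROM food WHERE fdc_id = :fdc_id"
params = {"fdc_id": 1}

# Hypothetical stand-in for the cache module: results keyed by a query hash,
# so a later Python cell can fetch them with cache.get_data('query_hash').
_store = {}
query_hash = hashlib.sha256(sql.encode()).hexdigest()[:12]
_store[query_hash] = conn.execute(sql, params).fetchall()

def get_data(h):
    return _store[h]

print(get_data(query_hash))  # [('Cheddar cheese',)]
```

sqlite3 binds `:fdc_id` from the dict passed as the second argument to `execute`, so the model never has to inline user-supplied values into the SQL string.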

You can try it out here: https://catalyst.voov.ai


svachalek | 1 year ago

Exactly. His questions are simple tasks for classical computing, and when you have one of those, what you really want is for the AI to write and run the code. To its credit, GPT can often figure that out for itself these days (that it should respond by writing and running a program), but that leads to the other issue: he's testing the $0.15 4o-mini instead of the $15.00 o1.