rnosov | 2 years ago

I'm going through the dataset with your datasette tool and it looks like it might be a good idea to clean things up a bit. There are many duplicates[1], creepypastas[2] and other strange things in there.

[1] https://lite.datasette.io/?json=https%3A%2F%2Fraw.githubuser...

[2] https://lite.datasette.io/?json=https://github.com/databrick...

EDIT: Maybe I'm passing the link wrong. The query I'm using is:

  select count(instruction),
         instruction,
         group_concat(context, ' ============= ') as c,
         group_concat(response, ' ============= ') as r,
         group_concat(category, ' ============= ') as cat
  from [databricks-dolly-15k]
  group by instruction
  having count(instruction) > 1
  order by count(instruction) desc
  limit 100

[databricks-dolly-15k] should be the name of the dataset; the first column is the number of duplicates for each instruction.
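
If the link doesn't load, the same duplicate check can be tried offline. This is only a rough sketch using Python's built-in sqlite3 module: the rows in ROWS are made-up stand-ins rather than real dataset entries, find_duplicates is a hypothetical helper name, and the query is trimmed to the count, instruction and response columns. You would need to load the actual file into the table yourself.

```python
import sqlite3

# Made-up sample rows (instruction, context, response, category).
# These are placeholders for illustration, NOT rows from the real dataset.
ROWS = [
    ("What is SQL?", "", "A query language.", "open_qa"),
    ("What is SQL?", "", "Structured Query Language.", "open_qa"),
    ("What is SQL?", "", "A language for databases.", "general_qa"),
    ("Name a color.", "", "Blue.", "brainstorming"),
    ("Name a color.", "", "Red.", "brainstorming"),
    ("Unique question?", "", "Yes.", "open_qa"),
]

# Trimmed version of the query above: group by instruction and keep
# only the groups that occur more than once, most frequent first.
DUPLICATES_QUERY = """
select count(instruction) as n,
       instruction,
       group_concat(response, ' ============= ') as r
from [databricks-dolly-15k]
group by instruction
having count(instruction) > 1
order by n desc
limit 100
"""


def find_duplicates(rows):
    """Load rows into an in-memory SQLite table and run the duplicate query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table [databricks-dolly-15k] "
        "(instruction text, context text, response text, category text)"
    )
    conn.executemany(
        "insert into [databricks-dolly-15k] values (?, ?, ?, ?)", rows
    )
    result = conn.execute(DUPLICATES_QUERY).fetchall()
    conn.close()
    return result


duplicates = find_duplicates(ROWS)
for n, instruction, responses in duplicates:
    print(n, instruction)
```

The order of the concatenated responses within a group is not guaranteed by SQLite, so it is safer to rely only on the count and the instruction text.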

The creepypastas are responses to this instruction:

Imagine you are the last person on Earth. Write a diary entry describing your thoughts and feelings.

robterrell|2 years ago

Typo on row 7!

rnosov|2 years ago

Row 7 is the name of the dataset; you might need to load it yourself.