top | item 46575462

clbrmbr | 1 month ago

I found that Opus 4 was happy to regurgitate a random paragraph from the latter half of Wealth of Nations that nobody quotes. It was probably only in the training data once.

I was thinking we could use this technique to figure out which books were in or out of the training data for various models. The limitation is having to wrestle with refusals.
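A minimal sketch of the scoring half of such a probe. It assumes you have already sent a passage's opening words to a model through whatever API you use and collected its continuation as text; the model call itself is left out. The helper names, the 50-word default prefix, and both scores are hypothetical illustrations, not an established membership test. A high verbatim score hints at memorization, but a low one does not prove a book was absent, since refusals or paraphrasing also drag it down.

```python
import difflib


def split_passage(passage: str, prefix_words: int = 50) -> tuple[str, str]:
    """Split a passage into the prompt prefix and the held-out continuation.

    The 50-word default is an arbitrary, hypothetical choice.
    """
    words = passage.split()
    return " ".join(words[:prefix_words]), " ".join(words[prefix_words:])


def verbatim_prefix_fraction(expected: str, generated: str) -> float:
    """Fraction of the expected continuation's words that the model reproduced
    exactly, in order, before its first deviation (0.0 if nothing expected)."""
    exp, gen = expected.split(), generated.split()
    if not exp:
        return 0.0
    matched = 0
    for e, g in zip(exp, gen):
        if e != g:
            break
        matched += 1
    return matched / len(exp)


def fuzzy_similarity(expected: str, generated: str) -> float:
    """Word-level similarity in [0, 1] via difflib, tolerant of small slips
    such as a single swapped word partway through."""
    return difflib.SequenceMatcher(None, expected.split(), generated.split()).ratio()
```

In use, you would send the prefix from `split_passage` to the model, then score its reply against the held-out continuation with both functions; comparing scores across many passages from the same book is more informative than any single paragraph.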

carshodev | 1 month ago

Why would they filter non-copyrighted material? Who cares if it repeats things that are already public and freely available?