matt_holden | 2 years ago

Interesting point. OpenAI has said they don't train on files uploaded via the API (like the Assistants API), but it's unclear what the policy is for documents uploaded to GPTs.

Either way, the signal they could get from understanding what KINDS of documents builders/users want to do better retrieval on is probably quite valuable.

I also wonder how user file uploads will interact with copyright law and the new Copyright Shield from OpenAI.

E.g. if a user uploads the full text of Harry Potter to a GPT, you could argue the model output is fair use, but it's unclear how courts will interpret that.

LLMs are already a sort of "copyright blender" that aggregates copyrighted inputs to produce (probably?) "fair use" outputs. With the foundation models, OpenAI can decide what inputs to include in training. But with custom GPTs, users can now create their own personal copyright blenders just by uploading a PDF :)
