In Chinese, it always added something like "For study/research purposes only. Please delete after 48 hours." That's the disclaimer volunteer fansubbers add to the subtitles of (pirated) movies/shows.
There is so much damning evidence that AI companies have committed absolutely shocking amounts of piracy, yet nothing is being done.
It only highlights how the world really works. If you have money you get to do whatever the fuck you want. If you're just a normal person you get to spend years in jail or worse.
They can. I don't think anyone has been prosecuted for using an illegal streaming site or downloading from Sci-Hub, for instance. What people do get sued for is seeding, which counts as distribution. If anything, AI companies are being pursued more aggressively than "ordinary people", presumably because of their scale. In a recent lawsuit, Anthropic won on the question of whether training AI on books is fair use, but lost on the part where it used pirated books.
IANAL, but having read a bit on this topic: the part of copyright law relevant to AI isn't an academic exemption, it's transformative use. A model trained on copyrighted material transforms it so much that the result is no longer the original protected work (collage and sampling are the analogous transformations in the visual arts and music industries).
As for actually gathering the copyrighted material: I believe the jury hasn't even been empaneled for that yet (in the OpenAI case), but the latest ruling from the court is that copyright may have been violated in the creation of their training corpus.
That is not the case here - I never encountered this with whisper-large-v3 or similar ASR models. Part of the reason, I guess, is that those subs are burned into the movie, which makes them hard to extract. Standalone subs would have to be matched against the corresponding video to align audio and text. So nothing beats YouTube videos, where the two are already aligned.
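The "alignment" here just means each subtitle cue gives a (start, end, text) triple, so slicing the waveform at those boundaries yields paired (audio, transcript) training examples. A toy sketch of the index arithmetic (the sample rate and function name are illustrative; real pipelines also resample, pad, and filter):

```python
# Hypothetical sketch: turn a time-stamped subtitle cue into waveform
# sample indices for slicing out an (audio, text) training pair.
SAMPLE_RATE = 16_000  # Hz; a common rate for ASR models

def cue_to_slice(start_s: float, end_s: float, sample_rate: int = SAMPLE_RATE):
    """Convert cue times in seconds to [lo, hi) sample indices."""
    return int(start_s * sample_rate), int(end_s * sample_rate)

# A cue running from 1.5s to 3.0s covers samples [24000, 48000)
lo, hi = cue_to_slice(1.5, 3.0)
print(lo, hi)  # 24000 48000
```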
At least for English, those "fansubs" aren't typically burnt into the movie*; they ride along in the video container (MP4/MKV) as separate subtitle streams. They can usually be extracted as SRT files (plain text with sentence-level timestamps).
*Although it used to be more common for AVI files in the olden days.
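For reference, SRT really is just plain text: a numeric cue index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timestamp line, then the subtitle text, separated by blank lines. A minimal Python sketch of parsing such cues (the regex and sample data are illustrative, not a complete SRT implementation):

```python
import re

# One SRT cue: index, "start --> end" timestamp line, then text
# up to the next blank line (or end of file).
CUE_RE = re.compile(
    r"(\d+)\s*\n"                        # cue index
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> "  # start time
    r"(\d{2}:\d{2}:\d{2}),(\d{3})\s*\n"  # end time
    r"(.*?)(?:\n\n|\Z)",                 # subtitle text
    re.DOTALL,
)

def to_seconds(hms: str, millis: str) -> float:
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s + int(millis) / 1000

def parse_srt(text: str):
    """Yield (start_seconds, end_seconds, subtitle_text) per cue."""
    for m in CUE_RE.finditer(text):
        yield (to_seconds(m.group(2), m.group(3)),
               to_seconds(m.group(4), m.group(5)),
               m.group(6).strip())

sample = """1
00:00:01,500 --> 00:00:03,000
Hello there.

2
00:00:04,000 --> 00:00:06,250
For study purposes only.
"""

cues = list(parse_srt(sample))
print(cues[0])  # (1.5, 3.0, 'Hello there.')
```

Pulling the stream out of the container in the first place is typically done with a tool like ffmpeg (e.g. `ffmpeg -i movie.mkv -map 0:s:0 subs.srt` for the first subtitle stream).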
Reminds me of https://www.youtube.com/watch?v=8GptobqPsvg