sivers | 7 months ago

to save you a lookup:

The Arabic text "رجمة نانسي قنقر" translates to English as: "Nancy Qanqar's translation" or "Translation by Nancy Qanqar"

"رجمة" means "translation" and "نانسي قنقر" is the name "Nancy Qanqar"

mormegil|7 months ago

In Czech, Whisper usually transcribes music as "Titulky vytvořil JohnyX" ("subtitles made by JohnyX") for the same reason.

actionfromafar|7 months ago

Haha, trained on torrented movies! :-D

The MPA must be so proud.

aprilthird2021|7 months ago

And it seems to be because the training data is largely unofficial movie subtitles, which often end with a string like "Translated by X" while the credits roll and the audio is mostly silent.

rob74|7 months ago

Looks like they used more official sources for German - there, silence is apparently hallucinated as "Untertitelung des ZDF für funk, 2017" according to one of the comments on the issue. Which makes sense, as the public broadcasters' "Mediathek" is probably the largest freely available resource of subtitled videos in Germany. I wonder if the ZDF gave its approval for it being used for LLM training though?

4gotunameagain|7 months ago

I'm sure they totally did not pirate the audio of said movies.

beshrkayali|7 months ago

You've got a little typo: it's not "رجمة", it's "ترجمة" that means translation. The ت at the beginning is missing.