porridgeraisin | 11 days ago

One other reason STT and OCR are the focus (check out the Sarvam Vision demo on their website, it's extremely good!) is that they can be used to build Indian-language datasets, which can then be used to train larger LLMs than the current 105B one. Most training data in Indian languages (as you'd know, there are more than just Hindi) exists either as speech or in old books.

If you add in the commercial aspect you pointed out, TTS/STT becomes even more important.
