top | item 44091788


kegs_ | 9 months ago

2 hours in and this thread is already stacked, but I'll bite since I'm stuck on this problem and need help. I'm working on a language learning solution that involves LLMs. The way I'm branding it is "Anki meets AI", because it combines flashcard-style generation of complete exercises (multiple choice, cloze, etc.) with the tried-and-true SRS methodology.
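For concreteness, the SRS half is the well-understood part: Anki-style scheduling descends from SuperMemo's SM-2 algorithm. A minimal sketch of an SM-2-style interval update (the constants are the classic published ones; a real app would tune them, and the function name here is just illustrative):

```python
def sm2_update(interval, repetitions, ease, quality):
    """One SM-2 review step.

    interval    -- current interval in days
    repetitions -- consecutive successful reviews so far
    ease        -- ease factor (starts at 2.5 in classic SM-2)
    quality     -- self-rated recall quality, 0-5
    Returns (next_interval_days, repetitions, ease).
    """
    if quality < 3:
        # Lapse: show the card again soon and restart the streak,
        # but keep the ease factor (as in the original SM-2 spec).
        return 1, 0, ease
    if repetitions == 0:
        interval = 1
    elif repetitions == 1:
        interval = 6
    else:
        interval = round(interval * ease)
    # Classic SM-2 ease adjustment, floored at 1.3.
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return interval, repetitions + 1, ease

# Three successful reviews of a new card: intervals grow 1 -> 6 -> ~16 days.
state = (0, 0, 2.5)
for q in (5, 5, 4):
    state = sm2_update(*state, q)
print(state)
```

The scheduling math is deterministic and trustworthy; it's the LLM-generated exercise content feeding into it that carries the uncertainty described below.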

I think it works great! The problem is, I think it works great. The issue is that it's doubly lossy: LLMs aren't perfect, and translating from one language to another isn't perfect either. So the struggle is trusting the LLM (because it's the only tool good enough for the job other than humans) while looking for solid ground, so that users feel like they're moving forward and not going astray.


Alex-Programs | 9 months ago

Hey, I happen to have run into a similar issue with my project!

I've documented a lot of my research into LLM translation at https://nuenki.app/blog, and I made an open-source hybrid translator that beats any individual LLM at https://nuenki.app/translator

It uses the fact that

- LLMs are better at critiquing translations than producing them (even when thinking, which doesn't actually help!)

- When they make mistakes, different models tend to make different mistakes.

So it translates with the top 4-5 models based on my research, then has another model critique, compare, and combine.
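In Python the pattern is roughly the following. This is only a sketch: the stub "models" and the majority-vote critic are placeholders I made up for illustration, since the real critic step is another LLM call that compares candidates against the source text.

```python
from collections import Counter

def hybrid_translate(text, translators, critic):
    """Ensemble pattern: collect candidate translations from several
    models, then let a critic compare and pick/combine them.

    translators -- list of callables text -> translation
    critic      -- callable (text, candidates) -> final translation
    """
    candidates = [t(text) for t in translators]
    return critic(text, candidates)

# Stand-in critic: majority vote over candidates. A real critic would be
# an LLM prompted to judge each candidate against the source sentence.
def majority_critic(text, candidates):
    return Counter(candidates).most_common(1)[0][0]

# Hypothetical stub "models" -- real ones would be API calls to the
# top-scoring LLMs. Because their mistakes differ, one bad output
# gets outvoted by the others.
models = [
    lambda s: "Der Hund schläft.",
    lambda s: "Der Hund schläft.",
    lambda s: "Die Hund schlaft.",  # one model slips up
]

print(hybrid_translate("The dog is sleeping.", models, majority_critic))
# -> Der Hund schläft.
```

The cost is N translation calls plus one critique call per sentence, which is why it ends up slower and pricier than a single model.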

It's more expensive than any one model, but it isn't super expensive. The main issue is that it's quite slow. Anyway, hopefully it's useful, and hopefully the data is useful too. Feel free to email/reply if you have any questions/ideas for tests etc.

kegs_ | 9 months ago

Hey, thanks for the reply! Is this "hybrid" method what you described in the last line, the multi-model LLM comparison?