top | item 42779420


PeterSmit | 1 year ago

I’ve had this same idea, and it doesn’t work. Or at least: it works quite well, but the problem is that you get hallucinations. And it can be incredibly discouraging to find out the flashcards you’ve been cramming are completely wrong.


3D30497420|1 year ago

I've had this same problem using ChatGPT with German. Even for basic German, hallucinations can be unexpected and problematic. (I don't recall the model, but it was a recent one.)

In one instance, I was having it correct akkusativ/dativ/nominativ sentences and it would say the sentence is in one case when I knew it was in another case. I'd ask ChatGPT if it was sure, and then it would change its answer. If pressed further, it would again change its answer.

I was originally quite excited about using an LLM for my language practice, but now I'm pretty cautious with it.

It is also why I'm very skeptical of AI-based language learning apps, especially if the creator is not a native speaker.

arbayi|1 year ago

Would agentic workflows come in handy in these cases? I mean having a controller agent that runs after the sentence is created, one that could search the web, or consult a database or personal notes, to ensure everything is correct.
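The controller idea described above can be sketched roughly like this: after a card is generated, a second pass checks it against a trusted source before it reaches the deck. All names here (`Flashcard`, `verify_card`, `TRUSTED_GLOSSARY`) are hypothetical, and the in-memory dictionary merely stands in for a real database, personal notes, or a web search.

```python
# Minimal sketch of a "controller" verification step for generated flashcards.
# A tiny in-memory glossary stands in for the trusted source the agent consults.

from dataclasses import dataclass

@dataclass
class Flashcard:
    front: str   # German word
    back: str    # claimed English meaning

# Stand-in for a database or personal notes the controller agent checks against.
TRUSTED_GLOSSARY = {
    "der Hund": "the dog",
    "die Katze": "the cat",
    "das Haus": "the house",
}

def verify_card(card: Flashcard) -> tuple[bool, str]:
    """Accept the card only if the trusted source agrees; otherwise flag it."""
    expected = TRUSTED_GLOSSARY.get(card.front)
    if expected is None:
        return False, f"no trusted entry for {card.front!r}; needs human review"
    if expected.lower() != card.back.lower():
        return False, f"mismatch: trusted source says {expected!r}"
    return True, "ok"

# A hallucinated card gets flagged instead of silently reaching the deck.
ok, reason = verify_card(Flashcard("die Katze", "the dog"))
print(ok, reason)  # -> False mismatch: trusted source says 'the cat'
```

Anything not covered by the trusted source falls through to human review, which is the honest default when the checker itself can't confirm a fact.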

tkgally|1 year ago

What models have you been using for that? While I haven’t tried automating the production of vocabulary lists through an API, within the last few weeks I have had the chat versions of ChatGPT 4o, Claude Sonnet 3.5, and one of the latest Gemini models produce annotated vocabulary lists based on literary texts in English, Russian, and Latin. I didn’t spot any hallucinations.

I was asking only for the meanings of the words and phrases, though. I didn’t ask for things like pronunciations, grammatical categories, etc. In the past, when I’ve tried to get that kind of granular information from LLMs, there were indeed errors, presumably because of tokenization issues.

A few days ago, I ran some similar tests with Japanese, asking for readings of kanji and jukugo in an extended text. All of the models I had tried before for such tasks had screwed up. This time, however, ChatGPT o1 scored 100%. It also was able to analyze sentence grammar accurately, unlike the other models I tried. I was impressed.

At current API prices, though, o1 might be a bit too expensive for such a task.

arbayi|1 year ago

I wonder if there are any benchmarks specifically designed to evaluate LLMs' performance on language-learning tasks.

learning-tr|1 year ago

I had this problem initially, but found that hallucinations mostly go away if you use the following:

1. Role-based "agents" with a router and logs (for auditing reasoning and decision-making).

2. Cross-validation and redundancy: have the translation "agent" use a second language (not English) that you are also native in, to check that the translation carries the same meaning (sentiment) and cultural significance (Turkish is especially rich in symbolism and cultural memes).
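The cross-validation step can be sketched as two independent routes into English, one direct and one pivoting through the second language, with disagreement treated as a red flag. Both "agents" below are stubbed with lookup tables, and the word-overlap check is a crude stand-in for a real semantic-similarity comparison; all names are illustrative.

```python
# Sketch of cross-validation via a second language: render the same source
# sentence into English by two routes and flag the card if they diverge.

def agent_direct(src: str) -> str:
    # Direct source -> English translation "agent" (stub for an LLM call).
    return {"Das Haus ist alt.": "The house is old."}.get(src, "")

def agent_via_turkish(src: str) -> str:
    # Source -> Turkish -> English pivot "agent" (stub). A hallucinating
    # model tends to drift in meaning on this second, independent route.
    return {"Das Haus ist alt.": "The house is old."}.get(src, "")

def agree(a: str, b: str, threshold: float = 0.6) -> bool:
    """Crude meaning check via word overlap; real systems would use embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return False
    return len(wa & wb) / len(wa | wb) >= threshold

src = "Das Haus ist alt."
accepted = agree(agent_direct(src), agent_via_turkish(src))
```

The redundancy only helps if the two routes fail independently, which is the point of using a second language you can personally audit.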

YMMV: I am a car salesman irl and have no formal training.