Show HN: Learning a Language Using Only Words You Know

84 points| simedw | 2 months ago |simedw.com

A proof-of-concept language learning app that uses LLMs to generate definitions of unknown words using only previously mastered vocabulary.

29 comments

dylanzhangdev|2 months ago

Even for Chinese people, Journey to the West is a somewhat difficult text because it belongs to classical literature. Using some children's books published in recent years, and progressing gradually, might be a better approach?

simedw|2 months ago

This is a simplified version: Journey to the West in Easy Chinese by Jeff Pepper and Xiao Hui Wang. Otherwise, I would definitely have waited a bit before biting off something like this.

englishcat|2 months ago

This is a great idea. As a native Chinese speaker, I want to say this is very similar to how we learned Chinese when we were kids.

On the other hand, the Chinese writing system is logographic (or ideographic), unlike the English system, which is phonetic. The most basic characters, such as 日 (sun), 月 (moon), and 山 (mountain), are essentially pictures of the objects themselves, which makes them very suitable for being represented by images. The emoji you are using are also a nice touch.

I believe this method should be very effective for beginners in Chinese. However, once you have mastered the basic Chinese characters, you can learn about the structure of Chinese characters and then continue reading more materials to expand your vocabulary.

The real challenge is expanding your vocabulary through extensive reading. I'm actually working on a tool to solve this specific problem (https://lingoku.ai/learn-chinese). If you are reading English, it will insert Chinese text for you; if you are reading Chinese text, it will translate it from Chinese to English and then inject Chinese words into the translated text, improving your vocabulary while you read.
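The injection idea can be sketched in a few lines. This is my own illustration, not lingoku's actual implementation; the vocabulary mapping is a made-up example:

```python
import re

def inject_vocab(text, known_vocab):
    """Replace English words with their Chinese equivalents when the
    learner already knows the corresponding Chinese word."""
    def swap(match):
        word = match.group(0)
        # Look up the lowercase form; leave unknown words untouched.
        return known_vocab.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

known = {"sun": "日", "moon": "月", "mountain": "山"}
print(inject_vocab("The sun rises over the mountain.", known))
# → The 日 rises over the 山.
```

A real tool would also handle multi-word phrases and inflected forms, but the core loop is just a dictionary lookup over the learner's known vocabulary.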

bisonbear|2 months ago

Checked out the tool and think it's a cool idea! One piece of feedback, though: I actually feel like the inverse product would be more helpful for me. What I mean is replacing ~95% of the English text with words (Chinese in my case) that I can understand, and leaving the remaining ~5% (words I definitely don't know) in English.

At least for me, there's a lot of value in consuming bigger volumes of Chinese to get used to pattern-matching on the characters, as opposed to only reading a smaller amount of harder characters that I'm less likely to actually encounter.

jtokoph|2 months ago

This is a really smart idea.

I’m trying to learn to speak Chinese and not read it yet. The issue is most of the language learning apps have a focus on characters. I feel like I just want to see the pinyin. Maybe I don’t know what I need, but I haven’t found the right tool.

andai|2 months ago

There's a language learning method where you just listen to audio, until you develop a basic familiarity with the language. (Then learn reading and writing later.)

You listen to audio you don't understand yet, and over time your brain begins to pick up the patterns. It takes a lot of time but you can do it in the background, because that processing happens subconsciously. So you can get that time "for free".

I learned it from this guy https://alljapanesealltheti.me/index.html

But he got it from linguist Stephen Krashen and his Input Hypothesis of language acquisition (i.e., that the way babies and kids learn languages, through osmosis, works for adults too).

I think the ideal solution is somewhere in the middle, starting with something like Pimsleur which is the same idea (audio and repetition) but more structured and focused, to give you that "seed" of vocabulary and grammar, before you flesh it out with the "long tail" of the language.

SuperNinKenDo|2 months ago

I recently changed all my language flashcards to be like this. Anki is probably the best option: I keep the field with the Hanzi but configure my cards not to show it for now, so I break the habit of translating everything to characters in my head when I'm trying to listen. It's worked well, and the characters will be there when I decide to do something with them again.

simedw|2 months ago

Thanks! I think getting comfortable with characters fairly early is important, as it helps shift your mindset into the right place. That said, I don’t think this project really works until you’re comfortable with at least ~60 characters.

johanyc|2 months ago

I use a similar idea to add LLM-generated example sentences to my Anki cards, but I didn't restrict the words it can use. I just ask it to use A1-level words and generate ~5 sentences in increasing order of difficulty, then manually pick the 2 most suitable ones. Quite often the words it uses have a large overlap with my vocabulary.

nubg|2 months ago

This is extremely interesting, great idea. Really both thumbs up. Looking for more ideas/lifehack approaches to learning via LLMs.

bryanhogan|2 months ago

Interesting concept! Think this would be quite cool to explore. Personally am very interested in language learning concepts / apps.

My first concerns though:

1. How can the system know which words I already know?

2. To what degree will I misunderstand the meaning of words?

3. Somewhat related to 2, how inaccurate will the descriptions / explanations of words be?

simedw|2 months ago

Thanks for the questions. Very fair concerns. Take all of this with a fairly large pinch of salt; this is still an experiment.

1. How does it know which words I already know? It doesn’t automatically. You provide that set. For example, if you’ve completed HSK 1, you can paste the HSK 1 word list into LangSeed and mark those as "known". From there, new explanations are constrained to that vocabulary. You can also paste in real text and mark the easy words as known, though that’s a bit more manual.

2. How much might I misunderstand word meanings? Depends on how advanced the vocab is and how large your known-word set is. I think of this as building intuition rather than giving dictionary-precise definitions. As you see words in more contexts, that intuition sharpens. This is just my experience from testing it over the last couple of weeks.

3. How inaccurate are the explanations? I tested it on Swedish (my native language). There are occasional awkward or slightly odd phrasings, but it’s rarely outright wrong.

sbinnee|2 months ago

This is so inspiring. Recently, I've been thinking of making a side project using LLMs for learning new languages too. Transformers were originally designed for machine translation, and now we have much better ones. My idea is to write a mobile app, which I have zero experience with.

NiloCK|2 months ago

Enjoyers of this concept would probably like this wonderful talk about programming language design by Guy Steele (Sun, Java Language): Growing a Language

https://youtu.be/_ahvzDzKdB0

bisonbear|2 months ago

As a fellow Mandarin learner - this is super cool! Intuitively makes a lot of sense for the "full immersion" component of language. I love to see exciting uses of AI for language learning like this instead of just more slop generation :)

I haven't dug into the github repo but I'm curious if by "guided decoding" you're referring to logit bias (which I use), or actual token blocking? Interested to know how this works technically.

(shameless self plug) I've actually been solving a similar problem for Mandarin learning - but from the comprehensible input side rather than the dictionary side:

https://koucai.chat - basically AI Mandarin penpals that write at your level

My approach uses logit bias to generate n+1 comprehensible input (essentially artificially raising the probability of the tokens that correspond to the user's vocabulary). Notably, I didn't add the concept of a "regeneration loop" (otherwise there would be no +1 in n+1), but I think it's a good idea.
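The logit-bias setup described above is just a map from token ids to a positive offset. A hedged sketch (token ids are illustrative; a real tokenizer such as tiktoken would supply them, and APIs like OpenAI's clamp bias values to the range [-100, 100]):

```python
def build_logit_bias(vocab_token_ids, bias=5.0, cap=100.0):
    """Map each token id in the learner's vocabulary to a positive bias,
    clamped to the API's allowed maximum."""
    b = min(bias, cap)
    return {tid: b for tid in vocab_token_ids}

# Pretend these ids encode the learner's known words.
bias_map = build_logit_bias([101, 202, 303], bias=5.0)
# Passed as e.g. the logit_bias parameter of a chat-completion request.
```

A mild positive bias nudges generation toward the known vocabulary without forbidding other tokens, which is what leaves room for the "+1" of unfamiliar words.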

Really curious about the grammar issues you mentioned - I also experimented with the idea of an AI-enhanced dictionary (given that the free chinese-english dictionary I have is lacking good examples) but determined that the generated output didn't meet my quality standards. Have you found any models that handle measure words reliably?

andai|2 months ago

Cool idea! You mentioned the model struggling with Chinese a bit. Have you tried any Chinese models, e.g. DeepSeek or GLM? I imagine they probably have a lot more Chinese in the pretraining. (And their English is certainly fine too!)

bisonbear|2 months ago

I have personally had success using Kimi for Chinese creative writing, making the same assumption that Moonshot, as a Chinese company, has more and better Mandarin-language pretraining data.

gamander|2 months ago

This site sucks on mobile. Can't upload full text files? Why are there no prepared texts to start reading right away?

mog_dev|2 months ago

How hard would it be to add new languages?

simedw|2 months ago

Surprisingly easy. If the language has a lot of conjugations (e.g., polite past verb forms), running each word through Snowball first makes the process a bit easier.
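Stemming here means conflating inflected forms to one dictionary key before checking the known-word set. A real app would use the actual Snowball stemmer (e.g. via the `snowballstemmer` or NLTK packages); this toy suffix-stripper is a simplified stand-in just to show the idea:

```python
# Crude English-style stemmer: strip the first matching suffix,
# keeping a stem of at least 3 characters. Snowball's real rules
# are far more careful (per-language, with context conditions).
SUFFIXES = ("ings", "ing", "ed", "es", "s")

def crude_stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

print(crude_stem("talked"))   # talk
print(crude_stem("talking"))  # talk
print(crude_stem("talks"))    # talk
```

With all three forms mapping to "talk", a single known-word entry covers the whole conjugation family, which is what makes the vocabulary check tractable for highly inflected languages.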