
Cleartext: A text editor that only allows the 1,000 most common words in English

314 points| henrik_w | 10 years ago |github.com | reply

216 comments

[+] gregschlom|10 years ago|reply
Am I the only one to find that using only the 1,000 most common words actually makes things harder to understand? The writer ends up having to use convoluted paraphrases to refer to things where a precise, well-defined and well-understood word exists but is not part of the top 1,000.
[+] AndrewUnmuted|10 years ago|reply
The technique is working quite well for Donald Trump: https://www.washingtonpost.com/news/the-fix/wp/2015/09/15/ho...

> Some of his answers last only a few seconds, some are slightly longer, but almost all consist of simple sentences, grammatically and conceptually, and most of them withhold their most important word or phrase until the very end. Trump’s sentences end with a pop, and he seems to know instinctively where to put the emphasis in each one.

[+] readams|10 years ago|reply
I think a much better approach would be to start with the 1000 most common words, but allow defining terms from there. This allows you to develop the correct jargon which will both enhance understanding and allow communication with others about the topic in the future.

Having to continually fall back on silly terms like "up goer" does nothing to enhance understanding when you can define what a rocket is and then just use the word rocket.

[+] cloudhead|10 years ago|reply
You're not the only one. Words exist exactly to avoid the problem this app creates.
[+] kazinator|10 years ago|reply
With just the words "zero" and "one" strung together in some combination, we can describe a JPEG image, which can be a picture of anything. (For instance, a black and white scan of the word "deoxyribonucleic" in a Times Roman font).
[+] skybrian|10 years ago|reply
Sure, but it's a fun writing exercise. If you write using a limited vocabulary and then fix the most convoluted paraphrases, you might end up in a better place than if you wrote it normally.
[+] theonemind|10 years ago|reply
I tried this, and very quickly found that out. Thereafter, I considered very carefully whether any word that wasn't in the most common 1,000 was worth introducing, and used the top 1,000 freely. That ended up working pretty well. When I found a word that really helped, I added it to the list of words I could continue to use for what I was writing.
[+] hackbinary|10 years ago|reply
Maybe the author is just out by an order of magnitude? Maybe it should be 10,000 words?
[+] dredmorbius|10 years ago|reply
You're not, and you beat me to writing the same comment by 10 hours.

Writing is the problem of communicating ideas, and doing so to a sufficiently prepared audience, effectively.

The problem with stunts such as this -- and there are others, the "if you stop writing for 5 seconds your text starts disappearing" demo a few weeks back comes to mind -- is that they confuse the medium with the message.

Yes, it's possible to have confusing writing on account of unfamiliar terminology. That's a frequent problem with many Wikipedia articles, or much academic writing of the past few decades (older works -- say, 1950 and prior, to the 18th or even 17th century -- are often far clearer).

But the problem most often isn't just the vocabulary, it's the structure of the writing. Pop open a book of Darwin or Adam Smith (I've been reading both) and turn to a random page. Once you get over a slight old fashionedness of the writing, the thoughts are clear.

It is impossible to tell a narrative clearly if you've not organised your own thinking about it.

I've been doing a lot of thinking ... well, about a lot of things. One of those topics is of data and narrative, and the difference between a dry factual presentation (say, a data table, or a simple list of events) and a story which weaves these into a consistent whole. Our minds typically work very well with narrative, sometimes too well, as false narratives can be constructed. But the best presentations I've encountered include both solid facts and a story which ties them together. Finding an author who does this well and with skill is truly impressive.

And the good ones expand your vocabulary. Another point, and one to keep in mind.

There's also Einstein's dictum: make things as simple as possible, but no simpler. The same applies to language.

[+] Kaius|10 years ago|reply
Exactly! There is a phrase in Thing Explainer where Randall is trying to say that a part of a space shuttle or something is made in Japan. Instead of using 'Japan' he has to say something like "the land where the sun comes up" or close to it (I don't think he could use 'rising', but I may not be recalling correctly). It breaks down as soon as you try to convey something beyond base-level understanding.

So yes, it sounds like using the most common 1K words would aid readability but you probably need to raise that to 2.5K or more. Low enough to avoid the use of complex words while also freeing the author from awkward phrasings.

[+] Spooky23|10 years ago|reply
Being able to express and understand complex concepts is the difference between "smart" and "dumb", educated and uneducated.

An average 4 year old in extreme poverty or public care has 10M words directed at him. A 4 year old with professional parents hears 50M. That's a key differentiator that drives future potential.

Many people lack sophisticated literacy for a variety of reasons. When Honda opened a factory in Alabama, they changed written assembly instructions into pictograms due to the poor literacy level. In aviation, documentation is written in "simplified technical English" to assist non-native English speakers.

[+] tim333|10 years ago|reply
Could be handy for non-native English speakers though: less to learn.
[+] wang_li|10 years ago|reply
You are correct. Lectern v. podium. When we have specific words for things, we should use them. Moving down a step in specificity is not conducive to clarity.
[+] OJFord|10 years ago|reply
I thought the same thing when it was done (or was it 10k?) for _Thing Explainer_ by 'the XKCD guy'.

Great idea, needs more words. It's too extreme.

In the case of this editor, it may actually be useful to some if `N` were configurable, and maybe exceptions could be added. (If you're writing an OS X user guide say, you want to be able to write 'OS X'!)

[+] jdavis703|10 years ago|reply
Genuine question: have you ever tried to learn a foreign language? At least for me, long sentences with easier, more common words are easier to understand.
[+] swiley|10 years ago|reply
It's kind of like "limit" and "dx/dy" in calculus compared to the crazy long prose people in ancient times had to write.
[+] kazinator|10 years ago|reply
It should support a few template sentences which define a word, which is then allowed to occur in the remainder of the text.

More useful than a document which uses only 1000 common words is a document which uses only 1000 words, plus words which it clearly defines.

I feel as if I could write almost anything if I have that, and it will be self-contained and accessible to anyone with the thousand word vocabulary, plus the ability to internalize definitions, which is a very basic faculty of the intellect.
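
A rough sketch of how such a checker could work, not how the editor actually does it. The tiny `COMMON_WORDS` set and the "A <term> is ..." template are assumptions made up for illustration; a real tool would load the full list and probably accept several templates.

```python
import re

# Hypothetical stand-in for the real 1,000-word list.
COMMON_WORDS = {"a", "an", "is", "the", "that", "goes", "up", "to",
                "space", "we", "send", "thing", "into"}

# Assumed template sentence: "A <term> is ..." or "An <term> is ..." defines <term>.
DEFINITION = re.compile(r"^(?:a|an)\s+([a-z']+)\s+is\b")

def check_text(text, common=COMMON_WORDS):
    """Return the words that are neither common nor defined earlier in the text."""
    allowed = set(common)
    rejected = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        sentence = sentence.strip()
        if not sentence:
            continue
        match = DEFINITION.match(sentence)
        if match:
            # The defined term becomes usable from this sentence onward.
            allowed.add(match.group(1))
        for word in re.findall(r"[a-z']+", sentence):
            if word not in allowed:
                rejected.append(word)
    return rejected
```

Note that the words inside the definition are still checked, so a term can only be defined using words that are already allowed.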

[+] skystrife|10 years ago|reply
This idea reminds me of an amazing talk by Guy Steele: https://youtu.be/_ahvzDzKdB0

He starts the talk by assuming monosyllabic words as his primitives and builds up the words he needs to use to give the talk by providing definitions for them first.

[+] Spooky23|10 years ago|reply
Keep in mind that the point of these exercises is to make instructions, explanations and short narratives more approachable.

If describing something with a limited vocabulary is awkward or impossible, maybe your approach is too complex.

My 4 year old is starting to learn how to read, and you can see in his eyes the connections made when he tries to read a basic level book. The sentence structure is obvious and predictable, and he's pretty good at getting it.

With a more complex book (Ex: The Little Red Caboose), he recognizes words and "paraphrases" the story with the pictures.

[+] kf5jak|10 years ago|reply
A similar editor was made for the book Thing Explainer[1] by Randall Munroe of XKCD, a book that explains all kinds of different things, from space shuttles to microwaves, using the 1,000 most common English words.

[1] https://xkcd.com/thing-explainer/

[+] Tharkun|10 years ago|reply
Yeah it's called "thing explainer", but it's more like a "thing convoluter". It's a work of humour, not a book of science.
[+] kf5jak|10 years ago|reply
I should have read everything first; I would have seen the reference already made... My bad
[+] eracer001|10 years ago|reply
Would love a slider that would let you adjust the allowed vocabulary from the 500 most common words in English up to the 10k most common. Also, it would be great if you could compile a Windows version.
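
If the word list is stored most-common-first, the slider could be little more than a slice. A minimal sketch under that assumption; `rejected_words` and the `ranked_words` argument are hypothetical names, not anything from the app:

```python
import string

def rejected_words(text, ranked_words, limit):
    """Return words in `text` that fall outside the `limit` most common words.

    `ranked_words` is assumed to be sorted from most to least common.
    """
    allowed = set(ranked_words[:limit])
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w and w not in allowed]
```

Moving the slider would then just mean calling this again with a different `limit`.
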
[+] igravious|10 years ago|reply
Or how about the more uncommon the word the more like the background tone it is so that you get instant visual feedback?
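
Something like this could drive the fading, assuming a frequency-ranked list is available. The linear mapping and the `floor` value are arbitrary choices for illustration:

```python
def word_opacity(word, ranked_words, floor=0.15):
    """Map a word's frequency rank to an opacity between `floor` and 1.0.

    `ranked_words` is assumed to be sorted from most to least common.
    Words missing from the list get the faintest tone.
    """
    if word not in ranked_words:
        return floor
    if len(ranked_words) == 1:
        return 1.0
    fraction = ranked_words.index(word) / (len(ranked_words) - 1)
    return 1.0 - fraction * (1.0 - floor)
```
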
[+] zoren|10 years ago|reply
+1 on the slider, you/we should really do that
[+] visarga|10 years ago|reply
I like the idea. This could be enhanced with a thesaurus that would offer alternatives to difficult words. It could be a useful tool not only for writing clearer explanations, but also to compose easy readers for English learners.
[+] kej|10 years ago|reply
You could build that feature dynamically. When a user tries to use a disallowed word and then uses an allowed word, you could log that word pair. Combine and anonymize those logs and you'd be able to show the most likely replacement words for any word that people often try to use.
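
A minimal sketch of the logging side, assuming the editor can tell which allowed word the user typed right after a rejected one. `ReplacementLog` is a made-up name, and the anonymizing and merging of logs across users is left out:

```python
from collections import Counter, defaultdict

class ReplacementLog:
    """Counts which allowed word users typed after a disallowed one."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, rejected, replacement):
        """Log one (disallowed word, allowed replacement) pair."""
        self._counts[rejected.lower()][replacement.lower()] += 1

    def suggest(self, rejected, limit=3):
        """Return the most frequently logged replacements, most common first."""
        counts = self._counts.get(rejected.lower(), Counter())
        return [word for word, _ in counts.most_common(limit)]
```
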
[+] SeanDav|10 years ago|reply
Agreed. It would likely need a larger basic dictionary of allowed words and would need some ability to override the dictionary for words that just have to be in to make sense due to domain or context.
[+] macintux|10 years ago|reply
Attempting to type the Gettysburg Address, which is how I usually experiment with word processors and keyboards, is an exercise in futility.

> Eight times ten and seven years ago our fathers brought into the world a new country, born of free thoughts and doings, and completely sold on the idea that all men are created the same.

[+] j2kun|10 years ago|reply
If you make a special case for numbers and fudge the clause structure, it's not that bad.

> Eighty-seven years ago our fathers brought into the world a new country, born of free thoughts and doings and the idea that all men are created the same.

[+] blhack|10 years ago|reply
You guys are missing that this is a joke. This is a reference to this comic: https://xkcd.com/1133/.

Which is poking fun at how funny things sound when you restrict yourself to only that many words.

[+] crispyambulance|10 years ago|reply
Joke aside, there are ways to compute a "readability score" for any block of text. This is useful for document writers who need to target particular grade levels for their docs (e.g. a driving manual is "8th grade"). https://readability-score.com/
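
One widely used formula is the Flesch-Kincaid grade level: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. I don't know whether the linked site uses it. Below is a rough sketch; the syllable counter is a crude vowel-run heuristic, so scores will only be approximate.

```python
import re

def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Approximate U.S. grade level of `text` via the Flesch-Kincaid formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Very short, simple sentences can come out with a negative grade, which is normal for this formula.
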
[+] jshevek|10 years ago|reply
I think this is a cool project. I appreciate that the creator has shared this with the world. I think it's sad that people are (appear to be?) criticizing him simply for making it and sharing it. However....

"If you find yourself on the receiving end of a message that is too hard to figure out, do everyone a favor and insist on a simpler version."

Do _everyone_ a favor? Presumptive.

If I am on the receiving end of a message that is too hard for me to understand due to my ignorance of certain terms, or due to difficulties I have parsing grammatically correct writing, then the best way for me to do everyone a favor is to work on improving my own vocabulary and/or thinking ability.

Then I will be better equipped to communicate well with a larger set of the population, and better equipped to reason well. Improving your own thinking skills is good citizenship. Improving your ability to communicate well with a wider swath of the population can help you to build bridges between communities.

If I were instead to insist that the message achieves 'clarity' by accommodating my ignorance, then I may be helping some but certainly not everyone.

"Maybe one that only uses the 1,000 most common words."

Asking others to accommodate limitations to my vocabulary is likely to increase their cognitive load; most often I'd rather them apply themselves more fully to other tasks. I can just use a dictionary! Also, this runs the risk of resulting in text that is _harder_ to understand, if they trade precise terms for needlessly convoluted grammar.

[+] cowardlydragon|10 years ago|reply
How about a standard translator that takes generally accepted conversions of more complex language constructs into simpler language?

How about a thing that changes less well known words and puts in easier words?

[+] rexf|10 years ago|reply
It would be a huge usability improvement to have autocompletion of valid words. Currently, you have to type each word, and wait to see if it is rejected/removed.

Instead of waiting for each word to be typed, why not show acceptable words as you are typing, so you can tell before you finish typing if it is valid.
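
With the allowed words kept in a sorted list, prefix lookup is cheap. A minimal sketch, assuming lowercase words; `complete` is a hypothetical helper, not part of the app:

```python
from bisect import bisect_left

def complete(prefix, sorted_words, limit=5):
    """Return up to `limit` allowed words that start with `prefix`.

    `sorted_words` is assumed to be a sorted list of lowercase words.
    """
    prefix = prefix.lower()
    # All words sharing the prefix sit together, starting at this index.
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches
```

An empty result while typing would tell you early that the word is going to be rejected.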

[+] baby|10 years ago|reply
The most annoying application on earth :)

A better application would be to underline words not included in the whitelist and to provide a synonym included in the whitelist upon right clicking.

But as I learned in school: a language's diversity is beautiful, why restrain ourselves in the vocabulary we use?

In particular, a boring text with a lot of repetition is just hard to read, see the discussion here: https://news.ycombinator.com/item?id=11131391

[+] cpeterso|10 years ago|reply
I wrote a simple editor that used Google Translate's web service to round-trip translate the text you enter in real time. I had been thinking about intermediate representations that might assist in the automatic translation of human languages. I wanted to see how one's writing might change to ensure that the round-trip translation was identical or still made sense, thinking this would improve the odds that the translated version actually meant what you intended it to.
[+] threatofrain|10 years ago|reply
While this is probably a joke, I would note that a person is rarely writing for consumption by all demographics. Vampire novels, comic books, chemistry textbooks, all have their audiences, and it is naive to say "but one day the whole world may wish to read my book, I must open the gates as wide as possible", because accessibility is not free.

Don't simplify or complicate your language without reason.

[+] drcode|10 years ago|reply
Great app, but this still doesn't address the issue of homographs, where a word can have wildly different meanings (some of them uncommon meanings) that just happen to be spelled the same.

For instance, in the live example, the word "Application" is used to refer to a computer program, when the more common meaning likely refers to the verb, as in "The application of a band aid".

[+] sogen|10 years ago|reply
Check out Rewordify [1] too: it provides suggestions alongside the original text, has "difficulty" settings, and more. A very complete website, highly recommended. I spent ages looking for something like this.

1.- https://rewordify.com/index.php