Half million 'Words with Spaces' missing from dictionaries

151 points| gligierko | 7 days ago |linguabase.org

271 comments

voidUpdate|5 days ago

> “Boiling water” isn’t “water that happens to be boiling.” It’s a hazard, a cooking stage, a state of matter

I guess we'll have to disagree then, because "boiling water" is "water that's boiling" to me. It's not a different state of matter to "water", that would be "steam". It being a hazard doesn't mean it's a singular concept, same as "wet floor"

kdheiwns|5 days ago

Yeah, if "boiling water" is one word, what about boiling sugar? Boiling milk? Boiling volcano? Boiling soup?

Adding two words together creates a new and different concept. The permutations necessary to represent every concept ever formed by combining two or more different words would be endless.

Some of them on the list, like black hole, do make sense. That's a very distinct thing. It's not a hole in the conventional sense and it's not really black. Boiling water, though, is water. And it's boiling.

5o1ecist|5 days ago

> to me

Your "to me" is actually problematic, because it legitimizes this nonsensical idea and turns words and their meaning into something purely individualistic, which cannot end well for the current, but even more so for the next generation.

I can confirm that "boiling water" definitively is "water that's boiling" and that two words, which are supposedly one word, definitely are not one word.

RHSeeger|5 days ago

To me it boils down to (pun intended)

> Traditional dictionaries skip almost all such phrases, because they contain spaces.

Yes, because they're phrases, not words. I don't even understand what's surprising about this. Sure, the entire article talks about how dictionaries contain _some_ phrases; but it's clear it's not many of them. Dictionaries are for words, not phrases.

globular-toast|5 days ago

Yep, all of the following make perfect sense to me, they're just non-idiomatic:

- Don't put your hand in water that's boiling,

- Add the pasta to water that's boiling,

- That saucepan is full of water that's boiling.

If "boiling water" were a distinct word, all of these sentences would change meaning compared to their idiomatic counterparts.

Ekaros|5 days ago

Boiling water is mostly same as boiling anything. So I would just have "boiling". No need for "boiling water". I see no reason why boiling water could not just be covered by whatever general boiling entry covers.

m-schuetz|5 days ago

Some other words that are sorely missing from dictionaries: "Warm water", "hot water", "cold water", "dirty water"

vunderba|5 days ago

Agree. You can of course treat "Boiling water" in its gerund form where it functions as a noun:

  "Boiling water should be performed in a metal pot".

> It’s a hazard, a cooking stage, a state of matter

All of these are ancillary and depend on context, but in every one of these downstream cases the same underlying process is happening: the water is boiling.

gcanyon|5 days ago

I would have agreed with you before they pointed out that "frozen water" gets a word: ice. Honestly, I think it's reasonable: people deal with frozen water far more than they do boiling water, but it changes it from a case of "what are they talking about?" to "okay, where do we draw the line?" for me.

RiverCrochet|5 days ago

What's ice cream then?

adamauckland|4 days ago

The kettle was boiling water.

The chef was out the back, boiling water.

The chef was out the back. Boiling water had spilled everywhere.

The seas had turned to boiling water.

I dunno, could be down to interpretation.

jonplackett|5 days ago

I’m so glad I’m not going insane. I don’t see any examples on that site that I agree are ‘one word’. Sure they’re singular concepts but so what? Are we going to have singular words to describe all adjective noun pairs now?

Beijinger|5 days ago

"a state of matter", no boiling water is not a "state of matter"

harperlee|5 days ago

Surprised that no comment mentioned that there is a standard term (not a word :P) for the set of words that denominates a particular concept: nominal syntagm. Such as "boiling water" and also "that green parrot we saw yesterday over the left branch".

Also the slider examples are abysmal. "I love you", "Go home" and "How are you" are not words by any stretch of imagination. For someone who makes word games, I don't see a particularly deep love of words here.

Edit: Obligatory reference to Borges's Tlön: https://en.wikipedia.org/wiki/Tl%C3%B6n,_Uqbar,_Orbis_Tertiu...

michaeld123|5 days ago

Added a note: "'I love you' isn't opaque, but it's tight enough to put on a tile." The familiar end of the spectrum picks up collocations that are transparent but loaded — I'm not claiming they're words in the traditional sense, but they're useful vocabulary for word games, which is where I'm coming from.

georgefrowny|5 days ago

Funnily enough, "nominal syntagm" is, itself, not in the OED or Wiktionary. But Wiktionary has "syntagme nominal" as the French translation for "noun phrase".

You really have to love the human messiness of language!

win311fwg|5 days ago

A nominal syntagm is a somewhat overlapping concept, but deviates slightly from the direct discussion taking place. The more appropriate standard term here is: open compound word. Or, as one might say casually: word.

AlotOfReading|7 days ago

A compound word isn't just a phrase. The latter is a group of words that indicate a single concept. The former is a new word that has a distinct meaning from the subwords that compose it. "I love you" is an example of a clausal phrase. The meaning is entirely evident from the words that compose it. In contrast, a "hot dog" is not a particularly warm canine, and has its own OED entry [0] as a compound word.

And some of the entries on this list are wrong. "Good night" exists in OED as "goodnight" [1] because there are multiple ways it's used. One is the clausal phrase "I hope you have a good night", which can be modified by changing the adjective, e.g. "great night" or "terrible night". "Goodnight" the bedtime ritual can't be modified the same way, so OED chooses to write it as a compound word without spaces.

[0] https://www.oed.com/dictionary/hot-dog_n

[1] https://www.oed.com/dictionary/goodnight_n

dec0dedab0de|7 days ago

There are nearly half a million compound phrases that aren’t in any dictionary—simply because they contain spaces. “Boiling water.” “Saturday night.” “Help me.”

I would hope that none of those examples were taking up space in a dictionary.

jakub_g|7 days ago

It's quite interesting that "boiling water" in many Slavic languages is actually a separate word (and not derived from "water", but from "boiling"; similar to how the author mentions "ice" being used instead of "frozen water").

gligierko|7 days ago

Some are better than others. Many semi-transparents could get legit coverage. And many are good fodder for word game content.

simlevesque|7 days ago

The first two I kind of understand what the author means. But "help me" and "severe pain" made me think that I'm just not the right public for this text.

DonHopkins|5 days ago

>"Boiling water" ... I would hope that none of those examples were taking up space in a dictionary.

Yeah, I agree! Fuck ICE!

thmpp|7 days ago

While 'this analysis would not have been possible without LLM', I am not sure the LLM analysis was well reviewed after it was done. From the obscure/familiar word list, some of the n-grams, e.g. "is resource", "seq size", "db xref" surely occur in the wild (as we well know), but I doubt we can argue they are missing from the dictionary. Knowing the realm, I would argue none of them are words, not even collocations. If "is resource" is, why not "has resource"? So while the path is surely interesting, this analysis lacks scrutiny, which you would expect from a high-level LLM analysis.

michaeld123|7 days ago

The very bottom of the slider is there to illustrate where LLM artifacts and Wiktionary noise live — it's not presented as legitimate vocabulary. The slider lets you see the full quality gradient, including where it breaks down.

less_less|5 days ago

In addition to what others have pointed out, many of these aren't actually missing from traditional dictionaries: they're just inflected differently. So your example lists phrases like "operating systems", "immune systems" and "solar systems" as missing from traditional dictionaries, but at least the online OED and M-W have "operating system", "immune system" and "solar system" in them. It's just that your script is apparently listing the plural as a separate phrase.
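The plural problem described above can be sketched in a few lines. This is only an illustration of the idea, assuming a very crude singularizer; `singularize`, `headword`, and `truly_missing` are hypothetical names, and nothing here reflects how the article's actual script works.

```python
def singularize(word: str) -> str:
    """Crude English singularizer (hypothetical helper, not a real lemmatizer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # "parties" -> "party"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]                # "systems" -> "system"
    return word


def headword(phrase: str) -> str:
    """Lowercase a phrase and singularize its final word only."""
    words = phrase.lower().split()
    if not words:
        return ""
    words[-1] = singularize(words[-1])
    return " ".join(words)


def truly_missing(candidates, dictionary_entries):
    """Keep only candidates whose normalized form is not a known headword."""
    known = {headword(entry) for entry in dictionary_entries}
    return [c for c in candidates if headword(c) not in known]


# Toy data taken from the examples in the comment above.
dictionary_entries = ["operating system", "immune system", "solar system"]
candidates = ["operating systems", "immune systems", "solar systems", "boiling water"]

missing = truly_missing(candidates, dictionary_entries)
print(missing)
```

With this normalization the three plurals match their singular dictionary entries and only "boiling water" is reported. A real pipeline would presumably need a proper lemmatizer, since the suffix rules here mishandle many words.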

On languages other than English: in general, different languages do word division very differently. At least in German and Dutch, many of those phrasal verbs are separable, meaning that they are one word in the infinitive but are multiple words in the present tense. So for example, where in English you would say "I log in to the website", in Dutch it would be "Ik log in op de website". "Log in" is two words in both cases, but in Dutch it's the separated form of the single-word separable verb inloggen ("I must log in now" = "Ik moet nu inloggen"). The verb is indeed separable in that the two words often don't end up next to each other: "I log in quickly" = "Ik log snel in".

Dutch, like German, has lots of compounds. But there are also agglutinative languages, which have even more complex compound words, perhaps comprising a whole sentence in another language. Eg (from Wikipedia) Turkish "evlerinizdenmiş" = "(he/she/it) was (apparently/said to be) from your houses" or Plains Cree "paehtāwāēwesew" = "he is heard by higher powers"; and these aren't corner cases, that's how the language works.

urbandw311er|5 days ago

The author of this article just hasn’t been taught how to use a dictionary. The words aren’t “missing”, they’re just indexed under one of their parts. For example “wait upon” would be located within the entry for “wait”.

kelseyfrog|7 days ago

The name for these is "collocations".

Collocation dictionaries are lists of collocations. The reason they're absent from single-word dictionaries is that there are about 25x more collocations than single words.

georgefrowny|5 days ago

And fittingly enough, "collocation dictionary" is not in "the" dictionary. At least not the OED.

Presumably if the word thesaurus was actually "synonym dictionary" it would likewise be absent.

WesolyKubeczek|5 days ago

Examples of "obscure" compound words include "list uids", "beg pos", "sync binlog", "gfp mask", "av fetch", "str idx", "seq ptr", "ai family", "fmt vuln", "ai socktype", "curr tok", "nbits set", "ini get", "s1 s2", "in addr", "num get", "res init", "sess ref", and "ai addrlen".

Well I can't even.

michaeld123|5 days ago

Yeah — added a note below the slider that the obscure end is noise (LLM artifacts, jargon fragments, Wiktionary debris) and that where to draw the line is up to the reader. It was always intended to show the full gradient including where it breaks down, but that wasn't stated

csmantle|5 days ago

This is just so hilarious. They'll eventually have to add "man man" to the list.

floxy|5 days ago

How in the world does "int argc" not make the list? But good to know that "frit flies" does.

beeforpork|5 days ago

> But roughly 15% are plausible: “wooden chair,” “morning coffee.” That’s still 30 billion sensible pairs.

(1) Who counted those? Whence those numbers?

(2) The examples are normal two-word phrases with one word modifying the other, often categorised as an adjective. The examples are counter-examples to the very claim made in that article.

(3) Using Claude to brainstorm something is a weird thing to say...

(4) I would say the use of 'lexicalized' is wrong or at least uncommon. It usually refers to specialised semantics of something that could be interpreted generically, too. Like 'sleeping bag'. Or indeed 'cold feet'. Lexicalisation may involve deleting spaces, like 'hotdog'. And I am pretty sure lexicalised phrasal words are usually intensionally listed in dictionaries. And so 'ice' is not lexicalised 'frozen water', but it is not overtly a phrase but is a separate atomic word.

=> I don't get the point.

michaeld123|5 days ago

My bad. there's a little sidebar about it, but I put it lower after the chart because there wasn't room. You might still not find my logic on the 15% satisfying, but it's there.

yifanl|5 days ago

Off the top of my head, peanut butter, black hole, and amusement park are concepts that can't be easily intuited by just combining the two singular terms, but I also wouldn't consider them as phrases.

eszed|5 days ago

"Peanut butter" would be dealt with by including a reference under the "butter" entry. Something like:

'N, culinary. A paste made of ground up nuts, sometimes with additional oils and other ingredients. E.g. "peanut butter", "almond butter".'

"Amusement park", same. Falls very much under the "place of recreation" definition of "park".

"Black hole" is maybe a bit different, because it's a scientific term - and certainly in a science dictionary would be included as a two-word item - but, for consistency, in a regular dictionary should be handled identically to the above, with a note on the word "hole".

While including noun phrases as singular entities in a word game is entirely appropriate, I don't think the OP has formed a rigorous definition of the concept that they are trying to describe. I agree with the other comment which suggests that they need some instruction / practice using a dictionary.

fuzzy_biscuit|5 days ago

This feels like ragebait (rage bait?) for people that enjoy language and words. The leading example is nonsense.

MarkusQ|7 days ago

This boils down to an "is Pluto a planet" debate.

We act as if some languages have "compound words" that can encompass entire sentences (subject & object attaching to the verb as prefixes or suffixes) while others don't form compounds, and most are somewhere in between. But these are all statements about lexicographic conventions and say nothing about the languages. In reality all languages are muddles sprawling across a multidimensional continuum, and they abso-frigging-lutely don't sit neatly in such pigeonholes.

Wobbles42|7 days ago

This is a great comparison. We're arguing about the definition of "word", and attempting to expand it to include edge cases where two words with separate meanings have a different atomic meaning when combined.

We could have a similar debate about whether common suffixes and prefixes should be regarded as individual words.

Much like "planets" don't really exist as a separate natural object, words don't really exist in natural languages. They are artificial concepts, and therefore we will always have edge cases.

I would argue that it is still a useful discussion, as it sheds light on the nature of language (or of celestial bodies), even if the definitions defy the same rigour as mathematical concepts.

oh_my_goodness|5 days ago

If the first example was "monkey wrench" instead of "boiling water", we'd never have seen the article.

michaeld123|5 days ago

Ha — you're probably right that it would have been less controversial. But I kept it precisely because it's arguable. Added a parenthetical acknowledging the HN debate and framing it as on-the-fence by design

9rx|5 days ago

"Monkey wrench" is a word already found in the dictionary, so it wouldn't be a useful example. It already met the bar.

The article is questioning why some words don't meet the bar for inclusion in the dictionary. The word "boiling water" is one such word that it sees as being on the fence. The comments here demonstrate exactly why it is on the fence, but it remains unclear exactly what would be necessary for it to tip towards inclusion.

globular-toast|5 days ago

Sure, but monkey wrench is in the dictionary. Heck, it's even in my printed copy of the Shorter Oxford English Dictionary.

Dwedit|5 days ago

Is nobody going to mention that "taco [N WORD]" is one of the words there? (Third page from the end)

eszed|5 days ago

Oh, geeze. The progressive transparency effect on the words towards the "obscure" end of their spectrum made the later pages impossible for me to read.

I suspect the entire list was produced by an AI entity which had not been prompted to avoid giving offense. I predict a range of (tedious) opinions about whether a prohibition on that particular word is an appropriate inclusion in a system prompt.

That's also not a term I've - thankfully! - ever heard, so I've no idea if it's hallucinated. This is not an invitation, HN, to define or explain it to me.

unknown|5 days ago

[deleted]

ProllyInfamous|5 days ago

I'm currently reading Cormac McCarthy's Suttree (my first of his novels), just an exceptional polymath capable of painting complicated scenery with words dozenly scattered throughout paragraphs [0].

My favorite adjective he's coördinated is "burntwing", used to describe moths spiraling downwards after passing through candleflames. If I had crafted such a descriptive contraction, my former styling would've been "burnt-wing", had I even been capable of generating such concise imagery [1].

McCarthy's stylings have helped me to reduce hyphenations in my own writings — reducing their usage mainly to contractedwords which might be all-too-confusing without them.

[0] pg104 has ten words that I do not know their definitions, yet through context they work to advance the storyline of character racists (book is set in 1950s).

[1] decades ago, during college burnout, I was searching for the essence of "burntwing", reduced to writing a professor about "feeling like a burning airplane in tailspin." My trajectory back then was definitely burntwing.

tolerance|5 days ago

Thank you for sharing this. It makes me question the extent that a dictionary is meant to make a person more literate.

ilamont|5 days ago

Wait til you read Blood Meridian. The imagery he created with words, some of them his own creations, is just ... beyond compare. I'm reading The Road now, which comes from the same place. I can only read either in small doses. It's very intense, and the passages deserve to be read carefully.

Another contemporary writer who worked with new words in a very creative way was Gene Wolfe in The Book of the New Sun. Some were inventions using Greek, French, or Latin roots. Others were forgotten terms which he resurrected. Someone compiled a dictionary, Lexicon Urthus, which discusses the origins of certain terms and their placement within the series.

WorldMaker|5 days ago

One of the axes this analysis seems to be missing is the subtle spectrum from "multi-word expressions" to "idioms". Traditional lexicographers have long published separate idioms books, such as the Merriam-Webster New World American Idioms Handbook and the Oxford Dictionary of Idioms.

Wiktionary doesn't need to make that distinction between MWEs and Idioms and tends to conflate MWEs and Idioms as there is no separate "Wikidiom". Arguably, that multi-book confusion runs deep on the internet because Urban Dictionary should probably be fully titled the Urban Dictionary of Idioms and Slang.

It's not just page limits but also categorical limits and classic lexicographers would build multiple books/volumes, not just settle on one "dictionary". Classic scholars would often have a "reference shelf" with multiple dictionaries, books of idioms, thesauri, and more. The CD-ROM and then the internet has kind of tunnel visioned that this entire shelf can be merely "one app".

danesparza|7 days ago

I don't think 'Words with spaces' is a thing.

I think maybe the word the author is looking for is 'phrase'

epgui|7 days ago

It’s probably a thing, especially with loan-words (eg.: “avant garde”), and there are probably much better examples… But the examples in the article make no sense to me.

Wobbles42|7 days ago

The difference between phrases and "words with spaces" is addressed.

The confusion might be that this seems to be a spectrum rather than a binary phenomenon.

We have single words at one extreme, ordinary sentences at the other, and in the middle we have idiomatic assemblies of words that span a range of substitutability.

"Hot dog" and "Saturday night" are arguably great examples, because they exist at the opposite extremes of the spectrum. Saturday night can retain some of the original meaning following substitution, whereas hot dog almost deserves a hyphen.

dominicrose|5 days ago

Imagine configuring your word separator like this: " `~!@#$%^&*()-=+[{]}\|;:'",.<>/?"
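As a rough sketch of what that configuration implies: whatever goes into the separator set decides what counts as a "word". The function name `split_words` is made up for illustration; the separator string is copied from the comment above.

```python
import re


def split_words(text: str, separators: str) -> list[str]:
    """Split text on runs of any character in `separators`."""
    if not separators:
        return [text.strip()] if text.strip() else []
    # Build a character class; re.escape keeps ], ^, - and \ literal inside it.
    pattern = "[" + re.escape(separators) + "]+"
    pieces = (piece.strip() for piece in re.split(pattern, text))
    return [piece for piece in pieces if piece]


# The separator set from the comment above (space included as the first character).
SEPARATORS = " `~!@#$%^&*()-=+[{]}\\|;:'\",.<>/?"

# With the space in the set, "boiling water" is two tokens.
print(split_words("boiling water", SEPARATORS))

# With only a comma as separator, the space survives inside each token.
print(split_words("boiling water, hot dog", ","))
```

Under the full separator set the apostrophe also splits "Don't" into "Don" and "t", which shows how much of the "what is a word" question is a tokenizer convention.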

ndr42|7 days ago

I imagine that languages like german that create composites of nouns have less of a problem with this:

English: cream of mushroom soup

Spanish: sopa cremosa de champiñones

German: Champignoncremesuppe

looperhacks|7 days ago

I just checked, Champignoncremesuppe is not in my dictionary ;)

It has some compound words. But including too many of them would quickly get out of hand

ticulatedspline|7 days ago

but can't you basically make anything a composite noun in German? That it's a single word doesn't really help you decide if it has enough presence unto itself to be defined in the dictionary.

Seems like they would have just as much of a problem since the issue is delineating when a "phrase" becomes a "word"

agmater|7 days ago

In Dutch we indeed happily do this even for English loanwords like "creditcard" or something more obscure like "lockpick". When in doubt, remove the space.

Shorel|5 days ago

As far as my limited knowledge of linguistics goes, the technical term is actually "collocations."

To me, any discussion of this topic that doesn't mention collocations signals an amateurish approach.

I also disagree with the premise that "this was not possible before LLM." That's nonsense. Linguists created many dictionaries of collocations for different languages, so that work is precisely what they did!

(Before any LLM zealots attack me, yes, it is now possible to have a more exhaustive list of collocations thanks to LLMs. This doesn't contradict my point.)

Examples of collocation dictionaries:

https://www.freecollocation.com/

https://ozdic.com/

zozbot234|5 days ago

AIUI, collocations are just "words that often go together". It doesn't signal any unconventional meaning to the construction, that would make it a proper idiom.
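To make "words that often go together" concrete, here is a minimal sketch that scores adjacent word pairs by pointwise mutual information (PMI). PMI is one common association measure and is my assumption here, not something the thread or the article specifies; the toy corpus and the name `pmi_scores` are invented for illustration.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by PMI, skipping pairs seen fewer than min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_first = unigrams[first] / n_uni
        p_second = unigrams[second] / n_uni
        # Positive PMI: the pair co-occurs more often than independent words would.
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Toy corpus, made up for this example.
corpus = "hot dog stand sells hot dog and hot tea the dog sleeps".split()
scores = pmi_scores(corpus)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

A high score only says the pair is frequent relative to its parts. It does not distinguish "hot dog" the food from any other frequent pairing, which is the point made above: collocation strength says nothing about whether the meaning is idiomatic.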

michaeld123|5 days ago

Fair point — added a mention of collocations

ragall|5 days ago

It appears to me that the author is trying too hard to make a point: "merry-go-round" is a single compound word that several dictionaries contain; "canned goods" is not commonly used[1] (more of a bureaucratic jargon), and people would just say "cans"(US) or "preserves" (UK); "household chores" is simply "chores", as the word is no longer commonly used outside the house context; "coffee break ritual" is not a concept in English-speaking countries so it would make no sense to have it in a dictionary, and so many of the examples are exactly that.

[1] I wonder how many here have ever been told something like "Prithee, husband, bring back a dozen canned goods from the market, for in the meanwhile I shall do my household chores".

9rx|5 days ago

I didn't take that the author was trying to make a point, rather pondering about when a word becomes worthy of inclusion in the dictionary. "Boiling water" is a word that hasn't (yet) been deemed worthy. On the other hand, "hot water" is a word that is included in the dictionary, just as "still water", "warm water", etc. are. Somewhere between those included and "boiling water" is the line where a word becomes worthy of inclusion. But where, exactly, is that line?

inkcapmushroom|5 days ago

I have definitely used canned goods and household chores in sentences before.

below43|7 days ago

“Hospital bills”. That’s very country specific. Also, that’s two words.

tialaramex|7 days ago

Hospital bills feels like a pretty ordinary compound to me - not like "good morning" or "ginger ale" where you can't just use what you know about the two words to figure out what the compound must mean.

Some cases are basically impossible: with "crash blossoms" you don't stand any chance without knowing why we call them that.

Some are middling difficult, "Home Secretary" requires that you know every meaning for the two words and then you happen to pick the correct obscure meaning, a "Secretary" could be in charge, and "Home" could mean the entire country as distinct from everywhere else.

But "Hospital bills" doesn't seem even marginally difficult

soperj|7 days ago

What does it mean?

vintermann|5 days ago

Two related compound words from a Norwegian dialect, both mean "fish food":

Fiskemat

Fiskmat

The latter means food made from fish, the former means food for fish. Standard varieties of Norwegian only use the former to mean both, to the annoyance of many old fishermen.

This maybe illustrates why the author's examples such as boiling water aren't so weird. Yes, in English it means water that's boiling, but you have to know that. It could for instance have meant water for boiling, like "cooling water" means water for cooling say in a nuclear reactor, not water which is in the process of getting cool.

DonHopkins|5 days ago

>Spanish carves up time with precision English lacks: madrugada for the pre-dawn hours, atardecer for late afternoon waning into evening. The mid-day nap was so compelling we adopted the siesta into English.

"I used to smoke marijuana. But I’ll tell you something: I would only smoke it in the late evening. Oh, occasionally the early evening, but usually the late evening -- or the mid evening. Just the early evening, mid evening and late evening. Occasionally, early afternoon, early midafternoon, or perhaps the late-midafternoon. Oh, sometimes the early-mid-late-early morning... But never at dusk." -Steve Martin

HardwareLust|5 days ago

I disagree these belong in a traditional dictionary.

I could, however be convinced these could be documented/defined in a separate document, especially from the perspective you are coming from (word games).

beAbU|7 days ago

Hah, I wonder how thick a German, Dutch or Afrikaans dictionary would be if it included all possible spaceless compound words. Literally any concept can be compounded together to make a new word.

Roovleisslaghuisinspekteur =

Rooi = red

Vleis = meat

Slag = butcher

Huis = house

Inspekteur = inspector

"Inspector who controls the quality of red meat in butcheries"

waffletower|5 days ago

Dictionaries containing spaced compounds were not scalable with print media. The printed OED was encyclopedic in scale. Compound dictionaries are more than feasible now. Arguing whether a collection of commonly used words are expressions or concepts or even single "spaced words" is beside the point. Simply identify these differences and classify them in the compendium.

dirtikiti|5 days ago

No. This article shows a distinct lack of understanding of the basic building blocks of the English language.

"Words" don't have "spaces."

Phrases are made of words separated by spaces.

"Boiling Water" is not a word.

"Water" is a word. A noun, the subject.

"Boiling" is a word. An adjective, in this case. Which modifies the subject.

I don't know if you're trying to be clever, but you're not.

OkayPhysicist|5 days ago

English has words with spaces. Boiling water isn't one of them, but in general, if you can't insert another adjective between an adjective-noun pair, it's linguistically a compound word that we happen to write with a space. "Fast food" is a good example. It's not simply an adjective-noun pair, as demonstrated by the fact that you sound like a crazy person if you try to insert literally anything between "fast" and "food" in "I eat too much fast food". The "fast food" can be modified all you like, as in "I eat too much lukewarm fast food", "I eat too much depressing fast food", but you can't treat "fast" as merely an adjective of "food", else "I eat too much fast, filling food" wouldn't strip the sentence of the implication I eat at McDonalds or whatever.

NoPicklez|5 days ago

I'm not a linguist, but I don't know why you would need a dictionary that includes words with spaces.

It's pretty obvious "boiling water" shouldn't be in the dictionary to begin with but "boiling" and "water" should be.

Unless I'm just not understanding it

DaedalusII|5 days ago

in German, they just remove the spaces and keep the word, and this problem is solved:

Entschädigungsleistungen - compensation benefits

Wiederbeschaffungskosten - replacement value

Kraftfahrzeughaftpflichtversicherung - motor vehicle liability insurance

Donaudampfschifffahrtsgesellschaftskapitän - Danube steamboat captain

Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz - beef labeling regulation law

Retr0id|5 days ago

How often do these compound words get listed in German dictionaries?

speak_plainly|7 days ago

Dictionaries are a mixed bag at best. If you apply David Kaplan’s character/content distinction from Demonstratives, you have to ask: should pure indexicals, which are essentially 'contentless' pointers be treated the same way as standard words? Let alone the thousands of rigid designators in this dataset that map directly to specific objects in the real world. At a certain point, is there no room left for encyclopedias?

happycat5000|7 days ago

These are under-respected for non native English speakers.

grantpitt|7 days ago

Can you say more on this?

kgwgk|7 days ago

    > Got a word           Didn’t
    > frozen water → ice   boiling water

Freezing water doesn’t have a word. Boiled water does have a word.

pvillano|7 days ago

A mixture of melting ice and water suitable for drinking has a word: ice water. It's not an adjective-noun phrase. It has a more specific meaning than just the two words together. You can order an ice water at a restaurant.

hagbard_c|7 days ago

Freezing water doesn't have a word, it only gets one after water has changed phase. Boiling water also gets a word once it has changed phase: steam.

ice - water - steam

johnhamlin|7 days ago

I got into solving the NYT crossword during Covid. I couldn’t solve a Monday when I started; now I do Mondays downs-only and look forward to Saturdays. Along the way, I developed a sixth sense for when an answer will be more than one word. I’ve thought a lot about it and can’t really describe how I do it. (Some other puzzles clarify if an answer spans multiple words, but I find the ambiguity adds to the fun.)

Wobbles42|7 days ago

Do you think this comes from a gradual internalization of a real linguistic concept? Or is it more a familiarity with common (if unspoken) conventions of the puzzle makers?

I suspect the answer isn't binary, but it's interesting to think about.

This "sixth sense" phenomenon seems to pop up a lot. Crosswords are a great example. The sense some people are getting for detecting LLM output might be another.

wlesieutre|5 days ago

There are an infinite number of describable concepts that don't get a specific word. That doesn't mean the whole description is a "word with spaces."

It's just part of how language works that when there isn't a single word carrying the meaning you want, you put multiple words together and they can mean the thing together.

Even though there isn't a specific word for that, I wouldn't say "It's just part of how language works that when there isn't a single word carrying the meaning you want, you put multiple words together and they can mean the thing together" actually is one big word with spaces in it.

It's a bunch of words together that carry a more specific meaning when put together in that order.

wanda|4 days ago

They should probably look up the definition of dictionary in their dictionary.

robertclaus|5 days ago

Isn't this the difference between a dictionary and an encyclopedia?

tolerance|5 days ago

I was afraid that no one would bring this up. I’m developing a strange relationship with Wikipedia over the commonplace role it serves as an online resource. But I appreciate how it normalized the practice of looking things up and to get a general overview of a thing, leading to internal and external references. Credit is due to search engines also in this case I reckon.

invalidusernam3|5 days ago

If the compound words all have single word entries in the dictionary that when combined mean the same thing what is the point?

Water: transparent, odorless, tasteless liquid

Boiling: having reached the boiling point

Boiling Water: transparent, odorless, tasteless liquid which has reached the boiling point

If Boiling Water had some other completely different meaning that has nothing to do with the individual words then sure, maybe, otherwise this is completely redundant and opinionated.

ndsipa_pomu|5 days ago

As an English native, I'd rather see proper nouns in a dictionary before seeing "compound words".

Personally, I don't agree that "boiling water" is a word (with a space) - I would refer to it as a phrase if it had specific meaning, but it just seems like an ordinary pairing of adjective and noun. Also, if a word can contain a space, then what is the meaning of "words" as there doesn't seem an easy way to distinguish between a "compound word" and a common phrase. Is "barking dog" a pair of words, a compound word or a phrase? (It's a pair of words in my mind)

aaroninsf|7 days ago

With Twain in mind, might I suggest we adopt the simple expedient of snake casing such terms.

pvillano|7 days ago

Finally, someone who actually thought about where to draw the line instead of rejecting words with spaces entirely.

gcr|5 days ago

sometimes singular semantic concepts can take multiple syntactic words to express. Why not call this idea something other than “word”?

bombcar|5 days ago

We could call it a "phrase".

johnhamlin|7 days ago

Fascinating! I’d add “word nerd” to the list to describe the authors.

michaeld123|5 days ago

I'm the author. Updated the article based on this thread — thanks to everyone who pushed back. Changes: reframed "boiling water" as deliberately on-the-fence rather than asserting it's a word; added a note that the obscure end of the slider is noise; acknowledged collocations as the established term; added a German/Dutch/Norwegian section on how other languages handle the space problem; softened "wasn't possible before LLMs" to "wasn't practical"; and threaded the concept of "loaded" throughout as the key distinction. Many of the specific examples came directly from commenters here — credited below.

s1mon|5 days ago

I'd really love to see the prompt(s) you used with Claude. The way the article was written I mistakenly thought you would expand upon that in a footnote or sidebar.

anotherhue|7 days ago

Clearly those Irish monks are to blame.

grantpitt|7 days ago

Very cool project! Reminds me of Chiang's great short story 'The Truth of Fact, the Truth of Feeling':

> “If you speak slowly, you pause very briefly after each word. Thatʼs why we leave a space in those places when we write. Like this: How. Many. Years. Old. Are. You?” He wrote on his paper as he spoke, leaving a space every time he paused: Anyom a ou kuma a me?

> “But you speak slowly because youʼre a foreigner. Iʼm Tiv, so I donʼt pause when I speak. Shouldnʼt my writing be the same?”

bbstats|5 days ago

these are called phrases

riffraff|5 days ago

"book steaks" is in the list, but I don't think it's real. Maybe it was supposed to be "stack".

hmokiguess|7 days ago

On another note, I always wished "never mind" was spelled "nevermind"

pvillano|7 days ago

"Each other" is like that for me, and according to search results, a lot of other people. I pronounce it ee-chother.

"Eachother" feels as natural as "somebody", "nobody", "anybody" to me

JackFr|7 days ago

"Opaque MWE"? Does no one know the word "idiom"?