Nowadays, on macOS, "dict://Internet" will open the Dictionary app with the query "Internet" (probably behind a security prompt). Not sure if there's similar functionality on other operating systems.
I admire these old protocols that are intentionally built to be usable by both machines and humans. Like the combination of a response code and a human-readable explanation. A help command right in the protocol.
Makes me think it's a shame that making a json based protocol is so much easier to whip up than a textual ebnf-specified protocol.
Like imagine if in Python there was some library that you gave an EBNF spec, maybe with some extra features similar to regex (named groups?), and you could compile it to a state machine and use it to parse documents, getting a dict out.
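Regex named groups already get you partway there for line-oriented protocols. A toy sketch, pure stdlib, standing in for the imagined EBNF library (the status-line shape here is only loosely modeled on DICT's, not taken from the spec):

```python
import re

# One "rule" of a DICT-like response grammar, written as a regex with
# named groups instead of a full EBNF specification.
STATUS_LINE = re.compile(
    r"^(?P<code>\d{3}) (?P<count>\d+) definitions? (?P<verb>found|retrieved)"
)

def parse_status(line: str) -> dict:
    """Match a status line and return its named groups as a plain dict."""
    m = STATUS_LINE.match(line)
    if m is None:
        raise ValueError(f"unparseable status line: {line!r}")
    fields = m.groupdict()
    fields["code"] = int(fields["code"])
    fields["count"] = int(fields["count"])
    return fields

print(parse_status("150 2 definitions retrieved"))
# {'code': 150, 'count': 2, 'verb': 'retrieved'}
```

A real EBNF-to-state-machine compiler would compose many such rules and handle recursion, but the "parse it and get a dict back" ergonomics are the same.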
> Makes me think it's a shame that making a json based protocol is so much…
Maybe I'm not the human you are thinking of, being a techie, but I find a well structured JSON response, as long as it isn't overly verbose and is presented in open form rather than minified, to be a good compromise of human readable and easy to digest programmatically.
Unfortunately, in practice they are a nightmare. Look at the WHOIS protocol for an example.
Humans don’t look at responses very much, so you should optimise for machines. If you want a human-readable view, then turn the JSON response into something readable.
maybe we could have a format that was more human-readable than json (or especially xml) but still reliably emittable and parseable? yaml, maybe, or toml, although i'm not that enthusiastic about them. another proposal for such a thing was ogdl (https://ogdl.org/), a notation for what i think are called rose trees
> OGDL: Ordered Graph Data Language
> A simple and readable data format, for humans and machines alike.
> OGDL is a structured textual format that represents information in the form of graphs, where the nodes are strings and the arcs or edges are spaces or indentation.
their example:
network
  eth0
    ip 192.168.0.10
    mask 255.255.255.0
    gw 192.168.0.1
hostname crispin
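the "arcs are indentation" idea is easy to prototype; here's a toy sketch of mine (not a real ogdl parser, which also handles quoting, commas, and cycles) that turns such an outline into nested python dicts:

```python
def parse_tree(text: str, indent_width: int = 2) -> dict:
    """Turn an indented outline into nested dicts: a line's leading
    spaces give its depth, its first word is the node, and the rest
    (if any) becomes a string leaf."""
    root: dict = {}
    stack = [root]                      # stack[d] holds the dict at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent_width
        key, _, value = line.strip().partition(" ")
        node = value if value else {}
        stack[depth][key] = node
        del stack[depth + 1:]           # drop deeper contexts
        if isinstance(node, dict):
            stack.append(node)
    return root

example = """\
network
  eth0
    ip 192.168.0.10
    mask 255.255.255.0
    gw 192.168.0.1
hostname crispin
"""
```

parse_tree(example) gives {'network': {'eth0': {'ip': '192.168.0.10', ...}}, 'hostname': 'crispin'}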
In my department's internal framework (we were formerly our own company), throwing .html on the end of any JSON response outputs it as nested HTML tables. I personally find it very helpful.
Protocols that have a response code with an explanation are helpful. A help command is also helpful. So I had written an NNTP server that does that, and the IRC and NNTP client software I use can display them.
> Makes me think it's a shame that making a json based protocol is so much easier to whip up ...
I personally don't; I find I can easily work with text-based protocols if the format is easy enough.
I think there are problems with JSON. Some of the problems are: it requires parsing escapes and keys/values, does not properly support character sets other than Unicode, cannot work with binary data unless it is encoded using base64 or hex or something else (which makes it inefficient), etc. There are other problems too.
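The base64 point is easy to quantify: every 3 bytes of binary become 4 characters of text before JSON even sees them. For instance:

```python
import base64
import json

blob = bytes(range(256)) * 4                 # 1 KiB of arbitrary binary data
encoded = base64.b64encode(blob).decode("ascii")
print(len(blob), "->", len(encoded))         # 1024 -> 1368, a 4/3 expansion
payload = json.dumps({"data": encoded})      # the base64 alphabet is JSON-safe
```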
> Like imagine if in python there was some library that you gave an ebnf spec ...
Maybe it is possible to add such a library in Python, if there is not already such a thing.
REST (REpresentational State Transfer) as a concept is very human oriented. The idea was a sort of academic abstraction of HTML, but it can be boiled down to: when you send a response, also send the entire application needed to handle that response. It is unfortunate that collectively we had a sort of brain fart, said "ok, REST == HTTP, got it", and lost the rest of the interesting discussion about what it means to send the representational state of the process.
> in an age of low-size disk drives and expensive software, looking up data over a dedicated protocol seems like a nifty2 idea.
Then disk size exploded, databases became cheap, and search engines made it easy to look up words.
I love this particular part of history, about how protocols and applications got built based on restrictions and evolved after improvements. Similar examples exist everywhere in computer history. Projecting the same with LLMs, we will have AIs running locally on mobile devices, or perhaps AIs replacing the OS of mobile devices and router protocols and servers.
In the future, HN people will look at the code and feel nostalgic about writing code.
But on the other hand, for some applications, disk requirements exploded as well and require dedicated protocols and servers for it; for example Google's monorepo, or the latest Flight Simulator, the 2024 version will take up about 23 GB as the base install and stream everything else - including the world, planes, landmarks, live real-world ship and plane locations, etc - off the internet. Because the whole world just won't fit on any regular hard drive.
> Projecting the same with LLMs, we will have AIs running locally on mobile devices
That’s not much of a projection. That’s been announced for months as coming to iPhones. Sure, they’re not the biggest models, but no one doubts more will be possible.
> or perhaps AIs replacing OS of mobile devices and router protocols and servers.
Holy shit, please no. There’s no sane reason for that to happen. Why would you replace stable nimble systems which depend on being predictable and low power with a system that’s statistical and consumes tons of resources? That’s unfettered hype-chasing. Let’s please not turn our brains off just yet.
Imagine if dict://internet was renamed to agent://source, then agentic calls to model sources could interconnect with ease. With HTTP/3, one stream could be the streaming browser and the other streams could be multi-agent sessions.
I recently began testing my own public `dictd` server. The main goal was to make the OED (the full and proper one) available outside of a university proxy. I figured I would add the Webster's 1913 one too.
Unfortunately the vast majority of dictionary files are in "stardict" format and the conversion to "dict" has yielded mixed results. I was hoping to host _every_ dictionary, good and bad, but will walk that back now. A free VPS could at least run the OED.
what's the stardict format? which edition of the oed are you hosting? i scanned the first edition decades ago but i don't think there's a reasonable plain-text version of it yet
Wow, either I've forgotten this existed or I had no clue. I was around for this era, and I remember Veronica, Archie, WAIS, Gopher, etc., but I never recall reading about a Dict protocol. Nice find!
I've been aware of dict for a while, since I wrapped up an Esperanto-to-English dictionary for KOReader in a format KOReader could understand. What I'd really have liked is a format like this:
Oh yes, I remember dictionary servers. Also many other protocols.
What happened to all of those other protocols? Everything got squished onto http(s) for various reasons. As mentioned in this thread, corporate firewalls blocking every other port except 80 and 443. Around the time of the invention of http, protocols were proliferating for all kinds of new ideas. Today "innovation" happens on top of http, which devolves into some new kind of format to push back and forth.
I wouldn't place all the blame on corporate IT for low level protocols dying out. A lot of corporate IT filtering was a reaction to malicious traffic originating from inside their networks.
I think filtering on university networks killed more protocols than corporate filtering. Corporate networks were rarely the place where someone stuck a server in the corner with a public IP hosting a bunch of random services. That however was very common in university networks.
When university networks (early 00s or so) started putting NAT on ResNets and filtering faculty networks is when a lot of random Internet servers started drying up. Universities had huge IPv4 blocks and would hand out their addresses to every machine on their networks. More than a few Web 1.0 companies started life on a random Sun machine in dorm rooms or the corner of a university computer lab.
When publicly routed IPs dried up so did random FTPs and small IRC servers. At the same time residential broadband was taking off but so were the sales of home routers with NAT. Hosting random raw socket protocols stopped being practical for a lot of people. By the time low cost VPSes became available a lot of old protocols had already died out.
dict is a must for me on any daily driver. Removing the friction of opening a web browser or specialized app, and just looking up text from the terminal, is so nice. Just like bc, you don't miss it when you don't know it's there, but once you get used to using it you can't live without it. Making custom dictionaries is not very well documented, though.
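For what it's worth, the on-disk format itself is simple, as I understand it: a .index file of tab-separated headword, offset, and length (the numbers written in base 64 over the A-Za-z0-9+/ alphabet) pointing into a flat .dict data file. A rough sketch of a generator; dictfmt(1) does this properly, and real dictionaries also want 00-database-* metadata entries, which I omit here:

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64num(n: int) -> str:
    """Encode n as dictd's base-64 *number* (not byte-wise base64)."""
    if n == 0:
        return "A"
    digits = ""
    while n:
        digits = B64[n % 64] + digits
        n //= 64
    return digits

def build_dictionary(entries: dict) -> tuple:
    """Return (index_text, data_text) for a minimal dict-format pair.
    Offsets should be byte offsets; this sketch assumes ASCII text."""
    data = ""
    index_lines = []
    for headword in sorted(entries):
        body = headword + "\n" + entries[headword] + "\n"
        index_lines.append("%s\t%s\t%s" % (
            headword, b64num(len(data)), b64num(len(body))))
        data += body
    return "\n".join(index_lines) + "\n", data

index_text, data_text = build_dictionary(
    {"saluton": "hello (Esperanto)", "hundo": "dog (Esperanto)"})
```

Write those two strings out as mydict.index and mydict.dict (optionally dictzip the latter) and dictd should be able to serve them, though I'd verify against dictfmt's output first.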
I love dict/dictd but I had an issue using it in hostile networks that block the port/protocol.
I've been tempted to revamp dict/dictd to shovel the dict protocol over WebSockets so I can use it over the web. Just one of those ideas in the pipeline that I haven't revisited because I'm no longer dealing with that hostile network.
The dict protocol really shows its age, notably the stateful connection part. Having a new protocol based on HTTP and JSON, similar to LSP, would be nice, but there is no real interest. (I made and used my own nonetheless in a research project. It may even be deployed, but deactivated, in another one.)
The biggest issue isn't technical; it's the fact that organizations with dictionary data don't want third parties to interact with it without paid licensing.
dict and the relevant dictionaries are things i pretty much always install on every new laptop. gcide in particular includes most of the famous 1913 webster dictionary with its sparkling prose:
: ~; dict glisten
2 definitions found
From The Collaborative International Dictionary of English v.0.48 [gcide]:
Glisten \Glis"ten\ (gl[i^]s"'n), v. i. [imp. & p. p.
{Glistened}; p. pr. & vb. n. {Glistening}.] [OE. glistnian,
akin to glisnen, glisien, AS. glisian, glisnian, akin to E.
glitter. See {Glitter}, v. i., and cf. {Glister}, v. i.]
To sparkle or shine; especially, to shine with a mild,
subdued, and fitful luster; to emit a soft, scintillating
light; to gleam; as, the glistening stars.
Syn: See {Flash}.
[1913 Webster]
it's interesting to think about how you would implement this service efficiently under the constraints of mid-01990s computers, where a gigabyte was still a lot of disk space and multiuser unix servers commonly had about 100 mips (https://netlib.org/performance/html/dhrystone.data.col0.html)
totally by coincidence i was looking at the dictzip man page this morning; it produces gzip-compatible files that support random seeks so you can keep the database for your dictd server compressed. (as far as i know, rik faith's dictd is still the only server implementation of the dict protocol, which is incidentally not a very good protocol.) you can see that the penalty for seekability is about 6% in this case:
nowadays computers are fast enough that it probably isn't a big win to gzip in such small chunks (dictzip has a chunk limit of 64k) and you might as well use a zipfile, all implementations of which support random access:
: ~; mkdir jargsplit
: ~; cd jargsplit
: jargsplit; gzip -dc /usr/share/dictd/jargon.dict.dz|split -b256K
: jargsplit; zip jargon.zip xaa xab xac xad xae xaf
adding: xaa (deflated 60%)
adding: xab (deflated 59%)
adding: xac (deflated 59%)
adding: xad (deflated 61%)
adding: xae (deflated 62%)
adding: xaf (deflated 58%)
: jargsplit; ls -l jargon.zip
-rw-r--r-- 1 user user 565968 Sep 22 09:47 jargon.zip
: jargsplit; time unzip -o jargon.zip xad
Archive: jargon.zip
inflating: xad
real 0m0.011s
user 0m0.000s
sys 0m0.011s
so you see 256-kibibyte chunks have submillisecond decompression time (more like 2 milliseconds on my cellphone) and only about a 1.8% size penalty for seekability:
: jargsplit; units -t 565968/556102 %
101.77413
and, unlike the dictzip format (which lists the chunks in an extra backward-compatible file header), zip also supports efficient appending
even in python (3.11.2) it's only about a millisecond:
In [13]: z = zipfile.ZipFile('jargon.zip')
In [14]: [f.filename for f in z.infolist()]
Out[14]: ['xaa', 'xab', 'xac', 'xad', 'xae', 'xaf']
In [15]: %timeit z.open('xab').read()
1.13 ms ± 16.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
this kind of performance means that any algorithm that would be efficient reading data stored on a conventional spinning-rust disk will be efficient reading compressed data if you put the data into a zipfile in "files" of around a meg each. (writing is another matter; zstd may help here, with its order-of-magnitude faster compression, but info-zip zip and unzip don't support zstd yet.)
dictd keeps an index file in tsv format which uses what looks like base64 to locate the desired chunk and offset in the chunk:
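that encoding, as far as i can tell, is just base 64 as a positional number system over the usual alphabet, so decoding it is a few lines (the index line below is made up for illustration, not copied from a real index):

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_to_int(s: str) -> int:
    """Decode dictd's base-64 numbers: 'A' is 0, 'BAA' is 4096."""
    n = 0
    for ch in s:
        n = n * 64 + B64.index(ch)
    return n

# a made-up index line: headword TAB offset TAB length
word, offset, length = "glisten\tRq\tBc".split("\t")
print(word, b64_to_int(offset), b64_to_int(length))
# glisten 1130 92
```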
strfile from the fortune package works on a similar principle but uses a binary data file and no keys, just offsets:
: ~; wget -nv canonical.org/~kragen/quotes.txt
2024-09-22 10:44:50 URL:http://canonical.org/~kragen/quotes.txt [49884/49884] -> "quotes.txt" [1]
: ~; strfile quotes.txt
"quotes.txt.dat" created
There were 87 strings
Longest string: 1625 bytes
Shortest string: 92 bytes
: ~; fortune quotes.txt
Get enough beyond FUM [Fuck You Money], and it's merely Nice To Have
Money.
-- Dave Long, <[email protected]>, on FoRK, around 2000-08-16, in
Message-ID <200008162000.NAA10898@maltesecat>
: ~; od -i --endian=big quotes.txt.dat
0000000 2 87 1625 92
0000020 0 620756992 0 933
0000040 1460 2307 2546 3793
0000060 3887 4149 5160 5471
0000100 5661 6185 6616 7000
of course if you were using a zipfile you could keep the index in the zipfile itself, and then there's no point in using base64 for the file offsets, or limiting them to 32 bits
There was also a translation server called Babylon that used a similar raw text protocol (like WHOIS, and DICT here) in 1998. I remember adding it to my IRC script, but it must have stopped working at some point that I had replaced it with "babelfish.altavista.com" :)
another possibility is jevko; https://jevko.org/ describes it and http://canonical.org/~kragen/rose/ are some of my notes about the possibilities of similar rose-tree data formats
Fairly simple and somewhat fun: Python has PEG parsing [1] built in, and there are also the pyparsing and parsimonious modules.
I have built EDI X12 parsers and toy languages with this.
[1] https://en.wikipedia.org/wiki/Parsing_expression_grammar
Was the plan to do this in a legal fashion? If so, how?
dict://<server>/<origin language>/<definition language>/<word>
Still, it is pretty cool that dict servers exist at all, so no complaints here.
I'll then be writing a Java server for DICT. I'll likely add more recent types of dictionaries and acronyms to help keep it alive.
> 00. Are there any other Dictionary Servers still available on the Internet?
There are a number of other dict: servers including ones for different languages:
https://servers.freedict.org/
this is very similar to the index format used by eric raymond's volks-hypertext https://www.ibiblio.org/pub/Linux/apps/doctools/vh-1.8.tar.g... or vi ctags or emacs etags, but it supports random access into the file