HTTP:, FTP:, and Dict:?

masswerk | 1 year ago

Nowadays, on macOS, "dict://Internet" will open the Dictionary app with the query "Internet". (Probably behind a security prompt.) Not sure if there's similar functionality on other operating systems.

dmd | 1 year ago

What do you mean by "behind a security prompt"?

im3w1l | 1 year ago

I admire these old protocols that are intentionally built to be usable both by machines and humans. Like the combination of a response code, and a human readable explanation. A help command right in the protocol.

Makes me think it's a shame that making a JSON-based protocol is so much easier to whip up than a textual EBNF-specified protocol.

Like, imagine if in Python there was some library that you gave an EBNF spec (maybe with some extra features similar to regex, like named groups?) and you could compile it to a state machine and use it to parse documents, getting a dict out.
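For line-oriented protocols, Python's standard `re` module already gets part of the way there: named groups turn a match straight into a dict. A minimal sketch, assuming a status line of the form "three-digit code, space, human-readable text", like the `220 ...` and `250 ok` lines DICT and FTP servers send; `parse_status` is a hypothetical helper, not a library function.

```python
import re

# Status line: three-digit code for machines, free text for humans.
STATUS = re.compile(r"(?P<code>[1-5][0-9]{2}) (?P<text>.*)")

def parse_status(line: str) -> dict:
    """Parse one status line into {'code': int, 'text': str}."""
    m = STATUS.fullmatch(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a status line: {line!r}")
    fields = m.groupdict()          # named groups -> dict
    fields["code"] = int(fields["code"])
    return fields

print(parse_status("250 ok\r\n"))
# {'code': 250, 'text': 'ok'}
```

A grammar library would generalize this from one regex per line to a grammar over a whole document.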

dspillett | 1 year ago

> Makes me think it's a shame that making a json based protocol is so much…

Maybe I'm not the human you are thinking of, being a techie, but I find a well structured JSON response, as long as it isn't overly verbose and is presented in open form rather than minified, to be a good compromise of human readable and easy to digest programmatically.

orf | 1 year ago

Unfortunately, in practice they are a nightmare. Look at the WHOIS protocol for an example.

Humans don’t look at responses very much, so you should optimise for machines. If you want a human-readable view, then turn the JSON response into something readable.

kragen | 1 year ago

maybe we could have a format that was more human-readable than json (or especially xml) but still reliably emittable and parseable? yaml, maybe, or toml, although i'm not that enthusiastic about them. another proposal for such a thing was ogdl (https://ogdl.org/), a notation for what i think are called rose trees

> OGDL: Ordered Graph Data Language

> A simple and readable data format, for humans and machines alike.

> OGDL is a structured textual format that represents information in the form of graphs, where the nodes are strings and the arcs or edges are spaces or indentation.

their example:

    network
      eth0
        ip   192.168.0.10
        mask 255.255.255.0
        gw   192.168.0.1

    hostname crispin
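to show how little machinery an indentation-based rose-tree format needs, here's a sketch that parses the example above. it's a simplified reading, not the full ogdl spec: tokens on one line form a chain (`ip 192.168.0.10` is ip → 192.168.0.10), and a more-indented line hangs under the last node of the nearest less-indented line. only space indentation is handled, and nodes are hypothetical `[label, children]` lists.

```python
EXAMPLE = """\
network
  eth0
    ip   192.168.0.10
    mask 255.255.255.0
    gw   192.168.0.1

hostname crispin
"""

def parse_rose(text: str) -> list:
    """Return the top-level nodes; each node is [label, children]."""
    root = ["<root>", []]
    stack = [(-1, root)]            # (indent, node that deeper lines attach to)
    for raw in text.splitlines():
        if not raw.strip():
            continue                # blank lines carry no structure here
        indent = len(raw) - len(raw.lstrip(" "))
        while stack[-1][0] >= indent:
            stack.pop()             # close siblings and deeper subtrees
        parent = stack[-1][1]
        for token in raw.split():   # a b c on one line means a -> b -> c
            node = [token, []]
            parent[1].append(node)
            parent = node
        stack.append((indent, parent))
    return root[1]

tree = parse_rose(EXAMPLE)
print([node[0] for node in tree])   # ['network', 'hostname']
```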

another possibility is jevko; https://jevko.org/ describes it and http://canonical.org/~kragen/rose/ are some of my notes about the possibilities of similar rose-tree data formats

donatj | 1 year ago

In my department's internal framework (we were formerly our own company), tacking .html onto the end of any JSON endpoint renders the response as nested HTML tables. I personally find it very helpful.
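The framework itself isn't shown, so this is only a hypothetical recreation of the idea: objects become two-column tables (key, value), arrays become one-column tables, and scalars are HTML-escaped, recursing into nested values.

```python
import html
import json

def to_html(value) -> str:
    """Render a decoded JSON value as nested HTML tables (a sketch)."""
    if isinstance(value, dict):
        rows = "".join(
            f"<tr><th>{html.escape(str(k))}</th><td>{to_html(v)}</td></tr>"
            for k, v in value.items()
        )
        return f"<table>{rows}</table>"
    if isinstance(value, list):
        rows = "".join(f"<tr><td>{to_html(v)}</td></tr>" for v in value)
        return f"<table>{rows}</table>"
    if isinstance(value, str):
        return html.escape(value)
    # Numbers, true/false and null keep their JSON spelling.
    return html.escape(json.dumps(value))

print(to_html(json.loads('{"word": "glisten", "defs": 2}')))
```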

zzo38computer | 1 year ago

> I admire these old protocols ...

Protocols that have a response code with an explanation are helpful. A help command is also helpful. So I wrote an NNTP server that does that, and the IRC and NNTP client software I use can display them.

> Makes me think it's a shame that making a json based protocol is so much easier to whip up ...

I personally don't; I find I can easily work with text-based protocols if the format is simple enough.

I think there are problems with JSON. Some of the problems are: it requires parsing escapes and keys/values, does not properly support character sets other than Unicode, cannot work with binary data unless it is encoded using base64 or hex or something else (which makes it inefficient), etc. There are other problems too.
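To put a number on the binary-data point: base64 emits 4 ASCII characters for every 3 input bytes, so a binary payload grows by about a third before JSON adds its own quoting (hex doubles it). The payload below is arbitrary test data.

```python
import base64
import json

payload = bytes(range(256)) * 3     # 768 bytes of arbitrary binary data
encoded = base64.b64encode(payload)
doc = json.dumps({"data": encoded.decode("ascii")})

print(len(payload), len(encoded), len(encoded) / len(payload))
# 768 1024 1.3333333333333333
```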

> Like imagine if in python there was some library that you gave an ebnf spec ...

Maybe it is possible to add such a library to Python, if there isn't one already.

somat | 1 year ago

REST (REpresentational State Transfer) as a concept is very human-oriented. The idea was a sort of academic abstraction of HTML, but it can be boiled down to this: when you send a response, also send the entire application needed to handle that response. It is unfortunate that we collectively had a sort of brain fart, said "ok, REST == HTTP, got it", and lost the rest of the interesting discussion about what it means to send the representational state of the process.

sixdimensional | 1 year ago

May I humbly submit “parsing expression grammars”[1] for your consideration?

Fairly simple and somewhat fun. CPython's own parser is PEG-based, and for your own grammars there are the pyparsing and parsimonious modules.

I have built EDI X12 parsers and toy languages with this.

[1] https://en.wikipedia.org/wiki/Parsing_expression_grammar
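A minimal hand-rolled sketch of the PEG idea in plain Python (no pyparsing or parsimonious): parsers are functions from (text, position) to (value, new position), combined by sequencing and *ordered* choice, where the first alternative that succeeds wins. The grammar is a hypothetical, simplified subset of DICT client commands, not RFC 2229's full syntax, and this toy omits repetition, lookahead and packrat memoization.

```python
import re

# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def lit(s):
    """Match the exact string s."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def rx(pattern):
    """Match a regular expression anchored at the current position."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse

def seq(*parsers):
    """Match every parser in order; fail if any one fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def choice(*parsers):
    """PEG ordered choice: the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def parse_all(parser, text):
    result = parser(text, 0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"could not parse {text!r}")
    return result[0]

# Hypothetical, simplified command grammar.
sp = rx(r" +")
atom = choice(rx(r'"[^"]*"'), rx(r'[^ "]+'))   # quoted form is tried first
define_cmd = seq(lit("DEFINE"), sp, atom, sp, atom)
match_cmd = seq(lit("MATCH"), sp, atom, sp, atom, sp, atom)
command = choice(define_cmd, match_cmd)

print(parse_all(command, 'DEFINE * "warez d00dz"'))
# ['DEFINE', ' ', '*', ' ', '"warez d00dz"']
```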

fouc | 1 year ago

textual ebnf-specified protocol > json

praveen9920 | 1 year ago

> in an age of low-size disk drives and expensive software, looking up data over a dedicated protocol seems like a nifty idea. Then disk size exploded, databases became cheap, and search engines made it easy to look up words.

I love this particular part of history: how protocols and applications were built around constraints and then evolved as those constraints eased. Similar examples exist everywhere in computing history. Projecting the same onto LLMs, we will have AIs running locally on mobile devices, or perhaps AIs replacing the OS of mobile devices, router protocols and servers.

In the future, HN people will be looking at code and feeling nostalgic about writing it.

Cthulhu_ | 1 year ago

But on the other hand, for some applications disk requirements exploded as well and now need dedicated protocols and servers; for example, Google's monorepo, or the latest Flight Simulator: the 2024 version will take up about 23 GB as the base install and stream everything else (including the world, planes, landmarks, live real-world ship and plane locations, etc.) off the internet, because the whole world just won't fit on any regular hard drive.

latexr | 1 year ago

> Projecting the same with LLMs, we will have AIs running locally on mobile devices

That’s not much of a projection. That’s been announced for months as coming to iPhones. Sure, they’re not the biggest models, but no one doubts more will be possible.

> or perhaps AIs replacing OS of mobile devices and router protocols and servers.

Holy shit, please no. There’s no sane reason for that to happen. Why would you replace stable, nimble systems, which depend on being predictable and low-power, with a system that’s statistical and consumes tons of resources? That’s unfettered hype-chasing. Let’s please not turn our brains off just yet.

mycall | 1 year ago

Imagine if dict://internet was renamed to agent://source, then agentic calls to model sources could interconnect with ease. With HTTP/3, one stream could be the streaming browser and the other streams could be multi-agent sessions.

38 | 1 year ago

Given that most current AI generated code is dogshit, I would say we are well off from that.

zenoprax | 1 year ago

I recently began testing my own public `dictd` server. The main goal was to make the OED (the full and proper one) available outside of a university proxy. I figured I would add the Webster's 1913 one too.

Unfortunately the vast majority of dictionary files are in "stardict" format and the conversion to "dict" has yielded mixed results. I was hoping to host _every_ dictionary, good and bad, but will walk that back now. A free VPS could at least run the OED.
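For anyone attempting the same conversion, here's a sketch of the index half of it, under stated assumptions: a StarDict `.idx` file with 32-bit offsets stores, per entry, a NUL-terminated UTF-8 headword followed by a big-endian 32-bit offset and size into the `.dict` file, and a dictd `.index` line is headword, tab, offset, tab, length, with the numbers written in base64 digits. It only rewrites the index; converting definition bodies (which is where the mixed results come from) and dictd's own sorting and header entries are not handled.

```python
import struct

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_number(n: int) -> str:
    """Encode a non-negative int as base64 digits, most significant first."""
    digits = ""
    while True:
        digits = B64[n % 64] + digits
        n //= 64
        if n == 0:
            return digits

def read_stardict_idx(data: bytes):
    """Yield (headword, offset, size); assumes idxoffsetbits=32."""
    pos = 0
    while pos < len(data):
        end = data.index(b"\0", pos)
        word = data[pos:end].decode("utf-8")
        offset, size = struct.unpack_from(">II", data, end + 1)
        yield word, offset, size
        pos = end + 1 + 8           # skip NUL plus two 32-bit numbers

def to_dictd_index(data: bytes) -> str:
    return "".join(
        f"{w}\t{b64_number(o)}\t{b64_number(s)}\n"
        for w, o, s in read_stardict_idx(data)
    )

idx = b"fossil\0" + struct.pack(">II", 515140, 764)
print(to_dictd_index(idx), end="")  # fossil<TAB>B9xE<TAB>L8
```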

tomsmeding | 1 year ago

> to make the OED (the full and proper one) available outside of a university proxy.

Was the plan to do this in a legal fashion? If so, how?

kragen | 1 year ago

what's the stardict format? which edition of the oed are you hosting? i scanned the first edition decades ago but i don't think there's a reasonable plain-text version of it yet

wormius | 1 year ago

Wow, either I've forgotten this existed or I never knew. I was around for this era, and I remember Veronica, Archie, WAIS, Gopher, etc., but I never recall reading about a Dict protocol. Nice find!

hkt | 1 year ago

I've been aware of dict for a while, since I wrapped up an Esperanto-to-English dictionary for KOReader in a format KOReader could understand. What I'd really have liked is a format like this:

dict://<server>/<origin language>/<definition language>/<word>
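Such a URL could be mapped onto the existing protocol client-side. A sketch, where the `<origin>-<definition>` database name is a hypothetical FreeDict-style convention (nothing in RFC 2229 requires it) and `hkt_url_to_command` is an illustrative helper; RFC 2229's own URL form instead puts the database after the word, as in `dict://host/d:word:database`.

```python
from urllib.parse import unquote, urlsplit

def hkt_url_to_command(url: str) -> tuple:
    """Map dict://server/<origin>/<definition>/<word> to (host, DICT command).

    The database naming rule below is a hypothetical convention.
    """
    parts = urlsplit(url)
    if parts.scheme != "dict":
        raise ValueError("not a dict: URL")
    origin, target, word = (unquote(p) for p in parts.path.strip("/").split("/"))
    database = f"{origin}-{target}"
    if " " in word:
        word = f'"{word}"'          # DICT atoms containing spaces must be quoted
    return parts.hostname, f"DEFINE {database} {word}"

print(hkt_url_to_command("dict://dict.org/epo/eng/saluton"))
# ('dict.org', 'DEFINE epo-eng saluton')
```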

Still, it is pretty cool that dict servers exist at all, so no complaints here.

cratermoon | 1 year ago

Oh yes, I remember dictionary servers. Also many other protocols.

What happened to all of those other protocols? Everything got squished onto HTTP(S) for various reasons, one of them being, as mentioned in this thread, corporate firewalls blocking every port except 80 and 443. Around the time of the invention of HTTP, protocols were proliferating for all kinds of new ideas. Today "innovation" happens on top of HTTP, which devolves into some new kind of format to push back and forth.

giantrobot | 1 year ago

I wouldn't place all the blame on corporate IT for low level protocols dying out. A lot of corporate IT filtering was a reaction to malicious traffic originating from inside their networks.

I think filtering on university networks killed more protocols than corporate filtering. Corporate networks were rarely the place where someone stuck a server in the corner with a public IP hosting a bunch of random services. That however was very common in university networks.

It was when university networks (early 00s or so) started putting NAT on ResNets and filtering faculty networks that a lot of random Internet servers started drying up. Universities had huge IPv4 blocks and would hand out their addresses to every machine on their networks. More than a few Web 1.0 companies started life on a random Sun machine in a dorm room or the corner of a university computer lab.

When publicly routed IPs dried up so did random FTPs and small IRC servers. At the same time residential broadband was taking off but so were the sales of home routers with NAT. Hosting random raw socket protocols stopped being practical for a lot of people. By the time low cost VPSes became available a lot of old protocols had already died out.

nunobrito | 1 year ago

Nice find, I didn't know the protocol either. The site lists all available dictionaries here: https://dict.org/bin/Dict?Form=Dict4

I'll be writing a Java server for DICT, and will likely add more recent types of dictionaries and acronyms to help keep it alive.

t-3 | 1 year ago

dict is a must for me on any daily driver. Removing the friction of opening a web browser or specialized app and just looking up text from the terminal is so nice. Just like bc, you don't miss it when you don't know it's there, but once you get used to using it you can't live without it. Making custom dictionaries is not very well documented, though.

commandersaki | 1 year ago

I love dict/dictd but I had an issue using it in hostile networks that block the port/protocol.

I've been tempted to revamp dict/dictd to shovel the dict protocol over WebSockets so I can use it over the web. It's just one of those ideas in the pipeline that I haven't revisited, because I'm no longer dealing with that hostile network.

gwervc | 1 year ago

The dict protocol really shows its age, notably the stateful connection part. Having a new protocol based on HTTP and JSON, similar to LSP, would be nice, but there is no real interest. (I made and used my own nonetheless in a research project. It may even be deployed, but deactivated, in another one.)

The biggest issue isn't technical; it's the fact that organizations holding dictionary data don't want third parties to interact with it without paid licensing.

divbzero | 1 year ago

Regarding OP’s unanswered question:

> 00. Are there any other Dictionary Servers still available on the Internet?

There are a number of other dict: servers including ones for different languages:

https://servers.freedict.org/

kragen | 1 year ago

dict and the relevant dictionaries are things i pretty much always install on every new laptop. gcide in particular includes most of the famous 1913 webster dictionary with its sparkling prose:

    : ~; dict glisten
    2 definitions found

    From The Collaborative International Dictionary of English v.0.48 [gcide]:

      Glisten \Glis"ten\ (gl[i^]s"'n), v. i. [imp. & p. p.
         {Glistened}; p. pr. & vb. n. {Glistening}.] [OE. glistnian,
         akin to glisnen, glisien, AS. glisian, glisnian, akin to E.
         glitter. See {Glitter}, v. i., and cf. {Glister}, v. i.]
         To sparkle or shine; especially, to shine with a mild,
         subdued, and fitful luster; to emit a soft, scintillating
         light; to gleam; as, the glistening stars.

         Syn: See {Flash}.
              [1913 Webster]

it's interesting to think about how you would implement this service efficiently under the constraints of mid-01990s computers, where a gigabyte was still a lot of disk space and multiuser unix servers commonly had about 100 mips (https://netlib.org/performance/html/dhrystone.data.col0.html)

totally by coincidence i was looking at the dictzip man page this morning; it produces gzip-compatible files that support random seeks so you can keep the database for your dictd server compressed. (as far as i know, rik faith's dictd is still the only server implementation of the dict protocol, which is incidentally not a very good protocol.) you can see that the penalty for seekability is about 6% in this case:

    : ~; ls -l /usr/share/dictd/jargon.dict.dz
    -rw-r--r-- 1 root root 587377 Jan  1  2021 /usr/share/dictd/jargon.dict.dz
    : ~; \time gzip -dc /usr/share/dictd/jargon.dict.dz|wc -c
    0.01user 0.00system 0:00.01elapsed 100%CPU (0avgtext+0avgdata 1624maxresident)k
    0inputs+0outputs (0major+160minor)pagefaults 0swaps
    1418350
    : ~; gzip -dc /usr/share/dictd/jargon.dict.dz|gzip -9c|wc -c
    556102
    : ~; units -t 587377/556102 %
    105.62397
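here's a sketch of the trick that makes that seekability possible, using python's zlib rather than dictzip itself: deflate the text in fixed-size chunks with a full flush after each one, so every chunk starts on a byte boundary with an empty history and can be inflated on its own, at the cost of the small size penalty measured above. the chunk size is an arbitrary assumption (within the 64k limit), and the output is a bare deflate stream, not a real .dz file, which additionally records the chunk sizes in an extra gzip header field.

```python
import zlib

CHUNK = 32 * 1024   # hypothetical chunk size

def compress_chunks(data: bytes, chunk: int = CHUNK):
    """Raw-deflate data so each chunk can be inflated independently.

    Returns (blob, sizes), where sizes[i] is chunk i's compressed length.
    """
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)   # raw deflate, no header
    blob, sizes = bytearray(), []
    for start in range(0, len(data), chunk):
        # Z_FULL_FLUSH byte-aligns the output and resets the history.
        piece = comp.compress(data[start:start + chunk]) + comp.flush(zlib.Z_FULL_FLUSH)
        blob += piece
        sizes.append(len(piece))
    return bytes(blob), sizes

def read_chunk(blob: bytes, sizes, i: int) -> bytes:
    """Inflate only chunk i, using the compressed sizes as an index."""
    start = sum(sizes[:i])
    return zlib.decompressobj(-15).decompress(blob[start:start + sizes[i]])
```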

nowadays computers are fast enough that it probably isn't a big win to gzip in such small chunks (dictzip has a chunk limit of 64k) and you might as well use a zipfile, all implementations of which support random access:

    : ~; mkdir jargsplit
    : ~; cd jargsplit
    : jargsplit; gzip -dc /usr/share/dictd/jargon.dict.dz|split -b256K
    : jargsplit; zip jargon.zip xaa xab xac xad xae xaf 
      adding: xaa (deflated 60%)
      adding: xab (deflated 59%)
      adding: xac (deflated 59%)
      adding: xad (deflated 61%)
      adding: xae (deflated 62%)
      adding: xaf (deflated 58%)
    : jargsplit; ls -l jargon.zip 
    -rw-r--r-- 1 user user 565968 Sep 22 09:47 jargon.zip
    : jargsplit; time unzip -o jargon.zip xad
    Archive:  jargon.zip
      inflating: xad                     

    real    0m0.011s
    user    0m0.000s
    sys     0m0.011s

so you see 256-kibibyte chunks have submillisecond decompression time (more like 2 milliseconds on my cellphone) and only about a 1.8% size penalty for seekability:

    : jargsplit; units -t 565968/556102 %
    101.77413

and, unlike the dictzip format (which lists the chunks in an extra backward-compatible file header), zip also supports efficient appending

even in python (3.11.2) it's only about a millisecond:

    In [13]: z = zipfile.ZipFile('jargon.zip')

    In [14]: [f.filename for f in z.infolist()]
    Out[14]: ['xaa', 'xab', 'xac', 'xad', 'xae', 'xaf']

    In [15]: %timeit z.open('xab').read()
    1.13 ms ± 16.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

this kind of performance means that any algorithm that would be efficient reading data stored on a conventional spinning-rust disk will be efficient reading compressed data if you put the data into a zipfile in "files" of around a meg each. (writing is another matter; zstd may help here, with its order-of-magnitude faster compression, but info-zip zip and unzip don't support zstd yet.)
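a sketch of what that looks like in python: the same split-into-members idea as the `split -b256K` experiment, plus a `read_range` that behaves like seek-and-read on a plain file by picking members with `offset // chunk` and inflating only the ones the range overlaps. the member names `c0`, `c1`, ... are a hypothetical convention.

```python
import io
import zipfile

CHUNK = 256 * 1024   # same size as the split -b256K members

def build_zip(data: bytes, chunk: int = CHUNK) -> bytes:
    """Store data as members c0, c1, ... of one zipfile."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        for n, start in enumerate(range(0, len(data), chunk)):
            z.writestr(f"c{n}", data[start:start + chunk])
    return buf.getvalue()

def read_range(z: zipfile.ZipFile, offset: int, length: int, chunk: int = CHUNK) -> bytes:
    """Return data[offset:offset+length], inflating only overlapping members."""
    out = bytearray()
    end = offset + length
    for n in range(offset // chunk, (end - 1) // chunk + 1):
        member = z.read(f"c{n}")
        lo = max(offset - n * chunk, 0)
        hi = min(end - n * chunk, len(member))
        out += member[lo:hi]
    return bytes(out)
```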

dictd keeps an index file in tsv format which uses base64 digits to encode each entry's byte offset and length in the uncompressed .dict file:

    : jargsplit; < /usr/share/dictd/jargon.index shuf -n 4 | LANG=C sort | cat -vte
    fossil^IB9xE^IL8$
    frednet^IB+q5^IDD$
    upload^IE/t5^IJ1$
    warez d00dz^IFLif^In0$
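a sketch of how a server can use such an index: decode the base64 digits (most significant first, alphabet A-Z, a-z, 0-9, +, /) and binary-search the sorted headwords. the sample lines above decode to, for example, offset 515140 and length 764 for fossil, well inside the 1418350-byte uncompressed file. sorting with python's default string order is an assumption; dictd has its own collation rules for building the index.

```python
import bisect

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_decode(digits: str) -> int:
    """Decode base64-digit numbers, most significant digit first."""
    n = 0
    for ch in digits:
        n = n * 64 + B64.index(ch)
    return n

def load_index(text: str):
    """Parse 'headword<TAB>offset<TAB>length' lines into sorted lists."""
    entries = sorted(
        (word, b64_decode(off), b64_decode(length))
        for word, off, length in (line.split("\t") for line in text.splitlines() if line)
    )
    return [e[0] for e in entries], entries

def lookup(index, word: str):
    """Binary-search the headwords; return (offset, length) or None."""
    words, entries = index
    i = bisect.bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return entries[i][1:]
    return None

SAMPLE = "fossil\tB9xE\tL8\nfrednet\tB+q5\tDD\nupload\tE/t5\tJ1\nwarez d00dz\tFLif\tn0\n"
print(lookup(load_index(SAMPLE), "fossil"))   # (515140, 764)
```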

this is very similar to the index format used by eric raymond's volks-hypertext https://www.ibiblio.org/pub/Linux/apps/doctools/vh-1.8.tar.g... or vi ctags or emacs etags, but it supports random access into the file

strfile from the fortune package works on a similar principle but uses a binary data file and no keys, just offsets:

    : ~; wget -nv canonical.org/~kragen/quotes.txt
    2024-09-22 10:44:50 URL:http://canonical.org/~kragen/quotes.txt [49884/49884] -> "quotes.txt" [1]
    : ~; strfile quotes.txt
    "quotes.txt.dat" created
    There were 87 strings
    Longest string: 1625 bytes
    Shortest string: 92 bytes
    : ~; fortune quotes.txt
      Get enough beyond FUM [Fuck You Money], and it's merely Nice To Have
        Money.

            -- Dave Long, <[email protected]>, on FoRK, around 2000-08-16, in
               Message-ID <200008162000.NAA10898@maltesecat>
    : ~; od -i --endian=big quotes.txt.dat 
    0000000           2          87        1625          92
    0000020           0   620756992           0         933
    0000040        1460        2307        2546        3793
    0000060        3887        4149        5160        5471
    0000100        5661        6185        6616        7000
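a sketch of reading that table, assuming the layout the od dump suggests: five big-endian 32-bit words (version, number of strings, longest, shortest, flags), then the delimiter character padded to 4 bytes (620756992 is `%\0\0\0` read as an integer), then the start offset of each string. the real strfile header may carry details this glosses over, and `pick` is a hypothetical helper, not fortune's code.

```python
import random
import struct

HEADER = struct.Struct(">5I4s")   # 24 bytes: five words plus delimiter

def read_dat(dat: bytes) -> dict:
    version, numstr, longlen, shortlen, flags, delim = HEADER.unpack_from(dat, 0)
    count = (len(dat) - HEADER.size) // 4
    offsets = struct.unpack_from(f">{count}I", dat, HEADER.size)
    return {"version": version, "numstr": numstr, "longlen": longlen,
            "shortlen": shortlen, "flags": flags, "delim": delim[:1],
            "offsets": offsets}

def pick(text: bytes, dat: bytes, rng=random) -> bytes:
    """Seek straight to a random string's offset; read to the next delimiter line."""
    info = read_dat(dat)
    start = info["offsets"][rng.randrange(info["numstr"])]
    end = text.find(b"\n" + info["delim"] + b"\n", start)
    return text[start:] if end == -1 else text[start:end + 1]
```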

of course if you were using a zipfile you could keep the index in the zipfile itself, and then there's no point in using base64 for the file offsets, or limiting them to 32 bits
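a sketch of that, with hypothetical member names and layout: an `index` member of tab-separated headword, decimal offset and length, next to a `dict` member holding the definitions. for a big dictionary you'd split the body into ~1 MB members as in the split/zip experiment earlier in this comment; here `ZipExtFile.seek` (available since python 3.7) stands in for that.

```python
import io
import zipfile

def build(entries: dict) -> bytes:
    """Pack definitions into a 'dict' member plus a decimal 'index' member."""
    body, index_lines = bytearray(), []
    for word in sorted(entries):
        index_lines.append(f"{word}\t{len(body)}\t{len(entries[word])}")
        body += entries[word]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("index", "\n".join(index_lines) + "\n")
        z.writestr("dict", bytes(body))
    return buf.getvalue()

def lookup_zip(archive: bytes, word: str):
    """Return the definition bytes for word, or None if it isn't indexed."""
    with zipfile.ZipFile(io.BytesIO(archive)) as z:
        for line in z.read("index").decode("utf-8").splitlines():
            head, offset, length = line.split("\t")
            if head == word:
                with z.open("dict") as f:
                    f.seek(int(offset))   # plain ints, no base64 or 32-bit cap
                    return f.read(int(length))
    return None
```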

heystefan | 1 year ago

So, can I somehow use the 1913 Webster dictionary on macOS? It's not in the list of configurable ones.

(If not possible, Terminal would work too.)

dokyun | 1 year ago

Emacs includes a browsable client for this protocol; you can use it with `M-x dictionary`.

divbzero | 1 year ago

The DICT Development Group also provides a dedicated dict: client:

  sudo apt install dict

  brew install dict

Which allows you to query dict://dict.org/ directly:

  dict foo

sedatk | 1 year ago

There was also a translation server called Babylon that used a similar raw text protocol (like WHOIS, and DICT here) in 1998. I remember adding it to my IRC script, but it must have stopped working at some point, because I replaced it with "babelfish.altavista.com" :)

anthk | 1 year ago

           echo "define * hacker " | nc dict.org 2628 | less
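The one-liner above speaks the raw protocol on TCP port 2628. Here's a minimal Python sketch of the same exchange, assuming RFC 2229's framing: a `151 "word" database "description"` line starts each definition, a line holding a lone `.` ends it, and text lines that begin with a dot have it doubled. Sending `QUIT` right after `DEFINE` and reading until the server closes is a simplification. Only `parse_define` is exercised offline; `define` needs network access.

```python
import re
import socket

HEADER_151 = re.compile(r'151 (?:"[^"]*"|\S+) (\S+)')   # captures the database

def parse_define(lines):
    """Turn DICT DEFINE response lines into [(database, text), ...]."""
    defs, current, db = [], None, None
    for line in lines:
        if current is not None:
            if line == ".":                       # end of this definition
                defs.append((db, "\n".join(current)))
                current = None
            else:                                 # undo dot-stuffing
                current.append(line[1:] if line.startswith("..") else line)
        elif line.startswith("151 "):
            db = HEADER_151.match(line).group(1)
            current = []
        elif line.startswith("552"):              # no match
            return []
    return defs

def define(word, host="dict.org", port=2628, database="*"):
    with socket.create_connection((host, port), timeout=10) as s:
        f = s.makefile("rw", encoding="utf-8", newline="")
        f.readline()                              # 220 banner
        f.write(f'DEFINE {database} "{word}"\r\nQUIT\r\n')
        f.flush()
        lines = [line.rstrip("\r\n") for line in f]
    return parse_define(lines)
```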

fitsumbelay | 1 year ago

super fascinating and potentially useful for future projects, with or w/o AI. obviously makes me want to maintain my own dict service. love this

mogoh | 1 year ago

hmmm

  $>curl dict://dict.org/d:Internet
  curl: (1) Protocol "dict" not supported

fallingsquirrel | 1 year ago

Works for me. I bet your OS ships a crippled version of curl.

  $ curl --version
  curl 8.7.1 (x86_64-pc-linux-gnu) [...]

  $ curl dict://dict.org/d:Internet
  220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <[email protected]>
  250 ok
  150 1 definitions retrieved
  [...]

kgbcia | 1 year ago

Isn't .mobi an ebook format?

therein | 1 year ago

.COM is also a file format.

nashashmi | 1 year ago

I don't think dict is secure enough. We need a new version called dick. K is for encryption key. /s /rant (at sftp).
