The History of the URL: Domain, Protocol, and Port

[+] gumby|9 years ago|reply

This is a good article. A few nits:

It was the ARPANET (or the arpanet since most systems were case-insensitive in those days - Multics, and later Unix, were exceptions, not the rule) as in ARPA's network, that used arpanet protocols like NCP. You do use "the" the first time in your article but seem to have dropped it after that.

CHAOSNET was just a LAN protocol like "ethernet" or pup -- it also used what we call 10base5 "thicknet" coax. It was developed at MIT's AI Lab and was pretty much used only there and at a few institutions close to MIT like Symbolics and LMI.

In the NCP days routing was handled by IMPs (Interface Message Processors) which were not PDP-11s, and when '11s were used they were smaller than the 11/70s which you used to illustrate the article (11/70s were the largest PDP-11s made -- still 16 bit unlike the 36-bit PDP-10s which were the mainstay of academic computer science in those days).

> In this era before ‘mail servers’, if my computer was off you weren’t sending me an email.

In that era few people had what you would consider a personal computer and more likely you logged into a timesharing system that had your mail along with everything else. So your statement is true, yet an anachronism. Even if you did have your own host, the upstream host (one earlier in the ! path) would have your message so you could consider it literally your mail server.

[+] wahern|9 years ago|reply

DNS was never ASCII only, and I've never seen DNS software make that assumption--that "every piece of internet hardware from the last forty years, including the Cisco and Juniper routers used to deliver this page to you [assumes ASCII]".

The essay links to RFC 1035 to support its claim of ASCII only, but what RFC 1035 actually says is

"However, future additions beyond current usage may need to use the full binary octet capabilities in names, so attempts to store domain names in 7-bit ASCII or use of special bytes to terminate labels, etc., should be avoided."

and

"Although labels can contain any 8 bit values in octets that make up a label, it is strongly recommended that labels follow the preferred syntax described elsewhere in this memo, which is compatible with existing host naming conventions. "

Indeed, some country TLD servers were (and maybe still are) supporting non-punycoded UTF-8 directly.

Lookups are supposed to be case-insensitive, but it's always been verboten to actually modify the case of names in a DNS packet. A query reply is supposed to include the identical question name in an 8-bit clean manner. Indeed, some DNS clients will arbitrarily randomize the case of names to add an element of randomness to thwart DNS spoofing attacks. (If the answer isn't the same 8-bit name, you ignore it just as if it came from a different IP address than you sent it to.) Unfortunately there exist enough broken DNS proxies out there that software like Firefox or Chrome can't do this without headaches, but I've never encountered such broken software myself (at least, not that I knew about). At worst I've seen query responses which lack the question portion altogether, and this can cause timeouts (rather than immediate failures) for software which enables anti-spoofing measures. But I've also seen responses which lack the same QID, too. There's always broken software; the threshold for when you can ignore it is highly context dependent.
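
For anyone curious what the case-randomization trick looks like, here is a minimal Python sketch. It is an illustration only, not any particular resolver's code: the function names `randomize_case` and `echo_matches` are made up for this example, and it works on the raw name bytes rather than a full DNS packet.

```python
import secrets


def randomize_case(name: bytes) -> bytes:
    """Randomly set or clear the 0x20 bit of every ASCII letter in a name."""
    out = bytearray()
    for b in name:
        is_letter = 65 <= b <= 90 or 97 <= b <= 122  # A-Z or a-z
        if is_letter:
            # Bit 0x20 is the only difference between upper and lower case.
            b = (b | 0x20) if secrets.randbits(1) else (b & 0xDF)
        out.append(b)  # non-letters pass through untouched
    return bytes(out)


def echo_matches(sent: bytes, echoed: bytes) -> bool:
    """Accept a reply only if it echoes the question name byte for byte."""
    return sent == echoed


query_name = randomize_case(b"example.com")
# The name still refers to the same domain under case-insensitive lookup.
assert query_name.lower() == b"example.com"
```

A spoofer who cannot see the query has to guess the case pattern as well as everything else, which is where the extra randomness comes from.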

[+] bluejekyll|9 years ago|reply

> some DNS clients will arbitrarily randomize the case of names to add an element of randomness to thwart DNS spoofing attacks.

I believe this is undefined behavior. It shouldn't be something you count on. The only reference I found in the spec that implies this is:

The question section of the response matches the question section of the query

From RFC 1034. Which isn't very specific, but could be interpreted by some in the way you mean.

If you want to secure the request, it's best to randomize the QID and outbound port. If a server responds with the wrong QID, I'd ignore it.
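
A rough Python sketch of that idea, under stated assumptions: the helper names (`new_query_id`, `choose_source_port`, `build_query_header`, `response_id_matches`) are invented for illustration, the port range 1024-65535 is an arbitrary choice, and actually binding a UDP socket to the chosen port is left out. Only the 12-byte DNS header layout (ID, flags, four counts) comes from RFC 1035.

```python
import secrets
import struct


def new_query_id() -> int:
    """Pick an unpredictable 16-bit query ID."""
    return secrets.randbelow(1 << 16)


def choose_source_port() -> int:
    """Pick an unpredictable unprivileged source port (assumed range)."""
    return 1024 + secrets.randbelow(65536 - 1024)


def build_query_header(qid: int) -> bytes:
    """Pack a DNS header: ID, flags with RD set, one question, no records."""
    return struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)


def response_id_matches(response: bytes, qid: int) -> bool:
    """Ignore any response that is too short or carries a different ID."""
    if len(response) < 12:
        return False
    (response_id,) = struct.unpack("!H", response[:2])
    return response_id == qid
```

An off-path attacker then has to guess both the 16-bit ID and the source port before a forged answer is even considered.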

[+] rconti|9 years ago|reply

> The first 32 identified the remote host, similar to how an IP address works today. The last eight were known as the AEN (it stood for “Another Eight-bit Number”), and were used by the remote machine in the way we use a port number

Gold.

Great read, it hits home for me with the right mix of nostalgia, history from before my time, and funny little things I never knew.

[+] b15h0p|9 years ago|reply

Another nitpick: on iOS Safari, that pizza-poo-domain name actually does show up in the address bar. So there has to be another mechanism that prevents the Amazon-with-Cyrillic-"a"-trick which I guess involves normalization.

[+] gregrata|9 years ago|reply

Great read! "Thanks" for all the Wikipedia links - I ended up wasting a few hours reading more details

[+] echeese|9 years ago|reply

For http:com/example/foo/bar/baz how would you determine what the host is?

[+] wtbob|9 years ago|reply

It doesn't actually matter: in a world which used that sort of addressing, one could imagine saying to com 'give me HTTP info for your example/foo/bar/baz,' to com/example 'give me HTTP info for your foo/bar/baz' and so forth; in that case, com would just say, 'hey, go talk to 266.328.0.1 (that's what I call example)' and 266.328.0.1 would cheerfully return the information stored at the filesystem path /foo/bar/baz, or it could say, 'hey, I call foo 463.622.42.17' and your browser would keep resolving.
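
A toy Python sketch of that delegation walk, purely for illustration. The table, the "root" and "com-server" labels, and the `resolve` function are all made up here; the deliberately invalid address from above is reused as an opaque label, not as something you could connect to.

```python
# Hypothetical delegation table: each server maps a path component it has
# delegated to the name of the server responsible for it.
DELEGATIONS = {
    "root": {"com": "com-server"},
    "com-server": {"example": "266.328.0.1"},
    "266.328.0.1": {},  # delegates nothing, serves the rest as a file path
}


def resolve(path: str, delegations: dict) -> tuple[str, str]:
    """Follow delegations left to right until a server keeps the remainder."""
    parts = path.split("/")
    server = "root"
    while parts and parts[0] in delegations.get(server, {}):
        server = delegations[server][parts.pop(0)]
    return server, "/" + "/".join(parts)


host, local_path = resolve("com/example/foo/bar/baz", DELEGATIONS)
# host is "266.328.0.1" and local_path is "/foo/bar/baz"
```

If 266.328.0.1 had an entry for foo in its own table, the same loop would simply keep going, which is the "your browser would keep resolving" case.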

Me, I kinda wish we wrote URLs as http://com.example.host.invalid/path/to/resource.

[+] inopinatus|9 years ago|reply

Fun fact: nowhere in the HTTP protocol specification does it say "use DNS". It is a convention that we do. It is a further convention that we use A records. And in my opinion it was a travesty that HTTP/2 did not mandate using DNS with SRV records.

[+] zackbloom|9 years ago|reply

He, unfortunately, didn't disambiguate between example.com/foo/bar/baz, foo.example.com/bar/baz, etc. so it's unclear.

It is kind of amusing to think of the entire Internet as a giant directory though.

[+] userlabs|9 years ago|reply

very good info thanks

[+] ChristianBundy|9 years ago|reply

> This restriction on HTML was ultimately removed in 2007 and that same year Unicode became the most popular encoding on the web.

Nitpick: Unicode is a character set, UTF-8 is an encoding.
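
To make the distinction concrete, a small Python sketch: Unicode assigns the character a number (its code point), while UTF-8 and UTF-16 are two different ways of serializing that number as bytes. The character chosen is arbitrary.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)                   # the Unicode code point, 0xE9
utf8_bytes = ch.encode("utf-8")        # two bytes: b"\xc3\xa9"
utf16_bytes = ch.encode("utf-16-be")   # two different bytes: b"\x00\xe9"

# Same character set, different encodings: the bytes differ, but decoding
# each with its own encoding gives back the same character.
assert utf8_bytes != utf16_bytes
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be") == ch
```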

[+] zackbloom|9 years ago|reply

Should be fixed shortly, thanks!

[+] bluejekyll|9 years ago|reply

Nitpick: the modern internet is built on IP, not TCP/IP. It would be fair to say most protocols use TCP today, but definitely not all.

[+] RickHull|9 years ago|reply

Great article. Another nitpick:

> It’s important to dispel any illusion that these decisions were made with precence for the future the domain name would have.

I don't think precence is a word, and I'm not sure what would make sense as its replacement.

[+] juliendorra|9 years ago|reply

Prescience (knowledge of things before they happen)

[+] zackbloom|9 years ago|reply

Fixed, thanks so much!
