Fun with IP address parsing

[+] geoffpado|5 years ago|reply

> This is the same IP address: 3232271615. You get that by interpreting the 4 bytes of the IP address as a big-endian unsigned 32-bit integer, and print that. This leads to a classic parlor trick: if you try to visit http://3232271615 , Chrome will load http://192.168.140.255.

This was the source of one of my favorite “bugs” ever. I was working on multiple mobile apps for a company, and they had a deep link setup that was incredibly basic: <scheme>://<integer>, which would take you to an article with a simple incrementing ID. This deep link system “just worked” on iOS and Android; take the URL, grab the host, parse it as an int, grab that story ID. Windows Phone, however… the integers we were parsing out were totally wrong, returning incredibly old stories!

Turned out that the host we were given by the frameworks from the URL was auto-converted to an IP in dotted-quad format, and then the int parser was just grabbing the last segment… which meant that we were always getting stories <256, instead of the ~40000 range we were expecting.
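
A minimal Python sketch of how that bug could arise, assuming the framework rewrote an all-digit host as a big-endian 32-bit IPv4 address. The function names `normalize_host` and `naive_story_id` are hypothetical, and 40000 is only an illustrative story ID.

```python
import ipaddress

def normalize_host(host: str) -> str:
    """Mimic a URL library that rewrites an all-digit host as a dotted quad."""
    if host.isdigit():
        return str(ipaddress.IPv4Address(int(host)))
    return host

def naive_story_id(host: str) -> int:
    """Hypothetical int parser that only keeps the text after the last dot."""
    return int(host.rsplit(".", 1)[-1])

host = normalize_host("40000")
print(host)                  # 0.0.156.64
print(naive_story_id(host))  # 64
```

Because only the lowest byte survives, every ID collapses into the range 0 to 255, which matches the "always getting stories <256" symptom.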

[+] arkadiyt|5 years ago|reply

These different representations also lead to frequent server side request forgery (SSRF) bypasses - someone might be blocking local IPv4 but you can still access their AWS metadata endpoint at ::ffff:169.254.169.254, etc.

For anyone using Ruby, I'm the author of a gem [1] that comprehensively protects against SSRF bugs. For anyone using Golang I recommend this [2] blog post.

[1]: https://github.com/arkadiyt/ssrf_filter

[2]: https://www.agwa.name/blog/post/preventing_server_side_reque...
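
As a rough illustration of the bypass (not the approach taken by either linked project), here is a Python sketch that unwraps IPv4-mapped IPv6 addresses before applying IPv4 rules. The helper name `is_internal` is made up for this example, and a real filter would also have to check the addresses a hostname resolves to, not just literals.

```python
import ipaddress

def is_internal(addr: str) -> bool:
    """Return True if addr points at loopback, private, or link-local space."""
    ip = ipaddress.ip_address(addr)
    # Unwrap ::ffff:a.b.c.d so the IPv4 rules apply to the embedded address.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_loopback or ip.is_private or ip.is_link_local

print(is_internal("169.254.169.254"))         # True
print(is_internal("::ffff:169.254.169.254"))  # True
print(is_internal("::ffff:a9fe:a9fe"))        # True (same address, hex groups)
print(is_internal("8.8.8.8"))                 # False
```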

[+] stevekemp|5 years ago|reply

For golang I wrote this:

https://github.com/skx/remotehttp

I've found, and reported, a whole bunch of services which take user-supplied URLs and don't filter out access to localhost:8080/server-status, and similar local resources.

A common route to attacking these is to access the AWS metadata URL endpoint. Something at least the Google cloud prevents, by forcing the use of the `Metadata-Flavor: Google` header.

[+] jamespwilliams|5 years ago|reply

> ::ffff:169:254:169:254

Just to note, this should be ::ffff:169.254.169.254

[+] proactivesvcs|5 years ago|reply

I wonder how many of these bugs are the result of people thinking "Well I've read the spec but most of it is 'cursed' so I'll just implement this subset which fits my idea of 'acceptable'".

[+] the_mitsuhiko|5 years ago|reply

Unfortunately the blacklisting approach that works on IPv4 is completely broken for IPv6, since you can't really know where your own services are. I still have not found a good generic way to protect IPv6, and so far I have ended up disallowing it everywhere.

[+] philsnow|5 years ago|reply

This is awesome; do you know if anybody has written a Rails plugin to use ssrf_filter by default for all requests?

[+] lrossi|5 years ago|reply

Can confirm that visiting http://127.1 on an iPad indeed works and redirects to http://127.0.0.1. This is very surprising and, at least for me, humbling.

I think I will quote this article any time I see someone using regex to validate or parse IPs.
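
To make the point about regexes concrete, here is a Python sketch of the 4.2BSD-style rules discussed in the thread: one to four dot-separated numbers, each in decimal, octal (leading 0), or hex (leading 0x), with the last number filling all remaining bytes. `parse_bsd_ipv4` is a hypothetical name, and this is an approximation of the behavior, not a copy of any libc implementation.

```python
import ipaddress

def parse_number(part: str) -> int:
    """Parse one field as hex (0x..), octal (leading 0), or decimal."""
    lowered = part.lower()
    if lowered.startswith("0x"):
        digits, base = lowered[2:], 16
    elif len(lowered) > 1 and lowered.startswith("0"):
        digits, base = lowered[1:], 8
    else:
        digits, base = lowered, 10
    allowed = "0123456789abcdef"[:base]
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError(f"bad field: {part!r}")
    return int(digits, base)

def parse_bsd_ipv4(text: str) -> ipaddress.IPv4Address:
    """Parse 1 to 4 fields; the last field fills the remaining bytes."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"bad address: {text!r}")
    *head, last = [parse_number(p) for p in parts]
    remaining_bytes = 4 - len(head)
    if any(v > 0xFF for v in head) or last >= 1 << (8 * remaining_bytes):
        raise ValueError(f"field out of range: {text!r}")
    packed = 0
    for v in head:
        packed = (packed << 8) | v
    return ipaddress.IPv4Address((packed << (8 * remaining_bytes)) | last)

for s in ["127.1", "1.1", "10.36", "3232271615", "0x08.0x080808", "0127.000.000.0001"]:
    print(s, "->", parse_bsd_ipv4(s))
# 127.1 -> 127.0.0.1
# 1.1 -> 1.0.0.1
# 10.36 -> 10.0.0.36
# 3232271615 -> 192.168.140.255
# 0x08.0x080808 -> 8.8.8.8
# 0127.000.000.0001 -> 87.0.0.1
```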

[+] FreshFries|5 years ago|reply

This is one of the reasons why I appreciate the geekiness of Cloudflare with their DNS service IP addresses, particularly:

1.1 which to me is the shortest useful IP address I am aware of.

[+] kazinator|5 years ago|reply

Would you be further humbled if the iPad accepted http://CXXVII.I also?

I'm never writing anything that positively accepts 127.1, or 0127.000.000.0001 as a valid address no matter what garbage implementations do.

The issue we have with this is the situations where we must accept only domain names, and those names must be sure not to be treated as an IP address by some software downstream of us.
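
One way to express both policies in Python, as a sketch only: a strict validator that accepts nothing but four plain decimal fields, and a conservative check for hostnames whose last label a legacy parser might read as a number. Both function names are hypothetical, and the second check is a deliberately cautious heuristic rather than a complete rule.

```python
def is_strict_dotted_quad(text: str) -> bool:
    """Accept only four decimal fields, 0-255, with no leading zeros."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            return False
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True

def could_be_read_as_ipv4(hostname: str) -> bool:
    """Conservative check: does the last label look numeric to a legacy parser?"""
    last = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if last.startswith("0x"):
        return all(ch in "0123456789abcdef" for ch in last[2:])
    return last.isascii() and last.isdigit()

print(is_strict_dotted_quad("127.0.0.1"))          # True
print(is_strict_dotted_quad("127.1"))              # False
print(is_strict_dotted_quad("0127.000.000.0001"))  # False
print(could_be_read_as_ipv4("example.com"))        # False
print(could_be_read_as_ipv4("127.1"))              # True
print(could_be_read_as_ipv4("CXXVII.I"))           # False
```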

[+] sunsetMurk|5 years ago|reply

All my regex (are now) a lie

[+] bigiain|5 years ago|reply

... now you've got two^h^h^hthree problems.

[+] z3t4|5 years ago|reply

I'm now going to change my LAN to use 10.0.0.1 instead of 192.168.0.1 so that I can just type 10.1. This will help not only when testing stuff on mobiles, only to have to retype the whole address because you forgot http://, but also when telling the kids what IP to connect to when setting up LAN games. Or coworkers, when telling them some LAN/router IP. Time server is on 10.36

[+] chungy|5 years ago|reply

> I’m on the fence about that last one, the “IPv6 with an embedded dotted decimal” form. My reference parser (Go’s net.ParseIP) understands it, but it’s not really that useful any more in the real world. At the dawn of IPv6, the idea was that you could upgrade an address to IPv6 by prepending a pair of colons, as in ::1.2.3.4, but modern transition mechanisms no longer offer anything as clear-cut as this, so the notation doesn’t really show up in the wild.

I have to disagree with this conclusion. I see it very frequently on Linux. It turns out that programs can bind their listen address to just ::, and the kernel will still allow connections from IPv4, with the address mapped to ::ffff:0.0.0.0/32 -- outbound connections use the same notation.

[+] thwarted|5 years ago|reply

> It turns out that programs can bind their listen address to just ::, and the kernel will still allow connections from IPv4, with the address mapped to ::ffff:0.0.0.0/32 -- outbound connections use the same notation.

This is only true if the sysctl bindv6only or socket option IPV6_V6ONLY is 0, and is defined by RFC3493.
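
For reference, the IPv4-mapped range is normally written ::ffff:0:0/96 (RFC 4291), since the low 32 bits carry the IPv4 address. A small Python sketch of the mapping follows; `to_mapped` is a hypothetical helper and 192.0.2.10 is a documentation address used only as an example. It shows the address arithmetic, not the socket behavior itself.

```python
import ipaddress

MAPPED = ipaddress.IPv6Network("::ffff:0:0/96")  # IPv4-mapped prefix

def to_mapped(v4: str) -> ipaddress.IPv6Address:
    """Build the ::ffff:a.b.c.d form that a dual-stack socket reports for an IPv4 peer."""
    return ipaddress.IPv6Address(
        int(MAPPED.network_address) | int(ipaddress.IPv4Address(v4))
    )

peer = to_mapped("192.0.2.10")
print(peer in MAPPED)    # True
print(peer.ipv4_mapped)  # 192.0.2.10
```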

[+] octoberfranklin|5 years ago|reply

> At the dawn of IPv6, the idea was that you could upgrade an address to IPv6 by prepending a pair of colons, as in ::1.2.3.4

No, IPv6 explicitly rejected that idea at first. Most of the other IPng proposals did have a backwards compatibility mechanism like that. I'm still sore that the least backwards-compatible proposal was the one that won.

Later the IPv6 cabal admitted their mistake and published NAT64, but at that point it was too late to make it a mandatory required service offered by any default-route router. So now we have all of this crap about dual-stack hosts instead of simply being able to upgrade to IPv6 and trust that you will not lose any connectivity.

This is basically why, twenty years after it was standardized, IPv6 is still merely the "internet of cellphones" and no closer to replacing IPv4.

As usual, DJB saw all of this decades ahead of time:

https://cr.yp.to/djbdns/ipv6mess.html

[+] AnthonyMouse|5 years ago|reply

> 1:2:3:4:5:6:77.77.88.88 means 1:2:3:4:5:6:7777:8888

Wait, what? 77.77.88.88 is in dotted decimal. It doesn't correspond to 7777:8888 in hex.

edit: Somebody else already noticed on Twitter:

> And as @alanjmcf noticed, I messed up one of the representations above.

> 1:2:3:4:5:6:77.77.88.88 means 1:2:3:4:5:6:4d4d:5858, not 1:2:3:4:5:6:7777:8888. I missed out a decimal-to-hex conversion in there.
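
The corrected conversion can be checked with Python's `ipaddress` module, which accepts a dotted-decimal suffix in IPv6 literals. This is just a quick verification sketch.

```python
import ipaddress

addr = ipaddress.IPv6Address("1:2:3:4:5:6:77.77.88.88")
print(addr.compressed)  # 1:2:3:4:5:6:4d4d:5858

# Same conversion by hand: each pair of decimal bytes becomes one 16-bit hex group.
groups = [f"{hi:02x}{lo:02x}" for hi, lo in ((77, 77), (88, 88))]
print(":".join(groups))  # 4d4d:5858
```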

[+] j1elo|5 years ago|reply

> It does not process Class A/B notation, or hex or octal notation.

I got to find that notation useful once, to make a shorter one-liner... without even knowing that there were different classes of IPv4 address, and that I was looking at one of them.

It's a tiny function that gives me the IP address of my machine in the LAN, for either Linux or Mac:

  # Get main local IP address from the default external route (Internet gateway)
  iplan() {
      # Note: "1" is shorthand for "1.0.0.0"
      case "$OSTYPE" in
          linux*) ip -4 -oneline route get 1 | grep -Po 'src \K([\d.]+)' ;;
          darwin*) ipconfig getifaddr "$(route -n get 1 | sed -n 's/.*interface: //p')" ;;
      esac
  }

(sorry to people reading on small screens)

Full disclosure, I got the "1 is shorthand for 1.0.0.0" from here (which didn't get into explaining why it is a shorthand): https://stackoverflow.com/a/25851186

[+] dave_universetf|5 years ago|reply

Oh no, that's another shorthand that's different from all the others. A single number should be interpreted as a big-endian uint32, and so "1" should be "0.0.0.1". However, I can confirm that `ip` interprets it as "1.0.0.0", even though you should have to write "1.0" for that.

Ugh.

[+] anderskaseorg|5 years ago|reply

Did you really gain anything here, given that the omission of those 12 characters required a 38 character comment to explain what’s going on?

[+] gumby|5 years ago|reply

> So, it’s a de-facto standard that boils down to mostly “what did 4.2BSD understand?“

By the way 4.2BSD was being compatible with older or contemporary implementations, like ITS which was running TCP before any Unix was.

For example plenty of machines back then used octal as a preferred human representation. In fact that’s why octal is the default format of numeric constants in C: C, like Unix, was initially developed for an 18-bit (six octal digits) PDP-7. The smaller 16-bit PDP-11 version came later.

[+] lucb1e|5 years ago|reply

"All possible notations of this IPv4 address" https://lucb1e.com/rp/php/funnip.php?link&ip=80.100.131.150

It was a surprising amount of work to figure out all the different formats an IP address can be shown in and convert a given IP into all those formats.

[+] jsrcout|5 years ago|reply

That's impressive. And somewhat scary :-)

[+] octoberfranklin|5 years ago|reply

How about the PGP word list? https://en.wikipedia.org/wiki/PGP_word_list

    $ ping stairway scavenger tracker upcoming

    PING 209.216.230.240 (209.216.230.240) 56(84) bytes of data.
    64 bytes from 209.216.230.240: icmp_seq=1 ttl=50 time=68.2 ms
    64 bytes from 209.216.230.240: icmp_seq=2 ttl=50 time=69.5 ms
    64 bytes from 209.216.230.240: icmp_seq=3 ttl=50 time=67.2 ms

[+] phoe-krk|5 years ago|reply

> Fully canonically, :: is 0000:0000:0000:000:0000:0000:0000:0000.

Nitpick: missed a single zero in the middle there.

[+] skissane|5 years ago|reply

The following comment "My apologies to trypophobic readers" makes me think that the mistake was intentional.

[+] Dagger2|5 years ago|reply

Bigger nitpick: as per RFC 5952, canonically :: is ::. 0000:0000:0000:0000:0000:0000:0000:0000 is a valid way of writing the same address, but it's not the canonical way.
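
Python's `ipaddress` module exposes both spellings, which makes the distinction easy to see. As far as I can tell its `compressed` form agrees with RFC 5952 for cases like these, but treat that as an assumption rather than a guarantee.

```python
import ipaddress

addr = ipaddress.IPv6Address("0000:0000:0000:0000:0000:0000:0000:0000")
print(addr.compressed)  # ::
print(addr.exploded)    # 0000:0000:0000:0000:0000:0000:0000:0000
print(addr == ipaddress.IPv6Address("::"))  # True

# Uppercase hex and leading zeros are accepted on input and normalized on output.
print(ipaddress.IPv6Address("2001:0DB8:0:0:0:0:0:0001").compressed)  # 2001:db8::1
```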

[+] egocentric|5 years ago|reply

You are a hero

[+] jpxw|5 years ago|reply

As Go’s net package IP parsing was mentioned, here’s a fun fact: under their API it is impossible to distinguish between an IPv4-mapped IPv6 address and the equivalent normal IPv4 address.

[+] daenney|5 years ago|reply

I find this to be a great feature. net.IPNet.Contains takes this into account, so you don’t have to worry about or deal with shenanigans like IPv4 mapped addresses. It makes implementing SSRF protection much easier.

[+] strenholme|5 years ago|reply

Since I write a Lua-parsed DNS server which works with IPv6, even when compiled for an ancient version of MINGW on Windows XP (which has IPv6 support but no built-in IPv6 parser), I had to write an IPv6 address parser (no inet_pton(), which is what most programs use for IPv6 parsing, on that system).

No, I did not add dotted quad notation to the parser. No, you can not have more than four hex digits in a single quad; 00000001:2::3 is a syntax error. It supports “normal” stuff like ::, ::1, 2001:db8::1, and even non-normal stuff like “2001-0db8-1234-5678 0000-0000-0000-0005” (to be compatible with the really basic IPv6 parser I put in MaraDNS’s recursive resolver nearly two years ago), but does not support any of the IPv6 corner cases in the linked article.

The IPv6 test cases in the automated test for the parser are at: https://github.com/samboy/MaraDNS/blob/master/deadwood-githu... (The final three lines are supposed to return errors)

[+] thomashabets2|5 years ago|reply

I especially love it when address parsers on the same OS don't agree:

http://openbsd-archive.7691.n7.nabble.com/inet-net-pton-seem...

[+] cnst|5 years ago|reply

> https://marc.info/?l=openbsd-bugs&m=124425104531501&w=2

Love it! No conversation about SUS is complete without Theo bashing up the absurdity of some historic bugs being documented as features. :-)

---

I do like the hex specification, though. Especially in the age of /29 and such, it's way easier to deal with space using such notation than the decimal numbers, which make little sense for network boundaries in such case. It looks like ping supports most of these (try `ping 0x08080808`, or `ping 0x08.0x080808`, but note that 0x0808.0x0808 is not valid, only 0x08.0x08.0x0808 would be), but `dig @` doesn't.

BTW, I guess this finally explains why the netmask is often shown as `inet 127.0.0.1 netmask 0xff000000` on the BSDs, which is actually a valid IP address notation, as it turns out!
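
A short Python sketch of why hex is convenient for odd prefix lengths: the mask for a /29 is a single readable constant, and the BSD-style `0xff000000` is just the /8 mask. `hex_mask` is a hypothetical helper name.

```python
import ipaddress

def hex_mask(prefix_len: int) -> str:
    """Return the IPv4 netmask for a prefix length as a 0x-prefixed hex string."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return f"0x{mask:08x}"

print(hex_mask(8))   # 0xff000000
print(hex_mask(29))  # 0xfffffff8
print(ipaddress.IPv4Address(int("0xff000000", 16)))  # 255.0.0.0
print(ipaddress.IPv4Address(int(hex_mask(29), 16)))  # 255.255.255.248
```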

[+] proactivesvcs|5 years ago|reply

I'm not convinced these are "cursed". They may be the result of bygone networking conventions, implementation ideas that never came to mainstream fruition, flexibility for use-cases etc. Just because we don't understand something that looks strange, doesn't mean it's cursed, nor that one can simply turn one's nose up and say "I don't understand why these exist so I'll just ignore them when I implement x".

[+] skeletonjelly|5 years ago|reply

I think they've got Class A/B/C wrong? Or at least they're using it in a way that I never learnt

> The familiar 192.168.140.255 notation is technically the “Class C” notation. You can also write that address in “class B” notation as 192.168.36095, or in “Class A” notation as 192.11046143. What we’re doing is coalescing the final bytes of the address into either a 16-bit or a 24-bit integer field.

According to this:

https://www.digitalocean.com/community/tutorials/understandi...

which matches my understanding, classes refer to address ranges, not to how the trailing bytes are grouped

Happy to be corrected!

[+] voxic11|5 years ago|reply

from the linked article

> Traditionally, each of the regular classes (A-C) divided the networking and host portions of the address differently to accommodate different sized networks. Class A addresses used the remainder of the first octet to represent the network and the rest of the address to define hosts. This was good for defining a few networks with a lot of hosts each.
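
The two usages line up if you read the shorter notations as "network bytes, then one host number". Here is a Python sketch of the coalescing the article describes, using its example address. `short_forms` and the dictionary keys are names invented for this illustration.

```python
import ipaddress

def short_forms(dotted: str) -> dict:
    """Return the 4-, 3-, 2- and 1-part spellings of one IPv4 address."""
    b = ipaddress.IPv4Address(dotted).packed  # four bytes, big-endian
    return {
        "a.b.c.d": f"{b[0]}.{b[1]}.{b[2]}.{b[3]}",
        "a.b.N16": f"{b[0]}.{b[1]}.{(b[2] << 8) | b[3]}",
        "a.N24": f"{b[0]}.{(b[1] << 16) | (b[2] << 8) | b[3]}",
        "N32": str(int.from_bytes(b, "big")),
    }

for name, text in short_forms("192.168.140.255").items():
    print(name, text)
# a.b.c.d 192.168.140.255
# a.b.N16 192.168.36095
# a.N24 192.11046143
# N32 3232271615
```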

[+] m463|5 years ago|reply

A "fun" use of IP addresses is in NTP.

in the ntp config file, you will have stuff like this:

  server 127.127.1.0 # local clock

or:

  server 127.127.20.0 minpoll 4 iburst prefer  # gps clock

where the "ip address" is of the form: 127.127.<clocktype>.<instance>

here's a page explaining the clock types:

https://www.eecis.udel.edu/~mills/ntp/html/refclock.html

but basically it's a weird anachronism. I'm not sure if NTP will actually bind to those addresses using the TCP/IP stack, or if someone just got lazy and co-opted the IP address parser for off-label use.
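
Read that way, the pseudo-address is just two small integers in an address-shaped wrapper. A Python sketch of pulling them apart follows; `parse_refclock` is a hypothetical helper, not part of any NTP tooling.

```python
def parse_refclock(address: str):
    """Split 127.127.t.u into (clock type t, unit u); return None for anything else."""
    parts = address.split(".")
    if len(parts) != 4 or parts[:2] != ["127", "127"]:
        return None
    if not all(p.isascii() and p.isdigit() for p in parts[2:]):
        return None
    clock_type, unit = int(parts[2]), int(parts[3])
    if clock_type > 255 or unit > 255:
        return None
    return clock_type, unit

print(parse_refclock("127.127.1.0"))   # (1, 0)
print(parse_refclock("127.127.20.0"))  # (20, 0)
print(parse_refclock("127.0.0.1"))     # None
```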

[+] kortilla|5 years ago|reply

What is the use-case of a decimal representation of a v6 address or a 32-bit int representation of an ipv4 address?

I’ve never had someone tell me, “see if you can ping 143267841”. I’ve worked in networking for coming up on 30 years now and just haven’t found the use.

[+] soneil|5 years ago|reply

I suspect it's actually the other way around. On the wire, a v4 address is four bytes. A uint32 is the natural type for this. So when we start looking at CIDR scopes, /24 means the first 24 bits of those 32. "The first 24 bits of 4 bytes" sounds wrong to me, "the first 24 bits of 32 bits" sounds logical.

So as I see it - 143267841 (or 0x88A1801) is the address, and quad-dotted decimal is a (slightly more) human-readable representation of it.
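
In Python terms, treating the address as an integer makes the CIDR arithmetic a single mask operation. A small sketch using the number from the comment above; `network_of` is a hypothetical helper name.

```python
import ipaddress

addr = 143267841  # same value as 0x88A1801

def network_of(addr: int, prefix_len: int) -> int:
    """Keep the first prefix_len bits of a 32-bit address and zero the rest."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return addr & mask

print(hex(addr))                                    # 0x88a1801
print(ipaddress.IPv4Address(addr))                  # 8.138.24.1
print(ipaddress.IPv4Address(network_of(addr, 24)))  # 8.138.24.0
```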

[+] gizmo686|5 years ago|reply

>32-bit int representation of an ipv4 address

Internally, I would imagine that almost every IPv4 stack uses 32-bit ints to represent an address. It's not that crazy to think this would leak out somewhere.

I've written (un)parsers where we would just treat IPv4 addresses as integers because A) that is how they were treated in the binary data and B) given what we were doing with the data, we didn't actually care about the IPv4 field.

[+] augusto-moura|5 years ago|reply

Serialization for non-human-readable formats. Usually IPv4 addresses are stored as int32 in databases or memory.

[+] Sharlin|5 years ago|reply

IPC at least. If you want to pass an IP address (whose natural native representation is a uint32) from program to program as text, having to format it as dotted decimal would be just unnecessary and inconvenient.

[+] esnard|5 years ago|reply

Not the answer you're expecting I guess, but I've used it to bypass some anti-XSS filters.

[+] jeffbee|5 years ago|reply

There is no use case. It's a meaningless outcome of the fact that `strtoul` is involved somewhere.

[+] tomcooks|5 years ago|reply

Boomers like me know all of the IPv4 obfuscation techniques thanks to Fravia's Searchlores, may he forever rest in peace.

https://www.theoryforce.com/fravia/searchlores/obscure

[+] kaoD|5 years ago|reply

Ohhh you brought so many memories. Fravia and +ORC marked my teenage reverse-engineering years.

Not a boomer myself (I'm just a poor millennial) but I was lucky enough to enjoy the early days of the internet.

May he rest in peace.

[+] alasdair_|5 years ago|reply

Not a boomer but still saddened every time I remember +Fravia is dead. I remember checking his site every day for most of 1995 and 1996.

[+] abotsis|5 years ago|reply

Wow, this. One thing I didn’t see mentioned was “0”. You mentioned it, but it didn’t grok to something I know to work in some implementations: “ping 0” behaves like “ping 127.0.0.1”.

[+] nealabq|5 years ago|reply

Maybe ping is treating 0 like 0.0.0.0 aka INADDR_ANY ( https://en.wikipedia.org/wiki/0.0.0.0 ). And interpreting it as all the IPv4 addrs mapped to the local machine (including localhost).

146 comments