The Map of the Internet

Some additional information on the map in today’s comic:

The labeled websites (“Google”, “Flickr”, “Slashdot”, etc) are just based on the location of the particular server I got when I pinged the site from my home machine on the day I was working on the map. Each site there has many servers and mirrors, and the IP of each may change from day to day. But their location is actually based on both the first and second quad — although the map is drawn in the style of a geographic map, with wobbly borders, it’s laid atop a regular grid which can be extended fractally downward. When I was putting down the location of a server for a particular website, I worked out where within the grid it should fall. This isn’t really for the purpose of telling you anything useful about that website, but rather to suggest that IPs can be placed as unique points on this map. Everything here was done by hand across a couple large sheets of paper, and it’s not trying to be completely accurate in all the details (although I did my best). It’s more of a survey than a practical road map.

The fractal behind the comic is a mapping of the integers to a plane (it could be extended, I assume, to the real numbers). I came up with it in 1999 when I was trying to figure out a way to graphically show the contents of long blocks of computer memory, which is basically linear. I wanted a mapping that would show strings of consecutive addresses as contiguous, compact blobs, and that would work on all scales. I played with and coded a couple mappings, but this was the only one I found that really fit that qualification.

Edit: The fractal is the Hilbert Curve, discovered in 1887. I had done some poking around but had never seen it before. Thank you, folks here and on IRC. It cannot be extended to the reals, but rather to the binary fractions (a subset of the rationals). This lets you get arbitrarially close to any number you want (and cover all the IEEE floating points).

I never come up with an algorithm to do the mapping — that is, a function that would take an integer and return the pair of coordinates. Years later, several months ago, I remembered the mapping and showed it to my cousin, who looked at it for a while and worked out a Scheme algorithm to do the mapping both ways.

Edit: I will be posting a version of the algorithm ASAP.

However, I also posed the problem as a puzzle to the forums:

http://forums.xkcd.com/viewtopic.php?t=864 (Problem)
http://forums.xkcd.com/viewtopic.php?t=866 (Solution)

I posted a sample of the mapping mangled into an integer-to-integer function and asked people to come up with the next set of numbers. The mangling was in part to make it less obvious what they were playing with and in part to encourage a different way of thinking about the problem. I informally offered two xkcd t-shirts as prizes for whoever provided the first correct answer and the most elegant algorithm. BaldAdonis very quickly took apart the problem (very impressively) and will — once I’m finished with this batch of shipping — get at least one t-shirt (depending on whether someone else finds an algorithm to do the mapping more elegantly). The other shirt is still up in the air. I’ll leave it open for the next week or two, and if you want to send me a piece of elegant and easily-executed code that does this mapping — taking in a number and returning the coordinates (bonus points for the reverse operation) — you’ll have a shot at winning a free xkcd t-shirt of your choice. (Note: for the next few days I’ll be working on t-shirt orders and then I have a wedding to go to, so I may not be able to check over your code until next this week.)

Suggestions for future work: When I was constructing the map, I also did some ping surveys of the various /8 networks to see how densely populated with active hosts each one was. The density data didn’t make it into the simpler final map, but that’s just one of many data sets one could slap up on a map like this to visualize the ‘net in an interesting way.

147 replies on “The Map of the Internet”

  1. Actually, Cantor kinda proved you can’t do that kind of mapping of real numbers to a grid. If we’re talking segments of the real line then I guess you could just split it up into blocks an integer long, but I’m assuming you’re talking about points on a grid…

    Like

  2. Kevin, what Peter is referring to is the fact that the reals are uncountable (an easy proof – look up Cantor’s diagonal argument for details). This means there exists no bijection between the reals and the natural numbers.

    Since the N x N is also countable (N here refers to the naturals), there is no bijection between the reals and a 2D grid as we have here.
    Proof NxN is countable:
    Use bijection:
    (a,b) -> 2^a * 3^b

    It does not apply to the integers since the integers are countable:
    The following bijection exists:
    1 -> 0
    2 -> 1
    3 -> -1
    4 -> 2
    5 -> -2
    etc…

    Like

  3. Kevin: Of course not. Cantor proved that the real numbers are not the same cardinality as the integers by his “diagonal argument”. Oops, I just noticed that James Keeling is responding for me – thanks James 🙂

    Cantor’s diagonal argument works by assuming that you *can* lay out all the real numbers in a grid, and then he proves that you can systematically construct a real number that *cannot* be on the grid, however you choose the original grid. Therefore the original assumption must be wrong, QED.

    So I was being cute with the “kinda”, because Cantor’s proof is based exactly on assuming you can do what Randall speculated about, and then showing that it leads to a contradiction.

    Like

  4. It can be “extended” to the real numbers by making it a piecewise continuous curve and taking the limit of iterations->oo. It will map into the real plane, but it will not be a bijection, because it will self-intersect.

    Like

  5. By the way those class A addresses weren’t really “sold” to corps like Xerox but rather assigned back around the NCP -> TCP transition. This was around 1983.

    Like

  6. I don’t suppose you’re planning on sending Google a cake with this printed on the top…

    Like

  7. (I’m a completely different Randall, in case you thought this comic’s author might have Dissociative Personality Disorder.)

    Like

  8. You’ve got the 192.(168) RFC1918 private address space, but have omitted the 172. (15-31) one. You’ve also mislabelled the 10.(0-255) one as “VPNs” – it is, in fact, revered under RFC1918 as the 24-bit address block

    Like

  9. @Greg,

    Nortel owns 47/8, but it still shows up in whois with an OrgName of “Bell Northern Research” and a NetName of BNR. BNR is one of two companies that initially merged to form Nortel, but unless you knew that already, you’d likely just assume that it’s owned by Bell Northern Research based on the whois info.

    BNR would have been a better abbreviation for it than “Bell North” though.

    Like

  10. Stanford University used to own 36.0.0.0/8, but gave it up in 1993 because, well, they didn’t need all of those addresses. (In fact, they used a 255.255.0.0 netmask.) There was some pressure for them to do so, but I don’t think they fought too hard. Now they sit within 171.64.0.0/10, but I don’t know what netmasks they have within that space.

    Like

  11. Nice! I’d been thinking about the division of IPv4 space recently, after hearing stories of entire universities in Nigeria that exist off a single IPv4 IP *behind five layers of NAT*. (Which is why OLPC is going all-IPv6..)

    Like

  12. BRAVO!

    You need to make large, detailed versions of this available for purchase as a poster. I would throw down $10 + shipping for one, and i know a lot of other people would too. Well done, sir!

    Like

  13. > I’d been thinking about the division of IPv4 space recently, after
    > hearing stories of entire universities in Nigeria that exist off a
    > single IPv4 IP *behind five layers of NAT*.

    I’m pretty sure that’s an exaggeration. (There are small ISPs in rural northwest Cameroon that have their own global IP address, so a major university certainly ought to be able to get one. I doubt if it’s really much harder than getting a reliable phone line.) However, it is certainly true that the supply of IPv4 addresses is too small for comfort and unevenly allocated.

    Like

  14. So… what exactly is the WinXP default desktop picture that is referenced? On my machine it was the Dell logo, and I somehow don’t think that relates, amusing/horrifying though it is to think of Dell owning the entire intarwubs.

    Like

  15. I’m curious about the statement that this curve maps consecutive integers to “contiguous and compact” areas. It’s clear to me that the areas are contiguous, but it’s not clear they are “compact”. What does compact mean in this context?

    Not to imply that this isn’t cool – it’s actually the coolest thing I’ve seen today.

    Thanks.

    Like

  16. > I’m pretty sure that’s an exaggeration. (There are small ISPs in rural northwest Cameroon that have their own global IP address, so a major university certainly ought to be able to get one. I doubt if it’s really much harder than getting a reliable phone line.) However, it is certainly true that the supply of IPv4 addresses is too small for comfort and unevenly allocated.

    One explanation I’ve heard for why the IPv4 situation is so bad is that African cultures often dislike asking for something when there’s a possibility of being turned down; so, they put up with the NAT situation rather than ask for their own IP and risk it being denied.

    I’m not African, though, just passing on the explanation I heard from someone currently spending a lot of time on a connectivity project.

    Like

  17. Peano curves are also “related to” quadtrees, which are useful in indexing 2d space, for instance in geographic information systems (googling on quadtree leads to some not-unreasonable pages). All good clean fun, and can lead to some very pretty diagrams.

    Like

  18. AFAIK, being extendable to the reals is exactly what the Hilbert curve is so well known for. More precisely: the curve construction gives a function g from finite binary fractions in [0,1] to a points in the unit square. Define a function f from the unit interval to the unit square by f(x) = lim g(x_n), where x_n is a sequence of finite binary fractions that converges to x.

    The resulting function f (sortof the limit as you iterate the curve construction) is a continuous and surjective function from [0,1] to [0,1]*[0,1]. That property quite wowed 19th century mathematicians, since you might intuitively not expect that to be possible to do with continous functions. (I think a related curve was known as “Peano’s monster curve”…)

    The function is not 1-to-1; some points will be visited more than once by the curve. One might ask if it is possible to “improve” it, creating an even more monstruous 1-to-1 continuous function. The answer is no: using some topology we can show that the unit interval and the unit square are not homeomorphic, so there is no continuous function with a continuous inverse between them.

    If the continuity requirement is dropped, it is of course easy to construct a 1-to-1 function. This was shown in the early 1800s by Cantor, when he showed that R and R^2 are of the same cardinality.

    Like

  19. ‘It’s clear to me that the areas are contiguous, but it’s not clear they are “compact”. What does compact mean in this context?’

    Compact means “small ratio of perimeter to area”. In 2D, a circle is optimally compact, a square is compact, and a long skinny rectangle is non-compact.

    Like

  20. Hi, I’m very curious to see your algorithm for hibelrt curve. The recursive one from wikipedia is not very much useful.

    I’m poking around to try to make a more detailed map based on your idea. Currently I have downloaded data from opte.org, picked out all IPs in it, and scanned all those IPs. I have a data like that:

    aaa.bbb.ccc.ddd COUNTRY ping_time

    like this:

    85.89.69.1 US 114.137

    I have made it with

    for (( i=1 ; i> DATA

    where bin/scanIP is:

    #!/bin/bash

    HOSTLINE=”`host -n $1 | fgrep -c “not found”`”

    if [ “$HOSTLINE” = “0” ]; then
    echo $1 `curl “http://api.hostip.info/get_html.php?ip=63.192.202.1” | awk -F ‘(‘ ‘{print $2}’ | head -c 2` `ping -q -W 1 -c 1 $1 | tail -1 | awk -F ‘/’ ‘{print $5}’`;
    fi ;

    Now, since the method for checking the country of IP is automatic, it will be possible to make a more detailed map, similar to yours. :)))

    Like

  21. oops, I cannot submit the loop. “for” perhaps beucase of problems with html tags. Anyway, I’ll try again.

    for (( i=1 ; i<128011 ; i=i+1 )) ; do bin/scanIPs `head -n $i allIPs | tail -n 1` ; done >> DATA

    Like

  22. As Robert says, this is remarkably similar to Brian Cort’s InfoViz exhibit that he displayed in October. Have you seen it, or is this just a bizarre coincidence that they look so similar?

    Like

  23. In order to truly appreciate this sort of map… one must be
    a student of industry looking for a clearer and broader vision
    of the playing field. This map does that quite nicely.

    Notice how General Electric is right up there in the left hand
    corner… it is because they are at the top of the food chain
    in our economy, and they have been ever since they created
    an energy monopoly by ousting Thomas Alva Edison off their
    board and forcing hin into early retirement for trying to have
    an electric automobile mass produced back in 1919.

    Details on my website at:
    http://www.geocities.com/carljstone45/contributions.html

    Peace love and harmony to all the oppressed geniuses that
    want to make allot of money with me while making the world
    a better place to live… at the same time.

    Like

  24. I’m actually going to apply this to the network at the university I work at, just for the sake of having it. I took the comic and have it laminated on 11×17 paper, in color. Once we move to our new office space, where I’ll have an office instead of a cubicle, it will go on my door or right outside my door.

    Like

Comments are closed.

<span>%d</span> bloggers like this: