Sunday, January 20, 2013

How the Internet Works

I never really had a proper introduction to networking until I took an online course.  I recommend the course, even for people with little experience in technology.  It's really not rocket science.  A network engineer I know, a former plumber, claims it's exactly like plumbing.  "It's all pipes," he says.

Network engineers may be a bit like plumbers, but the architects of the Internet were cultivators of complex systems.  They needed to find a balance between being too rigid and thus hampering future innovation on the one hand, and being too open and thus causing Tower-of-Babel-like chaos on the other.  These architects used at least two key conceptual tools to manage complexity: encapsulation and layering.  Software developers should be familiar with both terms.  For example, object-oriented programmers use encapsulation to hide the implementation details of code from the code that uses it, and they use layering to decouple functional domains.

This is all a fancy way of saying that the development of complex systems requires black boxes.  It's impossible for any one person to understand everything that is going on in a reasonably complex system, so we must be able to say, "Well, we're not going to worry about that right now."  Furthermore, if a change in one place requires a change in ten other places, you have a very fragile system.

The Internet is usually described as having four layers.  In the highest layer, applications talk to each other without worrying about any of the messy details.  For example, your browser just requests a web page; it doesn't care how the page gets there.  One layer down, the transport protocol (probably TCP) establishes connections and ensures that data is reliably delivered in sequence.  It also manages congestion.  TCP doesn't care how the data segments are transmitted.  This is the job of the network layer, most likely IP.  IP is a simple delivery service that does its best to get a message from one location to another.  Routers direct traffic to the appropriate address based on routing tables, but there is no overall plan for the path packets should take.  Finally, the link or physical layer is composed of the actual cables or wireless hardware.  It relays bits, but it has no idea what the bits mean.

What's amazing about the Internet is that it hasn't changed much in forty years.  We've never had to start over from scratch.  We didn't have to stand by while the Masters of the Internet upgraded to version 2.0.  In fact, it works so well that we usually don't have to think about it.  Changes to the Internet, such as the addition of congestion control with TCP Tahoe in 1988, happened pretty seamlessly.  The current switch to IPv6 is largely transparent to most users.

The main reason for the Internet's resiliency is its use of layering and encapsulation.  Because each layer functions independently, its implementation is encapsulated, or hidden, from the layers above.  For example, there has been a tongue-in-cheek proposal (RFC 1149) to carry IP datagrams by carrier pigeon.  The application and transport layers would still work the same, just a bit slower.
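Here is a minimal sketch of that independence.  It is a toy, not real TCP or IP: the names `copper_wire`, `carrier_pigeon`, and `transport_send` are made up for illustration, and the "carriers" just hand the bytes back.  The point is only that the transport code never asks how its chunks travel.

```python
from typing import Callable, Dict

# Hypothetical lower-layer interface: anything that can carry bytes.
Carrier = Callable[[bytes], bytes]


def copper_wire(datagram: bytes) -> bytes:
    """Pretend to carry a datagram over a cable."""
    return datagram


def carrier_pigeon(datagram: bytes) -> bytes:
    """Pretend to carry a datagram by bird: same bytes, just slower."""
    return datagram


def transport_send(message: bytes, carrier: Carrier, chunk_size: int = 4) -> bytes:
    """Toy transport layer: split, number, send, and reassemble in order."""
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]
    received: Dict[int, bytes] = {}
    for seq, chunk in enumerate(chunks):
        # A two-byte sequence number plays the role of the transport header.
        sent = seq.to_bytes(2, "big") + chunk
        arrived = carrier(sent)
        received[int.from_bytes(arrived[:2], "big")] = arrived[2:]
    return b"".join(received[seq] for seq in sorted(received))


by_wire = transport_send(b"hello, world", copper_wire)
by_bird = transport_send(b"hello, world", carrier_pigeon)
print(by_wire == by_bird)  # True
```

Swapping the carrier requires no change to `transport_send`, which is the whole idea.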

Layering and encapsulation are made possible by using a message-in-a-message technique, much like Russian nesting dolls.  Network traffic is composed of messages.  TCP calls them segments, IP calls them datagrams, and Ethernet calls them frames, but they're all composed of a payload and a message header containing instructions for a particular protocol.  The payload may contain another message consisting of another header and payload, and so on, down to the physical layer.
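The nesting can be sketched in a few lines.  This uses a made-up wire format (a one-byte header length followed by a text header), not the real TCP, IP, or Ethernet layouts, and the functions `wrap`, `unwrap`, and `peel` are my own illustrative names.

```python
from typing import List, Tuple


def wrap(header: bytes, payload: bytes) -> bytes:
    """Prefix a payload with a one-byte header length and the header itself."""
    if len(header) > 255:
        raise ValueError("toy format only supports headers up to 255 bytes")
    return bytes([len(header)]) + header + payload


def unwrap(message: bytes) -> Tuple[bytes, bytes]:
    """Split a message back into (header, payload)."""
    length = message[0]
    return message[1:1 + length], message[1 + length:]


def peel(message: bytes, layers: int) -> Tuple[List[bytes], bytes]:
    """Remove the given number of headers, outermost first."""
    headers = []
    for _ in range(layers):
        header, message = unwrap(message)
        headers.append(header)
    return headers, message


# Each layer treats everything handed to it as an opaque payload.
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
segment = wrap(b"TCP dst_port=80", request)
datagram = wrap(b"IP dst=203.0.113.5", segment)
frame = wrap(b"ETH dst=aa:bb:cc:dd:ee:ff", datagram)

headers, innermost = peel(frame, 3)
print(headers)               # outermost header first: ETH, then IP, then TCP
print(innermost == request)  # True
```

Note that `wrap` never looks inside its payload, so it neither knows nor cares whether it is carrying a web page or another wrapped message.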

For example, when you go to a website, you use HTTP (application layer).  Each HTTP message has headers and a body.  Your browser sends a GET request, and the server's response carries a status line and headers, with the web page as the body.  This response is broken up into the payloads of messages with TCP headers (transport layer).  These headers contain port numbers and the sequencing information that ensures safe delivery.  The TCP segments are the payloads of yet other messages, IP datagrams, whose headers hold the source and destination IP addresses.  Finally, these datagrams are the payloads of link-layer frames, whose headers contain MAC addresses.
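To make the application layer concrete, here is a rough sketch of what an HTTP exchange looks like as bytes.  It never touches the network: `build_get_request` and `parse_response` are hypothetical helpers, `SAMPLE_RESPONSE` is a canned reply I wrote by hand, and real HTTP parsing has many more edge cases than this handles.

```python
from typing import Dict, Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request: a request line, headers, a blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a response into its status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


# A hand-written response standing in for what a server might send back.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<h1>Hi!</h1>\n"
)

status, headers, body = parse_response(SAMPLE_RESPONSE)
print(status)                   # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
print(body)                     # b'<h1>Hi!</h1>\n'
```

Everything returned by `build_get_request` would be handed to TCP as plain payload bytes; the lower layers never interpret the request line or the headers.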

The message-in-a-message technique is simple yet powerful.  For example, a VPN works by recursively layering messages.  With a TLS-based VPN, you might have an HTTP request in a TCP segment in an IP packet in an encrypted TLS record in a TCP segment in an IP packet in an Ethernet frame.  Requests over VPN are encapsulated within normal Internet traffic.
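The recursion is easy to picture as a data structure.  The sketch below models only the nesting, with no encryption and no real packet formats; `Message` and `stack` are names I invented for the illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Message:
    """A protocol name plus a payload that is raw bytes or another Message."""
    protocol: str
    payload: Union["Message", bytes]


def stack(message: Message) -> List[str]:
    """List the protocols from the outermost message to the innermost."""
    names = []
    current: Union[Message, bytes] = message
    while isinstance(current, Message):
        names.append(current.protocol)
        current = current.payload
    return names


# What the application inside the VPN produces.
inner = Message("IP", Message("TCP", Message("HTTP", b"GET / HTTP/1.1\r\n\r\n")))

# What actually crosses the public network: the inner packet rides as payload.
outer = Message("Ethernet", Message("IP", Message("TCP", Message("TLS", inner))))

print(" > ".join(stack(outer)))
# Ethernet > IP > TCP > TLS > IP > TCP > HTTP
```

IP and TCP each appear twice, once for the public network and once for the private one, and neither copy needs to know the other exists.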

IPv6 works in a similar manner.  We've already started to run out of the roughly 4.29 billion IPv4 addresses (of the familiar form 192.168.1.1) and we have begun using IPv6's 3.4×10^38 addresses (of the not-yet-familiar form 2a91:0db8:85a3:0000:0000:8a2e:0370:7334).  We don't need two Internets to support both network protocols thanks to a technique called tunneling.  Routers at the edges of an IPv4 network can carry IPv6 datagrams across it by simply putting the IPv6 datagrams inside IPv4 ones.  That is, IPv6 tunnels through existing IPv4 networks.
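Those address counts follow directly from the address widths: 32 bits for IPv4 and 128 bits for IPv6.  A quick check with Python's standard `ipaddress` module, which also shows the shorthand form of the long IPv6 address above:

```python
import ipaddress

# The whole address space of each protocol, expressed as a single network.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128

print(f"{ipv4_total:,}")    # 4,294,967,296
print(f"{ipv6_total:.1e}")  # 3.4e+38

# Leading zeros and one run of all-zero groups can be dropped when writing IPv6.
long_form = "2a91:0db8:85a3:0000:0000:8a2e:0370:7334"
address = ipaddress.ip_address(long_form)
print(address.compressed)   # 2a91:db8:85a3::8a2e:370:7334
```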

We don't know what the Internet will look like 100 years from now, but it will probably have new layers stacked on top of or recursively contained within existing layers.  It will evolve piecemeal, without major upgrades causing blackouts.  It may swap out TCP for some other, more efficient communications protocol, or it may replace the physical layer with faster wireless relays.  Because of layering and encapsulation, changes to any one functional layer should have little or no impact on the others.
