Op-ed: Internet centralization is not a conspiracy

(This post clarifies some of the more theoretical points made in the short interview I did with redecentralize.org.)

TL;DR:

The centralization of the Internet is not a conspiracy.

It results from structural pressures stemming from the Internet's network topology as deployed, and certain algorithmic and mathematical realities in the field of distributed computation.

Today's largest Internet companies haven't centralized the net. They built centralized systems because that's what the medium encouraged them to build. The problem is that the net, as it presently exists, "wants" to be centralized.

Had the medium encouraged decentralization, these companies might have built decentralized apps and services instead of centralized ones. Facebook might be a commercial Diaspora, and Google might be the world's largest distributed application software company.

To decentralize the Internet, then, requires not an adversarial contest against conspirators or incumbent technology companies but a campaign to alter the underlying structural realities that drive centralization. It requires innovation, not reactionary thinking.

"The medium is the message."Marshall McLuhan

What is it about the medium as it stands today that encourages high degrees of centralization?

The social (and economic) environment that develops around a medium will be dictated by the structure of that medium. A one-to-many medium like radio or television will beget a very top-down pyramid-shaped social system. A flat and well-mixed many-to-many medium will beget a more cosmopolitan socioeconomic structure.

So what does the Internet look like?

Start with where you are. Are you on a tablet, a laptop, or a desktop? If so, then you are connected to the Internet through a local area network (LAN) or via some kind of cellular link. As I write this post, I am sitting behind a MacBook Pro connected to a LAN.

So let's communicate! My address is 192.168.1.10. What's yours? Okay, try to connect to my system with FTP and send me a file.

It's not working, you say?

I am behind a firewall that implements network address translation (NAT). My address is not reachable from your network (unless you're here in the same building with me). There's a 99.something% chance that your local area network (or cellular provider's network) is similarly configured, so I can't connect directly to you either.
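You can reproduce this failed experiment in a few lines. Here's a minimal sketch (Python, with my private address from above standing in for "a peer"): the outbound connection "up" the tree succeeds, while the direct connection to a peer behind NAT goes nowhere.

```python
import socket

def try_connect(host, port, timeout=5):
    """Attempt a plain TCP connection and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return f"failed: {exc}"

# Outbound, "up" the tree: almost always works from behind NAT.
print("example.com:80 ->", try_connect("example.com", 80))

# "Direct" to a peer's private LAN address (my 192.168.1.10 above):
# unless you're on my LAN, this times out or is refused. RFC 1918
# addresses aren't even routable across the public Internet.
print("192.168.1.10:21 ->", try_connect("192.168.1.10", 21))
```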

The Internet as deployed today is not a many-to-many communications medium. It looks a little bit like an inverted tree whose leaves are each contained, alone or in small bunches, within little prisons with narrow little windows. The only (easy) way out of these little prisons is along a single branch -- the one gated by one's local router/firewall -- and "up" the tree.

Now let's say I'm an ambitious Internet entrepreneur, and I want to make something to allow people to more easily share photos. After I do a bit of market research and nail down some of the general aspects of my business plan, I turn to the technical: how do I engineer this product? I want my users to be able to share each other's photos, view other people's photos, search for photos, and browse through pictures that have been marked "public." I also want to bundle some services to monetize my idea, like selling physical prints or greeting cards.

Given the current topology of the Internet, I would not attempt a decentralized design. It wouldn't even be considered.

Leaving aside some of the lesser problems -- like the abysmal state of software installation management on certain major operating systems -- the Internet's present topology just makes it very hard for peers to find one another. How many tech support questions would I get from people who "can't see anything"? Okay, let's start the diagnosis. What kind of firewall are you behind? A Linksys model X? What revision of firmware is it running? Okay, can you ping...?

Forget it. Just forget it. New businesses are fragile things; a million different little problems can suffocate a startup in its crib. Why introduce network connectivity and logistics issues into the mix? Put everything on a central server and move on. 99.9% of your customers don't care how they share photos, only that they can.
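And "put everything on a central server" really is as easy as it sounds. Here's a toy sketch of the centralized version (using Flask; the routes and the in-memory store are hypothetical stand-ins for a real backend):

```python
# Toy sketch of the centralized design. Every client just dials out to
# one well-known server -- no peer discovery, no NAT traversal.
from flask import Flask, request

app = Flask(__name__)
photos = {}  # the "central server" -- everything flows up the tree to here

@app.route("/photos/<name>", methods=["PUT"])
def upload(name):
    # Clients behind any NAT or firewall can reach this: it's outbound for them.
    photos[name] = request.get_data()
    return {"stored": name}

@app.route("/photos/<name>", methods=["GET"])
def download(name):
    if name not in photos:
        return "not found", 404
    return photos[name]

# app.run()  # one process, one public address, zero connectivity tickets
```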

NAT and firewalls create a network whose structure makes it orders of magnitude simpler and easier for information to flow "up" the tree toward larger and larger branches and nodes.

This is the first and probably the most important reason the Internet has become so centralized. Changing this structural reality means two things: getting rid of NAT and rethinking firewalls and network security.

NAT was a necessary evil. The current Internet runs (mostly) on Internet Protocol version 4, which has a very limited address space. There just aren't enough addresses to conveniently assign one to every connected device. Luckily, we are moving toward IP version 6, which solves that problem.
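The numbers make the point on their own (a quick back-of-the-envelope calculation, nothing more):

```python
# IPv4 addresses are 32 bits wide; IPv6 addresses are 128 bits wide.
ipv4_space = 2 ** 32    # 4,294,967,296 -- fewer addresses than people on Earth
ipv6_space = 2 ** 128   # ~3.4 x 10^38 -- every device can have its own address

print(f"IPv4: {ipv4_space:,} addresses")
print(f"IPv6: {ipv6_space:.2e} addresses")
```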

The firewall will be a tougher nut to crack. When the Internet opened up and went public, neither operating systems nor software nor network protocols were up to the security requirements of a large, distributed, wide-open public network. Firewalls came to the rescue. Put a firewall in front of a large number of insecure systems, and your organization's security posture is greatly improved.

To contemplate eliminating inline network firewalls, the security posture of operating systems and of any network-enabled application that runs on them must be greatly improved. Some of this has already happened naturally under the pressure of increased adoption, but we are still dealing with entire classes of bugs that stem from inherently unsafe programming practices and execution environments. Many programmers still blithely write code without even considering its security. It's a bit of a vicious cycle: the firewall creates a (false, but that's another topic) sense of safety that encourages a lax attitude toward securing systems one doesn't plan on "exposing."

Decentralization is hard too

The second reason the Internet favors centralization is more academic. In theoretical computer science there is something known as the CAP theorem.

From Wikipedia: "The CAP theorem, also known as Brewer's Theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency (all nodes see all data at the same time), availability (the guarantee that all requests receive a response whether successful or failed), and partition tolerance (the system continues to operate despite arbitrary message loss or failure of a part of the system)."

Like most mathematical theorems, what the CAP theorem actually says is quite specific and easily subject to misinterpretation. Nevertheless, an attempt at a more generalized layman's explanation would go something like this:

It's very, very hard to design a system that is simultaneously reliable, trustworthy, and distributed. The more distributed the system is, the harder it is to achieve the first two and the more intellectually challenging the design of the system gets.

The implication?

A centralized database is an intro-to-computer-science problem. A good decentralized database is a Ph.D. thesis problem.
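To make the trade-off concrete, here's a toy sketch (hypothetical code, not any real database) of the choice CAP forces on a replica the moment it loses contact with its peers:

```python
# Toy replica illustrating the CAP trade-off. When a network partition cuts
# this node off from its peers, it must choose: refuse to answer (consistency
# over availability), or answer from local state that may be stale
# (availability over consistency). It cannot do both.

class Replica:
    def __init__(self, prefer_consistency: bool):
        self.prefer_consistency = prefer_consistency
        self.data = {}            # local copy of the replicated store
        self.partitioned = False  # True when peers are unreachable

    def read(self, key):
        if self.partitioned and self.prefer_consistency:
            # CP choice: we can't confirm our copy is current, so fail.
            raise RuntimeError("unavailable: cannot reach quorum")
        # AP choice (or healthy network): answer, possibly with stale data.
        return self.data.get(key)

node = Replica(prefer_consistency=True)
node.data["photo:42"] = "sunset.jpg"
node.partitioned = True
try:
    node.read("photo:42")
except RuntimeError as e:
    print(e)  # unavailable: cannot reach quorum
```

Neither branch is wrong; the point is that the designer is forced to pick one, and reasoning carefully about which one is where the Ph.D.-level difficulty creeps in.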

Let's return to our example of a photo sharing service. Let's say I'm a brave and possibly slightly insane entrepreneur, and despite the hassles of NAT and firewalls I decide to go with a decentralized approach. My company will develop an app somewhat like iPhoto but with a network API exposed for other instances of the same app on the network. (And maybe other compatible apps -- a platform play! My investors will love it!) Now it's time to build the thing.

Okay, I'm going to need to implement... wait... what are these scary-looking equations? I'm a starving startup, and I don't have the time to give myself a doctorate's worth of education in distributed computing. I also don't have the money to hire someone who already has one. Let's see what's out there that I can use. Hmm... all the distributed databases I can find are rather large chunks of code. Do I really want to incorporate all that into my product? Some of them are half-finished, uninstallable academic projects that are likely to be riddled with bugs. The polished ones are expensive.

You know, never mind. Forget it. Just put everything on a central server.

Developers everywhere making precisely that choice is why so much of the Internet's infrastructure has consolidated into the hands of so few providers.

So what can we take away from this?

If you want to help decentralize the net, you might consider working on better security tooling for developers or software to help manage decentralized and deperimeterized enterprise networks.

You might also consider working on an open source, easy to use, zero-configuration system for distributed information storage that skirts the very edges of what the CAP theorem will allow.

These two things might accomplish more than another meshnet or peer-to-peer protocol. They might accomplish more than ZeroTier One.

That's why ZT1's goal is not privacy or decentralization per se. Its goal is to give people an easy way to create and join deperimeterized, flat networks. This helps people evade NAT and firewall barriers, and in so doing creates an environment where research and development on distributed systems can easily be pursued. Instead of hassling with NAT traversal or complicated peer-to-peer libraries, developers can just use plain vanilla IP and commodity protocols.
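On a flat network where every device gets a reachable address, "peer-to-peer" collapses back into ordinary socket programming. A minimal sketch (the peer address below is a placeholder for whatever your flat network assigns):

```python
import socket

# On a flat, deperimeterized network every peer is directly addressable,
# so "peer-to-peer" is just ordinary server and client code.

def serve(port=9000):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", port))  # reachable by peers: no NAT traversal needed
    srv.listen(1)
    conn, peer = srv.accept()
    print("peer connected from", peer)
    conn.sendall(b"hello, peer\n")
    conn.close()

def connect(peer_addr="10.147.17.5", port=9000):
    # 10.147.17.5 is a placeholder for a peer's flat-network address.
    with socket.create_connection((peer_addr, port)) as c:
        print(c.recv(64).decode())
```

No hole punching, no rendezvous servers, no firmware-specific support tickets: just two sockets.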

Another takeaway is that while centralization might contribute to political problems, it is itself a phenomenon with technical and historical origins rather than political ones. Railing against the Internet giants or the NSA won't change anything. Only innovation will.