An oil refinery is an industrial cathedral, a place of power, drama and dark recesses: ornate cracking towers are its gothic pinnacles, flaring gas its stained glass, the stench of hydrocarbons its heady incense. Data centres, in contrast, offer a less obvious spectacle: windowless grey buildings that boast no height or ornament but seem to stretch to infinity.
Yet the two have much in common. For one thing, both are stuffed with pipes. In refineries these collect petrol, propane and other components of crude oil, which have been separated by heat. In big data centres they transport air to cool tens of thousands of computers that extract value — patterns, predictions and other insights — from raw digital information.
Both also fulfil the same role: producing crucial feedstocks for the world economy. Cars, plastics, many drugs: without the components of crude, much of modern life would not exist. The distillations of data centres, for their part, power all kinds of online services and, increasingly, the real world, as devices become more and more connected.
Data is to this century what oil was to the last one: a driver of growth and change. Flows of data have created new infrastructure, new businesses, new monopolies, new politics and — crucially — new economics. Digital information is unlike any previous resource; it is extracted, refined, valued, bought and sold in different ways. It changes the rules for markets and it demands new approaches from regulators. Many a battle will be fought over who should own, and benefit from, data.
There is an awful lot to scrap over. IDC, a market-research firm, predicts that the “digital universe” (the data created and copied every year) will reach 180 zettabytes (180 followed by 21 zeros) in 2025. To ingest it all, firms are speedily building data refineries. Last year Amazon, Alphabet and Microsoft together racked up nearly $US32bn in capital expenditure and capital leases, up by 22 per cent from the previous year.
The quality of data has changed, too. It is no longer mainly stocks of digital information — databases of names and other well-defined personal data, such as age, sex and income. The new economy is more about analysing rapid real-time flows of often unstructured data: the streams of photos and videos generated by users of social networks, the reams of information produced by commuters on their way to work, the flood of data from hundreds of sensors in a jet engine. From subway trains and wind turbines to toilet seats and toasters, all sorts of devices are becoming sources of data. The world will bristle with connected sensors so that people will leave a digital trail wherever they go, even if they are not connected to the internet.
Most important, the value of data is increasing. Facebook and Google initially used the data they collected from users to target advertising better. But in recent years they have discovered that data can be turned into any number of artificial intelligence or “cognitive” services, some of which will generate new sources of revenue.
These services include translation, visual recognition and assessing someone’s personality by sifting through their writings — all of which can be sold to other firms to use in their own products.
Although signs of the data economy are everywhere, its shape is only now becoming clear. And it would look pretty familiar to JR Ewing. There are the data majors, a growing number of wildcatters and plenty of other firms trying to get a piece of the action. All are out to exploit a powerful economic engine called the “data-network effect” — using data to attract more users, who then generate more data, which helps to improve services, which attracts more users.
The majors pump from the most bountiful reservoirs. The more users write comments, “like” posts and otherwise engage with Facebook, for example, the more it learns about those users and the better targeted the ads on newsfeeds become. Similarly, the more people search on Google, the better its search results turn out.
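The feedback loop behind the data-network effect can be sketched as a toy simulation. It is a minimal illustration under stated assumptions, not any firm's actual model: the function name, the assumption that each user contributes one unit of data per period, the saturating quality curve and all parameter values are hypothetical.

```python
def simulate_data_network_effect(initial_users: float, periods: int,
                                 attraction: float = 0.3,
                                 half_saturation: float = 50_000.0,
                                 market_size: float = 1_000_000.0) -> list[float]:
    """Toy model of the data-network effect; all parameters are hypothetical.

    Each period, users generate data, accumulated data improves the service,
    and a better service wins over part of the remaining market.
    """
    users = initial_users
    data = 0.0
    history = [users]
    for _ in range(periods):
        # More users generate more data (assumed: one unit per user per period).
        data += users
        # More data improves the service, with diminishing returns; quality is in [0, 1).
        quality = data / (data + half_saturation)
        # A better service attracts a share of the people not yet signed up.
        users += attraction * quality * (market_size - users)
        history.append(users)
    return history


# Two firms running the same loop in isolation: an incumbent and a later entrant.
incumbent = simulate_data_network_effect(initial_users=10_000, periods=5)
entrant = simulate_data_network_effect(initial_users=1_000, periods=5)
print(f"after 5 periods: incumbent {incumbent[-1]:,.0f} users, entrant {entrant[-1]:,.0f}")
```

Because the loop feeds on itself, the firm that starts with more users stays ahead in every period. The model deliberately leaves out competition between the two firms for the same customers; it only shows how users, data and service quality reinforce one another.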
Uber, for its part, is best known for its cheap taxi rides. But if the firm is worth an estimated $US68bn ($92bn), it is in part because it owns the biggest pool of data about supply (drivers) and demand (passengers) for personal transportation. Similarly, for most people Tesla is a maker of fancy electric cars. But its latest models collect mountains of data, which allow the firm to optimise its self-driving algorithms and update the software accordingly.
“Data-driven” start-ups are the wildcatters of the new economy: they prospect for digital oil, extract it and turn it into clever new services, from analysing X-rays and CAT scans to determining where to spray herbicide on a field. Nexar, an Israeli start-up, has devised an ingenious way to use drivers as data sources. Its app turns their smartphones into dashcams that tag footage of their travels via actions they normally perform. If many unexpectedly hit the brake at the same spot on the road, this signals a pothole or another obstacle. As compensation for using Nexar’s app, drivers get a free dashcam and services, such as a detailed report if they have an accident. The firm’s goal is to offer all sorts of services that help drivers avoid accidents, and for which they, or their insurers, will pay. One such service alerts drivers to potholes, or to a car that suddenly stops around a blind corner.
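The pothole signal can be illustrated with a short sketch that groups hard-braking events into small map cells and flags cells where several different drivers braked. This is an assumption about how such a signal might be computed, not Nexar's method; the `BrakeEvent` type, the cell size of roughly 55 metres and the three-driver threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BrakeEvent:
    """A hypothetical record of one unexpected hard-braking event."""
    driver_id: str
    lat: float
    lon: float


def flag_hazards(events: list[BrakeEvent], cell_size: float = 0.0005,
                 min_drivers: int = 3) -> set[tuple[int, int]]:
    """Return grid cells where at least `min_drivers` distinct drivers braked hard.

    A cell of 0.0005 degrees is roughly 55 metres north-south. Counting distinct
    drivers, rather than events, stops one nervous driver from flagging a spot alone.
    """
    drivers_by_cell: dict[tuple[int, int], set[str]] = defaultdict(set)
    for e in events:
        # Snap each event to the grid cell that contains it.
        cell = (int(e.lat // cell_size), int(e.lon // cell_size))
        drivers_by_cell[cell].add(e.driver_id)
    return {cell for cell, drivers in drivers_by_cell.items()
            if len(drivers) >= min_drivers}


events = [
    BrakeEvent("a", 52.52020, 13.40520),
    BrakeEvent("b", 52.52021, 13.40521),
    BrakeEvent("c", 52.52023, 13.40522),
    BrakeEvent("d", 52.53020, 13.41520),  # a lone event elsewhere
]
print(flag_hazards(events))  # one flagged cell, shared by drivers a, b and c
```

A real system would also need to weigh time of day, speed and how recently the events occurred; the grid is simply the easiest way to show the idea of many independent drivers reacting at the same spot.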
As in oil markets, bigger data firms keep taking over smaller ones. But another aspect of the data economy would look strange to dealers in black gold. Oil is the world’s most traded commodity by value. Data, by contrast, is hardly traded at all, at least not for money. That is a far cry from what many had in mind when they talked about data as a “new asset class”, as the World Economic Forum, the Davos conference organiser cum think tank, did in a report published in 2011.
The data economy, that term suggests, will consist of thriving markets for bits and bytes. But as it stands, it is mostly a collection of independent silos.
This absence of markets is the result of the same factors that have given rise to firms. All sorts of “transaction costs” on markets — searching for information, negotiating deals, enforcing contracts — make it simpler and more efficient to bring these activities in-house. Likewise, it is often more profitable to generate and use data inside a company than to buy and sell them on an open market.
Their abundance notwithstanding, flows of data are not a commodity: each stream of information is different, in terms of timeliness, for example, or how complete it may be.
This lack of “fungibility”, in economic lingo, makes it difficult for buyers to find a specific set of data and to put a price on it: the value of each sort is hard to compare with other data. There is a disincentive to trade as each side will worry that it is getting the short end of the stick.
Researchers have only just begun to develop pricing methodologies, a field the consultancy Gartner calls “infonomics”. One of its pioneers, Jim Short of the University of California, San Diego, studies cases where a decision has been made about how much data is worth.
One such case involves Caesars Entertainment, a gambling group, one of whose subsidiaries filed for bankruptcy in 2015. That subsidiary’s most valuable asset, at $US1bn, was determined to be the data it reportedly held on the 45 million customers who had joined the company’s customer loyalty program over the previous 17 years.
The pricing difficulty is an important reason one firm might find it simpler to buy another, even if it is mainly interested in data. This was the case in 2015 when IBM reportedly spent $US2bn on the Weather Company, to get its hands on mountains of weather data as well as the infrastructure to collect them.
Another fudge is barter deals: parts of Britain’s National Health Service and DeepMind, Alphabet’s AI division, have agreed to swap access to anonymous patient data for medical insights extracted from them.
The fact that digital information, unlike oil, is also “non-rivalrous”, meaning that it can be copied and used by more than one person (or algorithm) at a time, creates further complications. It means that data can easily be used for purposes other than those agreed. And it adds to the confusion about who owns data (in the case of an autonomous car, it could be the carmaker, the supplier of the sensors, the passenger and, in time, if self-driving cars become self-owning ones, the vehicle itself).
“Trading data is tedious,” says Alexander Linden of Gartner. As a result, data deals are often bilateral and ad hoc. In the case of personal data, things are even trickier.
“A regulated national information market could allow personal information to be bought and sold, conferring on the seller the right to determine how much information is divulged,” Kenneth Laudon of New York University wrote in an influential article titled “Markets and Privacy” in 1996.
More recently, the WEF proposed the concept of a data bank account. A person’s data, it suggested, should “reside in an account where it would be controlled, managed, exchanged and accounted for”. The idea seems elegant, but neither a market nor data accounts have materialised yet. The problem is the opposite of that with corporate data: people give personal data away too readily in return for “free” services.
The terms of trade have become the norm almost by accident, says Glen Weyl, an economist at Microsoft Research. After the dotcom bubble burst in the early 2000s, firms badly needed a way to make money. Gathering data for targeted advertising was the quickest fix. Only recently have they realised that data could be turned into any number of AI services.
Whether this makes the trade of data for free services an unfair exchange depends largely on where the value of these services comes from: the data, or the algorithms that crunch them.
Data, argues Hal Varian, Google’s chief economist, exhibits “decreasing returns to scale”, meaning that each additional piece of data is somewhat less valuable and at some point collecting more does not add anything. What matters more, he says, is the quality of the algorithms that crunch the data and the talent a firm has hired to develop them. Google’s success “is about recipes, not ingredients”.
That may have been true in the early days of online search but seems wrong in the brave new world of AI. Algorithms are increasingly self-teaching: the more data they are fed, and the fresher it is, the better. And marginal returns from data may actually go up as applications multiply, says Weyl. After a ride-hailing firm has collected enough data to offer one service (real-time traffic information, say), more data may not add much value. But if it keeps collecting data, at some point it may be able to offer more services, such as route planning.
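The two positions can be contrasted in a toy model. On Varian's view, the value of data to a single service saturates, so each extra record is worth less. On Weyl's, new services become viable once enough data has accumulated, which can make the marginal value of data rise again. The saturating curve, the thresholds and the scale below are invented for illustration and are not drawn from either economist's work.

```python
import math


def single_service_value(records: float, scale: float = 1_000.0) -> float:
    """Value of data to one service: rises quickly, then saturates (decreasing returns)."""
    return 1.0 - math.exp(-records / scale)


def multi_service_value(records: float,
                        thresholds: tuple[float, ...] = (0.0, 5_000.0, 20_000.0),
                        scale: float = 1_000.0) -> float:
    """Hypothetical: each threshold marks the point at which a new service becomes viable.

    Above its threshold, every service's value saturates in the same way.
    """
    return sum(single_service_value(records - t, scale) for t in thresholds if records > t)


def marginal_value(value_fn, records: float, step: float = 100.0) -> float:
    """Extra value from collecting `step` more records."""
    return value_fn(records + step) - value_fn(records)


# One service: an extra 100 records is worth less at 6,000 than at 4,000.
print(marginal_value(single_service_value, 4_000), marginal_value(single_service_value, 6_000))
# Several services: just past the 5,000 threshold, extra data is worth more than just before it.
print(marginal_value(multi_service_value, 4_800), marginal_value(multi_service_value, 5_200))
```

Both views hold within the model: for any one service the returns fall, yet the total can show jumps in marginal value wherever a new application becomes possible.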
Such debates, as well as the lack of a thriving trade in data, may be teething problems. It took decades for well-functioning markets for oil to emerge. Ironically, it was Standard Oil, the monopoly created by John D. Rockefeller in the late 19th century, that speeded things up: it helped create the technology and — the firm’s name was its program — the standards that made it possible for the new resource to be traded.
Markets have long existed for personal data that is of high value or easy to standardise.
Some young firms hope to give consumers more of a stake in their data. Citizenme allows users to pull all their online information together in one place and earn a small fee if they share it with brands. Datacoup, another start-up, is selling insights from personal data and passing on part of the proceeds to its users.
But consumers and online giants are already locked in an awkward embrace. People do not know how much their data is worth, nor do they really want to deal with the hassle of managing it, says Alessandro Acquisti of Carnegie Mellon University. But they are also showing symptoms of what is called “learned helplessness”: terms and conditions for services are often impenetrable and users have no choice but to accept them (smartphone apps quit immediately if one does not tap on “I agree”).
For their part, online firms have become dependent on the drug of free data: they have no interest in fundamentally changing the deal with their users. Paying for data and building expensive systems to track contributions would make data refiners much less profitable.
Data would not be the only important resource that is not widely traded; witness radio spectrum and water rights. But for data this is likely to create inefficiencies, argues Weyl. If digital information lacks a price, valuable data may never be generated. And if data remains stuck in silos, much value may never get extracted. The big data refineries have no monopoly on innovation; other firms may be better placed to find ways to exploit information.
The dearth of data markets will also make it more difficult to solve knotty policy problems. Three stand out: antitrust, privacy and social equality. The most pressing one, arguably, is antitrust — as was the case with oil. In 1911 America’s Supreme Court upheld a lower court ruling to break up Standard Oil, which then controlled about 90 per cent of oil refining in the country.
Some, including Jonathan Taplin of the University of Southern California, are already calling for a similar break-up of the likes of Google. But such a radical remedy would not really solve the problem. A break-up would be highly disruptive and slow down innovation. It is likely that a Googlet or a Babyface would quickly become dominant again.
Yet calls for action are growing. The “super-platforms” wield too much power, says Ariel Ezrachi of the University of Oxford. With far more, and fresher, data than others, he argues, they can quickly detect competitive threats. Their deep pockets allow them to buy start-ups that could one day become rivals.
They can also manipulate the markets they host by, for example, having their algorithms react so quickly that competitors have no chance of gaining customers by lowering prices. “The invisible hand is becoming a digital one,” says Ezrachi.
At a minimum, trustbusters have to sharpen their tools for the digital age. The European Commission did not block the merger of Facebook and WhatsApp. It argued that although the two firms operated the two largest text-messaging services, there were plenty of others around, and that the deal would not add to Facebook’s data hoard because WhatsApp did not collect much information about its users.
But Facebook was buying a firm it feared might evolve into a serious rival. WhatsApp had built an alternative “social graph”, the network of connections between friends, which is Facebook’s most valuable asset. During the merger-approval process Facebook had pledged that it would not merge the two user bases, but it started doing so last year, which has led the commission to threaten it with fines.
The frustration with Facebook helps explain why some countries in Europe have already started to upgrade competition laws. In Germany, legislation is winding through parliament that would allow the Federal Cartel Office to intervene in cases in which network effects and data assets play a role. The agency has already taken a special interest in the data economy. It has launched an investigation into whether Facebook is abusing its dominant position to impose certain privacy policies.
A good general rule for regulators is to be as inventive as the companies they keep an eye on. In a recent paper Ezrachi and a co-author proposed that antitrust authorities should operate what they call “tacit collusion incubators”. To find out whether pricing algorithms manipulate markets or even collude, regulators should run simulations on their own computers.
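A toy version of such an incubator might put one simple question to a pricing algorithm: does a rival still gain by cutting its price? The sketch assumes two sellers, identical buyers who all purchase from the cheaper seller (splitting evenly on a tie), and an incumbent whose algorithm instantly matches any lower price, the behaviour Ezrachi warns about. The function names, prices, cost and number of buyers are hypothetical, and the setup is not taken from Ezrachi's paper.

```python
def rival_profit(rival_price: float, incumbent_price: float, instant_matching: bool,
                 cost: float = 5.0, buyers: int = 100) -> float:
    """One-period profit for a rival seller under a simple winner-takes-all demand model."""
    if instant_matching:
        # The incumbent's algorithm matches any undercut before buyers can switch.
        incumbent_price = min(incumbent_price, rival_price)
    if rival_price < incumbent_price:
        share = 1.0  # all buyers go to the cheaper rival
    elif rival_price == incumbent_price:
        share = 0.5  # a tie splits the market
    else:
        share = 0.0
    return share * buyers * (rival_price - cost)


def undercutting_pays(instant_matching: bool, high: float = 10.0, low: float = 9.0) -> bool:
    """Does a rival earn more by cutting its price from `high` to `low`?"""
    return (rival_profit(low, high, instant_matching)
            > rival_profit(high, high, instant_matching))


for matching in (False, True):
    print(f"instant matching={matching}: undercutting pays? {undercutting_pays(matching)}")
```

With instant matching, a price cut only lowers the rival's own margin, so prices can stay high without any explicit agreement between the sellers. Spotting that kind of outcome in simulation, before it shows up in real markets, is the point of the incubator idea.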
Another idea is to promote alternatives to centralised piles of data. Governments could give away more of the data they collect, creating opportunities for smaller firms.
For some crucial classes of data, sharing may even need to be made mandatory. Ben Thompson, who publishes Stratechery, a newsletter, recently suggested that dominant social networks should be required to allow access to their social graphs. Instagram, a photo-sharing service that has also been swallowed by Facebook, got off the ground by having new users import the list of their followers from Twitter. “Social networks have long since made this impossible, making it that much more difficult for competitors to arise,” Thompson points out.
The EU’s new General Data Protection Regulation, which will start to apply in May next year, requires online services to make it easy for customers to transfer their information to other providers and even competitors.
But “data portability”, as well as data sharing, highlights the second policy problem: the tension between data markets and privacy. If personal data is traded or shared it is likelier to leak.
To reduce this risk, the GDPR strengthens people’s control over their data: it requires that firms get explicit consent for how they use data.
Such rules will be hard to enforce in a world in which streams of data are mixed and matched. And there is another tension between tighter data protection and more competition: not only do big companies have greater means to comply with pricey privacy regulation, but such regulation also allows them to control data more tightly.
In time, new technology that goes beyond simple, easy-to-undo anonymisation may ease such tensions. Bitmark, another start-up, uses the same “blockchain” technology behind bitcoin, a digital currency, to keep track of who has accessed data.
But legal innovation will be needed too, says Viktor Mayer-Schönberger of the University of Oxford. He and other data experts argue that not only the collection of data should be regulated, but also its use. Just as foodmakers are barred from using certain ingredients, online firms could be prohibited from using certain data, or from using it in ways that could harm an individual. This, he argues, would shift responsibility towards data collectors and data users, who should be held accountable for how they manage data rather than relying on obtaining individual consent.
Such “use-based” regulation would be just as hard to police as the conventional rules of notice and consent that govern what data is collected and how it is used. It is also likely to worsen what some see as the third big challenge of the data economy in its present form: that some will benefit far more than others, both socially and geographically.
For personal data, at least, the present model seems barely sustainable. As data becomes more valuable and the data economy grows in importance, data refineries will make all the money. Those who generate the data may baulk at an unequal exchange that sees them getting only free services. The first to point this out was Jaron Lanier, who also works for Microsoft Research, in his book Who Owns the Future?, published in 2013.
Weyl, who collaborates with Lanier and is writing a book about renewing liberal economics with Eric Posner of the University of Chicago, advances another version of this argument: ultimately, AI services are not provided by algorithms but by the people who generate the raw material. “Data is labour,” says Weyl, who is working on a system to measure the value of individual data contributions to create a basis for a fairer exchange.
The problem, says Weyl, is getting people to understand that their data has value and that they are due some compensation. It will take even more convincing to get the “siren servers”, as Lanier calls the data giants, to change their ways, as they benefit handsomely from the status quo.