The Inversion of Big Data

Big Data is the Big Thing at the moment.

It won’t be ten years from now. And not because we have mastered the technical aspects of handling it, or built huge business intelligence systems capable of harvesting the treasures hidden in it. Nor because it has become generally accepted and reached Gartner’s plateau of productivity.

It won’t be because it is not scalable. Because the related privacy problems are insurmountable. Because its value depends directly on the size of the company harvesting it, and not at all on the source it is derived from: the customers, the individuals.

What worries me most, however, is that so few people voice fundamental concerns about this development. Some do. Just the other day there was a really insightful discussion on LinkedIn about hospital data, unfortunately in Dutch.

Big Data is a monster, a bloated and sick by-product of technology, marketing and a distorted respect for the autonomy of big corporations. To be clear: in this article I focus mainly on big data about persons. I am not primarily concerned with big data derived from, say, particle accelerators or NASA telescopes.

Well. Not a very popular view, I guess. And too political for my taste and, I suspect, for the taste of many others. Nevertheless.

Thing is, there seems to me to be a simple and acceptable solution to this issue.

I remember Stanley Kubrick saying that people are the owners of their own image. In the same vein, people should be the owners of their own “data”. Corporations should be prevented by law from collecting data on people.

We have seen the problems with medical data in the Netherlands, where an ambitious project, the EPD (Elektronisch Patiëntendossier, or Electronic Patient Record), effectively failed because of insurmountable privacy and security issues.

What would happen if we “inverted” the problem? That is: why not store the data related to people with the people themselves? Why not effectively prohibit corporations from collecting and storing data about persons? What issues would result from implementing this strategy? And could those issues be solved without casualties on either side?

I think it is possible. Even more: I think this is the only possible and scalable solution.

Let’s say you are Marie-Louise (if you are her, you know why I use your name :-)). You walk down a street lined with shops. For weeks you have been thinking about buying those brown boots, but you are acutely aware of the precarious state of your mortgage. So you have postponed buying them for a while.

The shops in the street are “aware” of you passing by. Although they are prohibited from “knowing” Marie-Louise’s purchase history, they would still be very interested in selling her those brown boots at a reduced price, if that would close the sale. So how could a shop let Marie-Louise know that those boots, exactly the ones she is interested in (as her recent web searches show), are now available at a 20% discount in the shop 5 meters away from her?

Simple. Marie-Louise herself knows her interests. So does her software alter ego, which she carries with her at all times in the form of what I will, for the time being, call a “gem”.


This gem, with a storage capacity of 500 TB, contains everything Marie-Louise has marked as something she wants to “remember”; it is, in effect, her software alter ego. The shops she passes are allowed to query her gem, but not to store the results of those queries for longer than one hour. The shop is informed that a potential customer is passing by, and its message is sent.

Even better: because the gem knows Marie-Louise’s priorities, it will tell the shop that she is interested in those boots, but only if the shop is willing to drop the price by 25% or more. That way Marie-Louise’s mortgage payments will not be affected. So Marie-Louise can safely read messages that would otherwise be pesky irritations, knowing the gem will shield her from the unwelcome ones. Even the fact that the shop has queried the gem could be stored in the gem itself. I see no reason why corporations should have the same civil rights as a person.
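The query protocol described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a specification: the classes Gem, Interest and Shop, the per-item discount threshold and the constant RETENTION_SECONDS are assumptions made for this example. The gem answers only yes or no, keeps its own log of every query, and the shop discards stored answers after one hour.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical rule from the text: shops may keep query results for at most one hour.
RETENTION_SECONDS = 3600


@dataclass
class Interest:
    item: str
    min_discount: float  # e.g. 0.25 means "only tell shops about this at 25% off or more"


@dataclass
class Gem:
    """Hypothetical personal data store carried by its owner."""
    interests: Dict[str, Interest]
    # The gem, not the shop, keeps the record of who asked what: (time, shop, item, discount).
    query_log: List[Tuple[float, str, str, float]] = field(default_factory=list)

    def query(self, shop: str, item: str, discount: float, now: float) -> bool:
        """Say yes only if the owner wants `item` and the offered discount meets her threshold."""
        self.query_log.append((now, shop, item, discount))
        wanted = self.interests.get(item)
        return wanted is not None and discount >= wanted.min_discount


class Shop:
    def __init__(self, name: str) -> None:
        self.name = name
        self._answers: Dict[str, Tuple[float, bool]] = {}  # item -> (time asked, answer)

    def ask(self, gem: Gem, item: str, discount: float, now: float) -> bool:
        answer = gem.query(self.name, item, discount, now)
        self._answers[item] = (now, answer)
        return answer

    def purge(self, now: float) -> None:
        """Drop every stored answer that is older than the retention window."""
        self._answers = {
            item: (asked, answer)
            for item, (asked, answer) in self._answers.items()
            if now - asked < RETENTION_SECONDS
        }

    def stored_items(self) -> List[str]:
        return sorted(self._answers)


if __name__ == "__main__":
    gem = Gem({"brown boots": Interest("brown boots", min_discount=0.25)})
    shop = Shop("shoe shop")
    print(shop.ask(gem, "brown boots", 0.20, now=0.0))   # False: below her 25% threshold
    print(shop.ask(gem, "brown boots", 0.25, now=60.0))  # True
    shop.purge(now=60.0 + RETENTION_SECONDS)
    print(shop.stored_items())                           # []: the answer has expired
```

In this sketch the one-hour rule is enforced only by the shop's own purge, that is, on trust; enforcing it in practice would take law or technical guarantees that the sketch does not attempt to model.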

Scalability solved. Privacy solved. Commercial interests protected.

Let’s do this.

ADDENDUM 1 (September 2014)

Coming across an article on Alex Pentland (an interview with him in the Dutch newspaper Trouw of September 21, 2014) prompted me to add some links to the research done by Pentland, who helped create the MIT Media Lab.

Alex Pentland is a strong advocate of people owning their own data, especially in view of the “value” this data has in the economic interactions of our age. His “Social Physics” approach (published in Social Physics: How Good Ideas Spread – The Lessons from a New Science) includes research results that provide an important counterweight to the free-market consensus of current neoclassical economics. Many aspects of neoclassical economics have never been scientifically underpinned, partly because of the complex nature of economic interactions.

I have always advocated using simulation techniques to gain a better understanding of financial and economic processes on a worldwide scale, but this has not been done systematically, and certainly not to evaluate alternative scenarios.

When Thomas Piketty’s book Capital in the Twenty-First Century was published, I hoped to find some research on alternative economic models (especially models aimed at viable strategies for economies that shrink rather than grow), but I was disappointed.

Creating scenarios for those alternative models can only be done with complex simulation models that can be executed against a vast set of parameters.
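To make the idea of running a model “against a vast set of parameters” concrete, here is a minimal Python sketch of a parameter sweep. The model itself is a deliberately trivial, hypothetical placeholder: the textbook debt-to-output recurrence d[t+1] = d[t] * (1 + r) / (1 + g) - s, with interest rate r, growth rate g (negative values mean a shrinking economy) and a primary surplus s as a share of output. The function names and parameter values are illustrative assumptions; a real model of an alternative economy would be far richer, but the sweep around it would look much the same.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple


def debt_ratio_path(d0: float, r: float, g: float, s: float, years: int) -> List[float]:
    """Toy placeholder model: d[t+1] = d[t] * (1 + r) / (1 + g) - s."""
    path = [d0]
    for _ in range(years):
        path.append(path[-1] * (1 + r) / (1 + g) - s)
    return path


def sweep(
    d0: float,
    years: int,
    rates: Iterable[float],
    growths: Iterable[float],
    surpluses: Iterable[float],
) -> Dict[Tuple[float, float, float], float]:
    """Run the model for every (r, g, s) combination and keep the final ratio."""
    return {
        (r, g, s): debt_ratio_path(d0, r, g, s, years)[-1]
        for r, g, s in product(rates, growths, surpluses)
    }


if __name__ == "__main__":
    results = sweep(
        d0=0.6,
        years=30,
        rates=[0.01, 0.02, 0.03],
        growths=[-0.02, -0.01, 0.0, 0.01, 0.02],  # from shrinking to growing
        surpluses=[0.0, 0.01],
    )
    # Scenarios in which the debt ratio does not rise over the horizon.
    stable = [params for params, final in results.items() if final <= 0.6]
    print(f"{len(stable)} of {len(results)} scenarios keep the ratio at or below 0.6")
```

Even this toy grid produces 30 scenarios; a realistic model with dozens of parameters quickly runs into the millions, which is why such studies need serious simulation infrastructure.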

