How Big Data was created, or why we have neither privacy nor transparency.

Why are we currently in a situation where privacy and lack of transparency have become central legal issues? Obviously, it is because of rapid technological development, but it is perhaps useful for our discussion about transparency, privacy, and profiling to dig a bit deeper.

To understand a bit more about how technology has changed our world so radically in recent years, we shall briefly review two major technological developments that made this transformation possible:

One concerns the development of computer hardware; the other concerns the development of software that enables many computers to work as one.

To understand Big Data and these two technological milestones, we must first talk about Moore’s Law. It is named after Gordon Moore, Intel’s co-founder, and is based on a prediction he made in 1965 in an article for Electronics magazine titled ‘Cramming More Components onto Integrated Circuits’. He stated:

“The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly, over the short term, this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.”

Moore later revised his estimate to a doubling every two years, and the prediction is popularly quoted with 18 months as the doubling period for general computing power. In part, Moore’s law explains the sustained exponential growth of the Big Data era.

Exponential growth quickly produces enormous numbers. Ray Kurzweil illustrates this in his book “The Age of Spiritual Machines: When Computers Exceed Human Intelligence” with the story of the inventor of chess in India.

When the inventor of chess presented his game to the emperor, the emperor was so impressed that he invited the inventor to name any reward he wanted.

The inventor asked only for rice to feed his family, and he used the chessboard to show how much he wanted. He put one grain of rice on the first square, two on the second, four on the third and eight on the fourth, doubling the amount on each square until the last square of the chessboard was filled.

On the first half of the chessboard, the human mind can still picture the number of rice grains, but on the second half the numbers become too big to imagine: trillions, quadrillions and quintillions.

By the time the last square is reached, the board holds 2^64 − 1 grains in total, about 18 quintillion. That is more rice than has been produced in the history of the world.
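The arithmetic behind the story is easy to check. The short Python sketch below is an illustration added here, not taken from Kurzweil’s book: it sums the doubling series square by square, showing that the first half of the board totals about 4.3 billion grains while the whole board reaches 2^64 − 1, roughly 18.4 quintillion.

```python
# Legend of the chessboard: one grain on square 1, doubling on every square.
SQUARES = 64

# grains[i] is the number of grains on square i + 1, i.e. 2**i.
grains = [2 ** i for i in range(SQUARES)]

first_half = sum(grains[:32])  # squares 1 to 32
whole_board = sum(grains)      # squares 1 to 64

print(f"Last square alone: {grains[-1]:,}")   # 9,223,372,036,854,775,808
print(f"First half total:  {first_half:,}")   # 4,294,967,295
print(f"Whole board total: {whole_board:,}")  # 18,446,744,073,709,551,615

# Closed form of the doubling series: 1 + 2 + ... + 2**(n - 1) == 2**n - 1
assert whole_board == 2 ** SQUARES - 1
```

Almost all of the rice sits on the second half: the last square alone holds more than half of the total.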

Since Moore’s law was formulated in 1965, the number of transistors in use has doubled roughly every 18 months.

Moore’s law: transistor production has grown exponentially, doubling approximately every 18 months, while the price per transistor has fallen over the same periods. Source: VLSI Research, via Hutcheson, D. (2015) ‘Graphic: transistor production has reached astronomical scales’, IEEE Spectrum, available at: https://spectrum.ieee.org/computing/hardware/transistor-production-has-reached-astronomical-scales (accessed 12th December, 2018).

After 32 of these doublings since 1965, we are now on the second half of the chessboard: 32 doublings take us past the 32 squares of the first half of the 64-square board. From this point on, we have been able to digitize almost everything, and the immense number of computers has enabled us to store all of this new data.
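As a rough timeline check, the hypothetical helper below converts a number of doublings into a calendar year, assuming a perfectly constant 18-month doubling period (a simplification of the popular form of Moore’s law). Under that assumption, 32 doublings starting in 1965 end around 2013, after a cumulative growth factor of about 4.3 billion.

```python
def year_after_doublings(start_year: int, doublings: int, months_per_doubling: float) -> float:
    """Calendar year reached after `doublings` periods of constant length.

    Hypothetical helper: it assumes a perfectly steady doubling period,
    which real transistor counts only approximate.
    """
    return start_year + doublings * months_per_doubling / 12

# 32 doublings take us from square 1 to square 33,
# the first square of the second half of the 64-square board.
DOUBLINGS = 32
growth_factor = 2 ** DOUBLINGS

print(year_after_doublings(1965, DOUBLINGS, 18))  # 2013.0
print(f"{growth_factor:,}")                        # 4,294,967,296
```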

But there was a challenge: how could we access and manipulate data stored across many different computers? We needed ‘the cloud’.

The era of big data computing started in 2007, when it became widely possible to ‘upload data to the cloud’ because effective distributed storage and processing software had become available, allowing thousands of computers to work as one.

In 2003, Google published a paper describing a basic innovation called the Google File System (GFS). This software allowed Google to access and manage huge amounts of data spread across thousands of computers.
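To make the idea concrete, here is a toy Python sketch of the core GFS design as described in the 2003 paper: files are split into large fixed-size chunks (64 MB), each chunk is stored as several replicas (three by default) on different machines, and a single master keeps only the metadata about where each chunk lives. The class name, server names and file name are hypothetical, replica placement is simplified to round-robin, and no real data is stored; this is an illustration, not Google’s implementation.

```python
from dataclasses import dataclass, field
from itertools import cycle

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described in the GFS paper
REPLICAS = 3                   # default number of copies of each chunk

@dataclass
class ToyMaster:
    """Hypothetical stand-in for the GFS master: it holds metadata only, never file data."""
    servers: list[str]
    # file name -> one list of replica locations per chunk
    chunk_map: dict[str, list[list[str]]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.servers) < REPLICAS:
            raise ValueError("need at least REPLICAS distinct chunkservers")

    def add_file(self, name: str, size_bytes: int) -> None:
        n_chunks = -(-size_bytes // CHUNK_SIZE)  # ceiling division
        ring = cycle(self.servers)
        # Simplified placement: consecutive servers in round-robin order, so the
        # REPLICAS copies of one chunk always land on distinct machines.
        # The real system uses a richer placement policy.
        self.chunk_map[name] = [
            [next(ring) for _ in range(REPLICAS)] for _ in range(n_chunks)
        ]

    def locate(self, name: str, offset: int) -> list[str]:
        """Return the chunkservers holding the byte at `offset` in file `name`."""
        return self.chunk_map[name][offset // CHUNK_SIZE]

MB = 1024 * 1024
master = ToyMaster(servers=[f"chunkserver-{i}" for i in range(5)])
master.add_file("web-crawl.dat", 200 * MB)  # 200 MB -> 4 chunks
print(master.locate("web-crawl.dat", 130 * MB))  # servers holding chunk 2
```

A client asks the master only where a chunk lives and then reads the data directly from one of those machines, which is what lets thousands of computers serve one logical file system.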

At this time, Google’s main goal was to organize all the world’s information through its search engine. However, it could not do so without its second basic innovation, MapReduce, which was published in 2004.

These two innovations allowed Google to process and explore a huge quantity of data in a manageable way.
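MapReduce splits a computation into a map step, which runs independently on each piece of input, and a reduce step, which combines all intermediate values that share the same key. The word-count sketch below is the textbook example of the model (the 2004 paper also uses it). It runs in a single Python process, so the distribution across machines, the shuffle between them and the fault tolerance that the real framework provides are only simulated here.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator

def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (the framework does this in a real cluster)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def map_reduce(documents: list[str]) -> dict[str, int]:
    # On a cluster, each document would be mapped on a different machine and
    # each key reduced on some machine; here everything runs sequentially.
    emitted = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = map_reduce(["big data big clusters", "data everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'clusters': 1, 'everywhere': 1}
```

Because every map call and every reduce call is independent, the same program can be spread over as many machines as the data requires.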

Google shared these two basic innovations publicly, through its research papers, so that the open source community could build on its insights. Even better, the community reimplemented and improved on these ideas, and as a result Hadoop was created in 2006.

Hadoop is open source software that allows thousands of computers to work like one giant computer. Hadoop’s developers are Mike Cafarella and Doug Cutting.

Facebook, LinkedIn and Twitter already existed in 2006, and they started building on Hadoop straightaway. This is the reason why these platform companies became global in 2007.

With Hadoop, easily accessible storage and computing capacity exploded, making ‘big data’ available to all. Thanks to Hadoop, internet platforms could store all their data across many computers while still being able to access it. They could even store every click of every user on every web page. This gave them a much better understanding of what users were doing over time, providing the basis for Big Data Analytics.
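Why does keeping every click matter? Once raw clicks are kept, they can be regrouped per user and put in time order, which turns isolated events into a behavioral history. The sketch below uses a small, entirely hypothetical click log and a deliberately crude signal (the most-visited page) to show the idea; a real platform would run similar grouping logic as distributed jobs over vastly more data.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical click log rows: (user_id, ISO timestamp, page visited).
clicks = [
    ("u1", "2007-03-01T10:00", "/home"),
    ("u1", "2007-03-01T10:02", "/shoes"),
    ("u2", "2007-03-01T11:15", "/home"),
    ("u1", "2007-03-05T09:30", "/shoes"),
    ("u1", "2007-03-05T09:31", "/checkout"),
    ("u2", "2007-03-02T08:00", "/news"),
]

def user_timelines(log):
    """Group raw clicks by user and put each user's clicks in time order."""
    timelines = defaultdict(list)
    for user, ts, page in log:
        timelines[user].append((datetime.fromisoformat(ts), page))
    for events in timelines.values():
        events.sort()  # tuples sort by timestamp first
    return timelines

def most_visited(events):
    """A crude behavioral signal: the page this user returns to most often."""
    return Counter(page for _, page in events).most_common(1)[0][0]

timelines = user_timelines(clicks)
for user, events in sorted(timelines.items()):
    print(user, [page for _, page in events], "->", most_visited(events))
```

Even this toy history reveals that user u1 came back days later to the same product page and then went to checkout.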

Thanks to Hadoop, other companies were born in 2007, including Airbnb. In the same year, Amazon launched the Kindle and the first iPhone was released. According to AT&T, mobile data traffic on its national wireless network increased by more than 100,000% from January 2007 to December 2014.

The year 2007 was a turning point for the global economy. It paved the way for the emergence of a new category of companies that reshaped how people and machines communicate, create, collaborate and think.

Since 2007, the big internet platform companies have been able to store all their data in one place and, through Big Data Analytics, gain a deeper knowledge of the market than traditional companies.

Furthermore, more customers on one platform means better service (e.g. social media or Airbnb). This network effect favors larger platforms, which within a few years can act as monopolies owing to their market dominance.

For users, the main consequence was the benefit of a number of new services, but at the same time a total loss of control over their personal data and exposure to being analyzed and profiled by big data analytics and machine learning algorithms.

As a result, companies started to make decisions through automated decision-making processes based on the profiling of individuals and groups. In some cases, this was for advertising purposes; in other cases, these automated decisions could become life-changing owing to biased results and discrimination.
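To make the chain from observed data to profile to decision concrete, the sketch below implements a purely hypothetical scoring rule; it does not describe any real company’s system, and every field name, point value and threshold is invented for illustration. Observed behavior is the input data, the inferred score is the profile (output data), and a threshold turns that profile into a fully automated decision. Because the rule also uses postcode as a proxy, two users who behave identically get different outcomes, which is one way biased or discriminatory results can arise.

```python
from dataclasses import dataclass

@dataclass
class ObservedData:
    """Input data: behavior the platform observes directly (hypothetical fields)."""
    late_night_sessions: int
    gambling_site_visits: int
    postcode: str

# Hypothetical scoring rule; points and threshold are invented for illustration.
POINTS_PER_LATE_NIGHT_SESSION = 2
POINTS_PER_GAMBLING_VISIT = 10
PROXY_POSTCODES = {"99999"}  # where someone lives, used as a proxy
PROXY_PENALTY = 30
REJECT_THRESHOLD = 50

def infer_profile(data: ObservedData) -> int:
    """Output data: an inferred risk score that the user never sees."""
    score = (POINTS_PER_LATE_NIGHT_SESSION * data.late_night_sessions
             + POINTS_PER_GAMBLING_VISIT * data.gambling_site_visits)
    if data.postcode in PROXY_POSTCODES:
        score += PROXY_PENALTY  # penalizes location, not behavior
    return score

def automated_decision(data: ObservedData) -> str:
    """A yes/no decision taken solely on the basis of the inferred profile."""
    return "reject" if infer_profile(data) >= REJECT_THRESHOLD else "approve"

# Two users with identical behavior who differ only in where they live.
alice = ObservedData(late_night_sessions=5, gambling_site_visits=2, postcode="10001")
bob = ObservedData(late_night_sessions=5, gambling_site_visits=2, postcode="99999")
print(automated_decision(alice), automated_decision(bob))  # approve reject
```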

In other cases, however, owing to a lack of privacy and to market domination, these automated decision-making processes can distort fair markets (tailored prices) as well as fair elections (the Cambridge Analytica case).

There are many examples of assessments made through online automated decision-making. Facebook, for example, can predict your political views, race, religion and sexual orientation, and even when you are going to die. It can also predict individuals’ future behavior, allowing third parties to target them with advertisements that can change their decisions entirely. Facebook calls this ‘improved marketing efficiency’.

Another example is Amazon’s ‘Alexa Hunches’ feature, which predicts future needs from a user’s behavior in order to make suggestions. Amazon has also patented technology that would infer a user’s health status by analyzing their voice and coughing, followed by advertisements for sore throat products. Insurance companies, too, are collecting data from social networks to predict how much a user’s healthcare could cost.

The central question is whether data subjects have real control over their personal data, and whether the General Data Protection Regulation (GDPR) protects them from its inherent risks.

The bad news: even if profiles are personal data, this does not mean individuals have full access to them, and it is not possible to rectify any of the assessments they contain.

The data subject’s rights to transparency are set out in Articles 13–15 GDPR. The right to be notified (Articles 13–14 GDPR) is a duty of the data controller and covers data provided directly by the data subject, observed data, and data obtained from third parties. The right of access (Article 15 GDPR), by contrast, must be actively requested by the data subject.

Does Article 15 GDPR give individuals access to their profiles? The phrase ‘envisaged consequences’ suggests that the data controller has to explain to the data subject the consequences of the automated decision-making before the data is processed. Together with the absence of an explicit deadline for making a request, this suggests that the right of access is limited to explanations of system functionality. This is an ex-ante explanation.

We can access only the input data, not the output data (profiles). In the diagram below, we can access only up to point C).

From point C) to point F), companies are protected by other legislation, such as the Trade Secrets Directive, and are allowed to be opaque. If they choose to be transparent at these points, it is an ethical decision.
