Introduction
Web tracking happens across platforms; it is the unseen and unauthorised extraction, storage, analysis, sale, purchase and auctioning of personal online data appropriated by one or more remote online corporate actors. Apart from appearing in recent patent applications (e.g. Parreira, 2013), synonyms include ‘Internet tracking’ (Cumbley and Church, 2014) or ‘online tracking’ (Knight and Saxby, 2014). In the industry, the definitions offered inform the online user that she is followed through a network of sites while her online activities are recorded (e.g. Laredo Group Inc. n.d.; Office of the Privacy Commissioner of Canada, 2012).
Big social data and web tracking are intricately connected and their development appears mutually dependent. Without data storage capacities reaching far beyond the exabyte point (Kambatla et al., 2014), web tracking might be no more than an interesting application. But as the
All online entries of people who use social media or web search are extracted and stored in Big Data warehouses (Chen et al., 2014; Doctorow, 2008; Lu et al., 2014). Some of the data are used for behaviourally targeted advertising, some for real-time bidding (Boston Consulting Group, 2012; Weide, 2011). These theoretically limitless personal data ‘teach’ algorithms to increase economic transactions with online agents (Lu et al., 2014; Shroff, 2013). Online agents have no means of limiting the extraction, storage and use of their personal data (Acar et al., 2014; Chen et al., 2014); on the contrary, digital footprints containing personal data are constantly growing, due to ongoing advances in big social data extraction, storage and management (Lanier, 2013).
Personal data include all online blogs, pictures, texts, Tweets, emails, videos and the technical details attached to them. Data points contain metadata as well as unstructured and finely granular information, i.e. emotional expressions or affective exchanges (Dwoskin, 2014). Although the term personal data signifies individualised information in an electronic format, their personal nature does not shield them from commodification (Gandy, 1993). For example, people’s information about their leisure activities, social networking activities and Internet usage is bought and sold in the consumer data broker industry (Roderick, 2014: 732).
The scope of my analysis covers North America with illustrative empirical material from Canada. In practice, however, its implications are much wider. Online US business practices set an international precedent on the Internet that seems difficult to reverse. What is more, most governments’ desires for people’s online information seem well served by weak or non-existent policies for the treatment of personal data (El Akkad, 2014). 1 The discussion here revolves around web applications like search engines and social media, but the issues extend beyond them. Many online retailers for goods and services (from booksellers to online smut) increase profits through web tracking and the storage and sale of online customers’ personal data (Lanier, 2013; Wondracek et al., 2010).
As a working thesis, I propose that online users do not use the Internet to ‘donate’ personal data to unknown corporate entities. This may be contentious, as some people might enjoy being targeted by adverts to inform them about consumer goods that fit their personal consumer profile (Turow, 2012). A case can be made, though, for an inherent preference amongst individuals to control the extraction and distribution of personal data. People prefer to be the experts of their own situation (Benello, 1981; Bourdieu, 1990).
As an ideal typical social construct, I propose to view web applications like social media or web search as an online public sphere, a virtual space to advance, discuss, elaborate or search for new personal ideas. Theoretically, social media and web search can bring together a large number of people in a many-to-many discussion and offer myriad platforms to facilitate online exchanges of ideas for mutual benefits. A public sphere, as Habermas defines it, is ‘[…] a forum in which the private people come together to form a public [and] read[y] themselves to compel public authority to legitimate itself before public opinion’ (1989: 25f).
Of course, not just discursive but also deviant behaviour can thrive in places that seem to offer anonymity, for example in so-called ‘trolling’ or ‘flaming’. But while online deviance is a galling reminder of a breakdown in communication, civility and maturity are not prerequisites for a public sphere. Therefore, deviant online communication does nothing to diminish the notion that social media and web search can serve as an online public sphere. The online public sphere serves as a counter position to the reductive notion of viewing the Internet as an online marketplace.
In theory, a public sphere like the Internet is open to all and non-exclusionary, but the way people make use of it varies in quantity and quality. For example, roughly 83% of the non-institutionalised Canadian population logs on to the Internet at least once a year (Statistics Canada, 2013). So it seems almost everyone is online. Yet although online social networks and web search continue to engage a growing number of users, email – accessed by 93% of online users – remains the predominant reason for people to use the Internet (Korupp, 2006; Korupp et al., 2006; Peacock and Kühnemund, 2007; Statistics Canada, 2011, 2013). Likewise, worldwide diffusion rates differ vastly. In 2014, 40% of the world population used the Internet, but individual country estimates ranged between 20% and 80% (International Telecommunication Union (ITU), 2014). 2 So the current focus on the use of web search and social media captures the situation of a subgroup of online users, although it is an increasingly important one (Minister of Industry, 2010: 14; Peacock, 2014; Statistics Canada, 2013). Web search and social media have opened up additional channels of social interaction including new online friendships, companionship and a fresh sense of belonging. Or so it seems.
A closer look at social media and web search sharply narrows one’s first impression of online diversity. Three of the five most visited online social platforms are owned by two large corporations, Google Inc. and Facebook Inc. The source on which my website ranking is based – alexa.com – is owned by another Internet giant, Amazon.com, which started as an online bookseller. Not coincidentally, all these large online corporations are deeply involved in Big Data, more specifically, in big social data.
I focus on two questions: firstly, what is the extent of self-regulation that an online user may expect from an online information industry involved in web-tracking technologies? Secondly, how does web tracking affect online user agency? Together with the quasi-monopolisation of social media, non-regulation in online information markets has profound impacts on well-known symptoms of market failure, discussed in the following section. The section thereafter includes a content analysis of the past and current public online subgroup discussions from the Internet Engineering Task Force (IETF) to illustrate the controversial introduction of HTTP cookie technology in the 1990s. I include descriptions of newer web-tracking technologies and how they overcome the defences employed by online users. In the final section, I summarise my most important results and discuss the uncomfortable agreement online agents currently enter into, whenever they use web search or social media.
Theoretical background: Market failure
Companies like Acxiom, Seisint, Datalogix or Epsilon collect, analyse and sell people’s personal financial data to profile credit worthiness or calculate people’s credit scores (for a critical discussion, see O’Harrow, 2005 or Roderick, 2014). They are highly profitable because accurate information on people’s net wealth is at the heart of economic risk reduction. Although personal data are of a more emotional or affective nature and only indirectly offer insights into people’s market position, they are now collected in large quantities to inform companies about possible future economic opportunities (Boston Consulting Group, 2012; Numan and DiDomenico, 2013). These future economic possibilities operate in the framework of economic risk reduction, too, because consumer products are determined by data mining results based on personal data. Of course, from a utilitarian view, a certain inconsistency still persists because of the sheer quantity of data extracted, but that seems resolved by current developments in Big Data. To exemplify the number of data points collected, Figure 1 shows what the approximately 31 million users of a well-known voice-over-Internet protocol (VoIP) service agree to when using their service. 3
Figure 1. Detailed personal data profile accessed by a voice-over-IP service (‘Skype’). Image credits for gingerbread-man: veryicon.com.
Most of the data points stored are affective and individualised – online behaviour, messages, talk content, and the technologies owned and services accessed by the user to connect to VoIP or while her system is idling. Some of the data cover the details transmitted to third-party Internet service providers and other services the user contacts. In sum, most of the data extracted, stored, analysed or otherwise captured by Skype – or rather Microsoft 4 – do not include financial information; they depict a unique private person during her private social interactions. 5
Whereas it once was the state that was interested in finding out what people were up to in their private homes, today it seems that the business of personal data extraction is firmly in the hands of publicly unaccountable corporations (Bamford, 1982; Denardis, 2014; Jussawalla and Cheah, 1987). Access, retention and analyses of all bits and bytes of personal electronic data are becoming increasingly sophisticated and helped along by current developments in Big Data (Acar et al., 2014; Krishnamurthy et al., 2007; Soltani et al., 2009; van Eijk et al., 2012). The built-in trade-off between offering online content and hoarding personal data, the ‘convenience versus privacy’ exchange, seems firmly resolved in favour of the former (Dumas, 2012: 217f). Industry representatives call this a self-regulated outcome (Dusseault, 2013: 5). It is the online user of social media and web search who bears the burden of their own data extraction, the lack of meaningful alternatives and very few regulations.
Most industrialised countries have a ‘marketplace solution’ for personal online data protection, with very little regulatory interference due to a dominant belief in a neoliberal economic paradigm (Denardis, 2014: 53ff). As current wisdom has it, the market is the best supplier of online goods and services because it allegedly follows mandates of efficient supply and demand. 6 But notable exceptions exist as to how well a market can distribute goods and services, and one of them is the distribution of information (Baker, 2002). Regulatory inactivity in information markets leads to market failure, partly due to the nature of the information production process (Gandy, 1993). One of the troubles with the current market solution is that information is neither measurable as a unit nor is it priced as an item (Babe, 1983). Of course, there is a price for information – where there is a buyer for people’s personal data there will be a price – but essentially it is the production of information that is valued at a price, not immaterial information itself (Babe, 1983).
Other problems are that information is intangible, non-exclusive and has important public good characteristics (Curran, 1991; Gandy, 1993). Production and dissemination cannot be weighed, and information is easily carried around as well as effortlessly distributed. Theoretically, information is inexhaustible: if redistributed or sold, it still remains with the original seller. These characteristics of nonexcludability and nonrivalry are important aspects of public goods (Baker, 2002; Gandy, 1993). The benefits of public goods to society grow with the number of users, and websites on the World Wide Web illustrate a case in point. Most of them are accessible without a price, and as more people produce and share information virtually everybody becomes better off: information distribution on the Internet is a good example of a Pareto improvement. 7
Equally problematic and often forgotten is the salience of content, what is
The information industry is very much aware of the often priceless qualitative contents of information. Whenever the invisible extraction of online personal data is detected in connection with an abuse of its essence, opposition forms, corporate statements are issued and damage control is enacted. 8 Despite this, the information industry continues to treat personal data as goods. As it stands, anonymous Internet users are not lucrative and personal data have a unit price. The mere existence of a price, though, is not the sign of a functioning market. Below, several symptoms are discussed that point towards current market failure.
Large online social media and web search companies are benefitting from their quasi-monopoly status, and time is on their side. Depending on the number of written entries users make on a social media platform, it might constitute quite a feat to migrate their personal data to a future competitor. It is highly questionable, too, whether their migration to a different platform leads to the removal of their personal data in the big database of the previous site. Consequently, the online user is the sole price taker who has few alternatives but to agree to her personal data extraction. 9 The price is set, there is no alternative. 10 The absence of the possibility to walk away from a market exchange indicates a dysfunction in the market.
What is more, there is no equitable allocation of negative effects between buyers and sellers. Some examples of negative effects produced by the current market-led approach are a loss of autonomy, expertise, integrity, developmental possibilities, personal growth, public debate, search for new information and the underuse of online capacities for personal exchange. Economists tend to use the generic term of externalities for such unpriced effects.
Another indication is the unethical use of regulatory loopholes to gain profitable advantages (Furubotn and Pejovich, 1972). Penalties and rewards matter in order to explain the current incentives for the tracking and storing of personal data. People as well as institutions choose between a set of ‘sanctioned behavioural relations’ when they extract personal data from online users to gain profitable advantages (Furubotn and Pejovich, 1972: 1131). Currently, incentives for transparent, limited and consensual personal data extractions are low, while profits for invisible web tracking and unlimited data storage are high, all the while costs for storage are decreasing.
Within a decade, corporate access to personal online data has morphed into an economic advantage. Influential economic interest groups are celebrating the increased access to personal data as the creation of a new asset class, indeed, as the ‘new oil’ (Boston Consulting Group, 2012; World Economic Forum, 2011). These current conclusions are too simplistic: corporate profit alone does not signify a functioning market. ‘More market’ seems a poor answer to the current conundrum because it may solidify market failure instead of enhancing efficiency (Furubotn and Pejovich, 1972: 1141). Little optimism remains that the online information industry can manage to self-regulate. Currently, there are more reasons to believe that the uninhibited corporate pursuit of online personal data will continue. Below, I discuss some of the institutional frameworks that facilitated today’s simultaneous data exchange and extraction on the Internet.
Technocratic failure
The groundwork for today’s many-to-many communication was laid well before the commercial introduction of the Internet. Originally, the Arpanet, the late-1960s predecessor of the Internet, did two things: it opened up secure communication channels that facilitated the collaboration of scientists who worked in far-flung places, and it increased the speed, efficiency and connectivity of information exchanges (Dumas and Schwartz, 2009). To date, this is still what the Internet does best, but arguably not as securely as originally envisioned. With the introduction of browser cookies, secure online communication has deteriorated.
The history of the browser cookie is briefer than that of the Internet, although its repercussions for online agency have been a lot more influential. Before the successful implementation of browser cookies, data exchanges between an Internet user’s computer and a remote server were anonymous. That changed in 1994, when Lou Montulli assembled a small piece of state-management code for the hypertext transfer protocol (HTTP), usually called the HTTP cookie. It was initially labelled a
The persistent and invasive nature of Montulli’s invention was recognised immediately by other members of the IETF. An IETF subgroup was initiated to discuss standards for this budding new technology. Following the most important discussion group threads, it is interesting to observe how engineers were swayed to embrace a piece of code with obvious security shortfalls and invasive protocol transfer properties. The invasive and risky properties of HTTP cookies were recognised but not central in the group discussions. Lou Montulli was part of a loosely formed group led by David M Kristol, then at Bell research laboratories. Regardless of the group’s formal openness, it seems the publication of the discussions on this central bit of technology was kept to a select, small, semi-private circle of interested individuals.
One of the initial papers authored by Kristol and Montulli was a working document of the IETF (expired January 1998).
Despite some in-group opposition to the invasive and persistent nature of HTTP cookies, the aforementioned section in the Kristol and Montulli draft of 1997 was removed to ‘facilitate convergence’ a few weeks later. 13 What is more, the exclusion of user rights to remove and cap cookies turns into a key strategy in the first RFC published (RFC 2109) on the website of the IETF (Kristol and Montulli, 1997). 14 Neither the first nor the second RFC offers more than mere weak support for online agency (Kristol and Montulli, 1997, 2000). The second one matter-of-factly states: ‘Informed consent should guide the systems that use cookies’ (Kristol and Montulli, 2000: 19). No other references to user rights are incorporated.
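In protocol terms, the mechanism these RFCs standardise is simple, which helps explain how quickly it spread. The sketch below, using Python’s standard http.cookies module, shows the round trip: a server attaches an identifier to a response via a Set-Cookie header, and the browser replays it with every later request, making formerly anonymous exchanges linkable. The cookie name ‘uid’ and its value are invented for illustration, not taken from any RFC example.

```python
from http.cookies import SimpleCookie

# Server side: attach a unique identifier to the HTTP response.
response = SimpleCookie()
response["uid"] = "a1b2c3"
response["uid"]["path"] = "/"
response["uid"]["max-age"] = 31536000  # persist for roughly one year

# This string travels to the browser as a Set-Cookie header.
set_cookie_header = response["uid"].OutputString()
print("Set-Cookie:", set_cookie_header)

# Client side: the browser stores the pair and sends it back in a
# Cookie header on every subsequent request to the same site,
# linking those requests to one another.
request = SimpleCookie()
request.load("uid=a1b2c3")
print("Cookie: uid =", request["uid"].value)
```

The persistence discussed above lies in the max-age attribute: as long as the browser honours it, every visit within that window carries the same identifier.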
The latest update to the
As a balancing act, the advisory role of the IETF is maintained, while its proposed – and widely ignored – Internet standards accommodate HTTP cookie technology. Precisely because these standards are so widely ignored, one might argue that the impact of this accommodating institutional framework is minute. With the benefit of hindsight, a more pointed statement of resistance and a wider societal debate on this important topic might have been preferable.
Analytically, the
HTTP cookies are the most common tracking technology employed on the Internet, but more insidious technologies have been developed over the past years. In itself, this development represents another conundrum for the IETF: its work rapidly seems outdated as new intrusive tracking technologies replace the old ones. Currently, the development of enhanced functionality with new coding styles and script-based web pages spreads vulnerabilities that circumvent user controls (Acar et al., 2014; W3C, 2009). Online users are dealing with embedded objects called ‘supercookies’, ‘zombiecookies’, ‘uebercookies’ or ‘evercookies’, and these labels are no exaggeration. Circumventing supercookies is almost impossible, given that much web content includes videos requiring widely used applications like Adobe Flash Player with built-in backdoors. An online security company offers a stark description: ‘And the next generation of supercookies – the Evercookie – takes tracking to even greater heights. The Evercookie can use a multitude of mechanisms to store user data in order to compile unique identities across domains. These mechanisms can include standard tracking cookies, supercookies, Silverlight-isolated storage, RGB values in PNG files, ETags and HTML 5 sessions’ (Sheldon, 2013).
A good number of online users would have to look up most of these rarely used terms. ‘Evercookies’ are browser-independent and stored in folders not read by the users’ browsers; they continuously track online activity, are independent of the software used, and cannot be deleted (Narayanan, 2010).
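The principle behind this resilience can be sketched without any of those exotic mechanisms. The toy code below is not the actual evercookie library; it merely simulates the redundancy idea under stated assumptions: one identifier is written to several storage vectors (the store names stand in for HTTP cookies, localStorage, ETags and the like), so that clearing any subset of them lets the identifier ‘respawn’ from a surviving copy.

```python
# Simulated client-side storage vectors; each dict stands in for one
# real mechanism (HTTP cookie jar, localStorage, ETag cache, ...).
stores = {"http_cookie": {}, "local_storage": {}, "etag_cache": {}}

def write_everywhere(uid):
    """Write the same identifier redundantly into every store."""
    for store in stores.values():
        store["uid"] = uid

def respawn():
    """Recover the identifier from any surviving copy, then
    immediately re-seed the stores the user has cleared."""
    uid = next((s["uid"] for s in stores.values() if "uid" in s), None)
    if uid is not None:
        write_everywhere(uid)
    return uid

write_everywhere("a1b2c3")
stores["http_cookie"].clear()    # user deletes browser cookies...
stores["local_storage"].clear()  # ...and clears site data

print(respawn())  # the identifier survives in the remaining store
```

As long as one vector escapes deletion, the next page load restores all the others, which is why deleting cookies alone offers no protection.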
New invasive web-tracking mechanisms include browser and canvas fingerprinting that appear to be spreading with no known user interventions to stop them (Acar et al., 2014; Mowery and Shacham, 2012; Munoz-Garcia et al., 2012). Nobody seems quite sure whether additional tracking techniques exist. Rather more certain, though, is that informed consent of online users is not sought, despite numerous do-not-track initiatives in North America and recent European online consent forms, presented to web users for the placement of HTTP cookies (Dusseault, 2013; Federal Trade Commission, 2012; Lo, 2009; Office of the Privacy Commissioner of Canada, 2012; van Eijk et al., 2012; Vincent, 2012; Whitman, 2004).
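Fingerprinting needs no stored identifier at all, which is what makes user intervention so difficult. The sketch below illustrates the underlying idea only: many individually innocuous browser attributes, hashed together, yield a near-unique and stateless identifier. The attribute values are invented examples; real canvas fingerprinting additionally hashes rendered pixel data, which is not reproduced here.

```python
import hashlib

# Invented example attributes a script might read from a browser.
attributes = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080x24",
    "timezone": "America/Toronto",
    "fonts": "Arial,Courier New,Times New Roman",
    "language": "en-CA",
}

def fingerprint(attrs):
    # Sort the keys so the same attributes always yield the same
    # digest: a stable identifier with nothing stored client-side.
    canonical = "|".join(f"{k}={v}" for k, v in sorted(attrs.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()

print(fingerprint(attributes)[:16])
```

Because the identifier is recomputed on every visit rather than stored, clearing cookies or site data does not change it; only altering the attributes themselves would.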
An appropriate starting point would be a user agreement that explains invasive technologies or extraction techniques to people who are affected (Pollach, 2007). Further suggestions include behavioural targeting without the use of tracking technologies, a voluntary participation in web tracking for a subgroup of users who choose to be tracked, a pay-per-track app, or the introduction of multiple online profiles with different web-tracking agreements (Eckersley, 2010; Lahlou, 2008; Leber, 2012).
While it is commendable that the IETF issues statements which decry the increasing vulnerabilities on people’s computers caused by web tracking, its emphatic institutional support for a repeal of invasive web-tracking technologies is still outstanding (see, for example, Yen et al., 2012). Its sister organisation, the Internet Society, promotes itself as ‘the world's trusted independent source of leadership for Internet policy, technology standards, and future development’ (see internetsociety.org), but neither organisation has issued a public statement on the significance of commercially extracted personal data to Internet users.
What has been publicly issued, in a collaborative effort by more than 40 international privacy and security experts, is an extended final version of the ‘International Principles on the Application of Human Rights to Communications Surveillance’. 16
However, commercial web tracking is not mentioned, while corporate responsibility is referenced only once: ‘Business enterprises bear responsibility for respecting individual privacy and other human rights, particularly given the key role they play in designing, developing, and disseminating technologies; enabling and providing communications; and in facilitating certain State surveillance activities. Nevertheless, these principles articulate the duties and obligations of States when engaging in Communications Surveillance.’
The above quote underlines that apparently not only the IETF is mired in what I would describe as ‘passive abstentionism’ (Gordon, 1991: 17). Clearly, state surveillance laws affect the well-being of citizens, but the relentless subjection of online users to forced data extraction by unseen corporate actors ought to attract equal attention. In the last section, I will show how market failure and technocratic failure explain and solidify the murky situation of online users whenever they use web search or social media.
Conclusion
Companies that use big social data in combination with web-tracking technologies store personal data on an unprecedented scale. In an unregulated information market and a
Online corporations operating in a dysfunctional information market do not self-regulate because it puts them at an economic disadvantage, as has become sufficiently clear. Information property rights researchers sometimes advocate for micro payments to reimburse people for the use of their personal data (e.g. Furubotn and Pejovich, 1972; Lanier, 2013; Leber, 2012). This approach misses the point. It fails to understand that ‘[…] preserving information is changing from a technological puzzle into a moral dilemma’ (Aiden and Baptiste, 2013: 203). Market liberalization cannot solve this dilemma.
The current unequal online exchange is carried out neither between two fully informed agents nor does it improve the online public sphere. It may be covered, though, by another term that is far less flattering than that of a trade-off: It has the characteristics of an
Current incentives set by non-regulation nudge corporate actors to engage in more intense web tracking. All the while, Big Data storage capacities are growing and becoming less expensive. What is more, the current under-regulation of online personal data extraction is beneficial for governmental agencies. Numerous instances exist where, in the past, close connections between the government and information industries were deemed useful (Hedrick, 1991). So, hedged with some caveats, the current wilful political neglect to limit personal data hoarding may be linked to a governmental reliance on increased commercial efforts to extract and store personal data. Any way we look at it, a strong case can be made to direct the scope of the research away from state collectors of personal data towards the unregulated collection of personal data by corporate actors.
Further research ought to look at who is the most heavily targeted population segment that is tracked around the web. Also of interest is the question whether web-tracking companies may be free-riding on the Internet commons, extracting more than their fair share of the available profits. Furthermore, the next emerging path dependency in advancing technology seems to be a convergence on storage instead of computing speed (Shroff, 2013). Part of this trend might be connected to web tracking, indicating a shift away from progressive towards more regressive technological developments.
