CL Simplex

Ethics: Metadata

Metadata is a word that has recently become widely known. Metadata is data about other data. “Sally went to the store” is the data; the fact that the record was created at 5:30 pm PST is metadata. Details such as “when did this happen” or “where did this happen” are typical examples. Metadata and ethics meet when we talk about real people, how metadata is applied, and the industry’s approach to maintaining its reputation and trust.

Data is Cheap to Store

Previously, scientists and statisticians had to interpolate or extrapolate from sparse data. Data was difficult to collect (we didn’t have the social networks we do now) and expensive to store [1]. Contrast that with today, when CERN’s Large Hadron Collider produces 30 petabytes of data annually [2]. Beyond producing data, CERN boasts exceptional data storage and processing. Data centers like CERN’s were once the preserve of only the largest organizations. Now, with cloud IT providers (we use one for our hosting), a supercomputer is a click away: immense computing power is accessible and relatively inexpensive. It is easier than ever for people to have access to world-class data facilities.

People are Predictable

Believe it or not, people are creatures of habit. Your habit of travelling to the same place every weekday for several hours is how OK Google knows where you work. Everyone has to eat. Everyone has to sleep. Because people fall into patterns, it is easy to extract powerful insights even from anonymized datasets. These datasets are worth billions (that’s how Facebook makes its money, minus the anonymized part) and are part of that whole “big data” thing you’ve heard about.
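To make the point concrete, here is a minimal sketch of how a routine can give away a workplace even when no name is attached. It assumes a hypothetical dataset of pings, each a pseudonymous user id, a timestamp, and a coarse location label, and a simple heuristic of our own: “work” is the most frequent location seen on weekdays between 9:00 and 17:00. This is an illustration only, not how Google or anyone else actually does it.

```python
from collections import Counter
from datetime import datetime

# Hypothetical anonymized pings: (pseudonymous user id, timestamp, coarse location).
# No names appear, yet the pattern alone is revealing.
pings = [
    ("u1", datetime(2016, 5, 2, 10, 15), "cell_17"),  # Monday, mid-morning
    ("u1", datetime(2016, 5, 2, 14, 40), "cell_17"),  # Monday, afternoon
    ("u1", datetime(2016, 5, 3, 11, 5), "cell_17"),   # Tuesday, late morning
    ("u1", datetime(2016, 5, 3, 22, 30), "cell_03"),  # Tuesday night (outside hours)
    ("u1", datetime(2016, 5, 7, 13, 0), "cell_42"),   # Saturday (not a weekday)
]


def guess_work_location(pings, user_id, start_hour=9, end_hour=17):
    """Return the most frequent weekday, working-hours location for user_id.

    Assumed heuristic for illustration: "work" is wherever a person repeatedly
    is between start_hour and end_hour, Monday through Friday.
    Returns None if no ping matches.
    """
    counts = Counter(
        loc
        for uid, ts, loc in pings
        if uid == user_id
        and ts.weekday() < 5                  # Monday=0 ... Friday=4
        and start_hour <= ts.hour < end_hour  # working hours only
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(guess_work_location(pings, "u1"))  # cell_17
```

Three weekday-daytime pings at the same place are enough to single it out; a real dataset with months of pings makes the guess far more confident.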

Applications of Metadata

Metadata is most commonly used to sell advertising. Google’s monstrous trove of metadata fuels YouTube ads, AdSense, and Google AdWords, funding the various data products consumers enjoy for “free.” Metadata is used for national intelligence and law enforcement purposes. Metadata is used to show you “what other people bought as well” on Amazon. Metadata is important for public health care. Metadata is why credit card companies are so good at detecting fraud.

Ramifications of Metadata

Unfortunately, processing metadata is not a dispassionate process. Despite what consultants wish, big data is statistics, and these statistics are only as good as the conclusions people draw from them. What big data attempts to provide is overwhelming evidence to support a given hypothesis. But a hypothesis, goal, or metric has to come from someone in the first place, so confirmation bias, selection bias, and all the other problems statistics has traditionally wrestled with do not go away. New numbers, goals, and metrics often lead to policy changes, which are a delicate process.

Processing metadata on a human level can be difficult; we are not wired to understand it very well. Take the law enforcement example: a purple person with red shoes is on the run. To eliminate false positives, one should remove everyone not wearing red shoes from the search pool and look for additional criteria. Unfortunately, people are predictable and lazy, so the search pool becomes “look for purple people.” A racially sensitive example isn’t our intended focus, but it is an everyday illustration of the real ramifications of how people handle metadata.
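The difference between the careful search and the lazy one can be sketched in a few lines. The population below is entirely hypothetical and the `search` helper is our own illustration: it keeps only people who match every supplied attribute, so dropping the shoe criterion visibly inflates the pool of suspects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    color: str  # the example's "purple person"
    shoes: str


# Hypothetical search population, invented for illustration.
population = [
    Person("A", "purple", "red"),
    Person("B", "purple", "blue"),
    Person("C", "purple", "white"),
    Person("D", "green", "red"),
    Person("E", "purple", "black"),
]


def search(pool, **criteria):
    """Keep only people who match every supplied attribute exactly."""
    return [p for p in pool if all(getattr(p, k) == v for k, v in criteria.items())]


lazy = search(population, color="purple")                  # one criterion
careful = search(population, color="purple", shoes="red")  # all known criteria

print(len(lazy), len(careful))  # 4 1
```

In the lazy search, three of the four matches are false positives: innocent purple people who now fall under suspicion.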

The Tech Industry and Metadata

The tech industry is in a unique position: it holds a great deal of metadata, and often primary data from organizations or consumers as well (passwords, sales data, etc.). Any website that records visits, even naively, can amass a wealth of information (Google Analytics is free for websites). Unfortunately, the industry seems to have a rather laissez-faire attitude toward privacy and protecting consumer data. There’s too much money to be made, and often there is a “nothing bad will happen” attitude. It’s someone else’s information, after all. Privacy and security seem to trade off against ease of use, and ease of use generally gets the nod.

Metadata and Ethics

Metadata in academic pursuits is subject to ethical scrutiny. That scrutiny disappears in the social media world, especially when people hand over their data willingly. We live in a world where “if the service is free, you’re the product.” In the end, it comes down to how people handle their metadata. For our part, we handle our clients’ data as if it were our own; the burden of trust is not lost on us. Trust plays a large part in the ethical considerations, and unfortunately we feel the industry has a lot of work to do to restore public trust while still earning revenue from its products.

[1] http://www.jcmit.com/mem2015.htm

[2] http://home.cern/about/computing