All Data Collection Is Burdensome

In the world of data collection, there’s no such thing as a free lunch.

Apr 12, 2021

I’ve been lazily assuming in this newsletter that ALL data collection is a burden on end-users, focusing mostly on the experience of providing data. In retrospect, this warrants more explanation. So here it is.

What are the Costs to the End User?

The costs to a user can be split into three distinct categories:

Direct costs
Disclosure risk
Privacy risk

Direct Costs

Direct costs are the most obvious. Time is a nearly ubiquitous cost. Any data that requires some action by a user takes time. Filling out a survey takes time, and the more questions asked, the more likely users are to drop off. Looking up information that isn't readily available also takes time.

Costs can be material as well. Anyone who has ordered an official transcript knows there’s usually a financial cost in addition to the trouble of navigating byzantine school websites.

It's important to note that a lot of data collection has no direct cost to users. Reading this post in a browser or an app? It's probably logging some information about you. This happens in the physical world as well: security cameras capture your image without any action or consent on your part.

Disclosure risk

Any time data is captured, transferred, or duplicated, there's a risk that it is disclosed to a third party. Often, an end-user will not want that party to have access to the information. If it's an opinion on a survey, it's likely not a huge deal. If it's a Social Security number or health information, it's a problem.

Organizations regularly sell or exchange data. It is the primary currency of many free services, and the exchange can easily go awry. A period-tracking app called Flo shared data on its users with Facebook and Google.[1] Even if users trusted Flo, it's not hard to imagine them wanting to keep that information private from larger advertisers, especially if Facebook and Google repackaged the data and made it available through their vast advertising networks.

Data breaches are a fact of life, so there's always a risk of disclosure. No security team is large enough to rule it out: Facebook recently leaked over 500M phone numbers.[2] A day after the leak was announced, I got a text message trying to phish for data.

The question I ask myself when providing data is akin to the so-called "sunlight test" in journalism: Would I be OK if these data were on the front page of a newspaper?[3]

Privacy risk

Even if data isn't exposed to third parties, it could be used improperly by the organizations collecting it. A famous example is when Target started mailing baby product coupons to a pregnant teen's address before she had told her family about her pregnancy.[4] Her father found out she was pregnant from Target's marketing department.

A more infamous example is when Uber allegedly tracked journalists using the ridesharing app to intimidate them.[5] There wasn't improper disclosure in these cases, but there was a misuse of data. Whenever sensitive data is stored, even securely, it can be misused.

Minimizing burden

I’ve previously written about how not collecting data is the best way to minimize the burden. To reiterate, people should have to justify every data collection decision. Don’t collect data because it may become useful. Collect data because it’s essential.

When you have to collect data, here are a few more ideas for reducing disclosure and privacy risk. Please add more in the comments.

Reducing Risk of Disclosure and Privacy

Commit to not selling data or sharing it without consent. Ideally, users should know upfront if selling data is part of your business model. And you should restrict what's shared to the minimum possible.
Don't store data in the first place! Needing data for some purpose doesn't mean you need to put it into long-term storage. If you must store it, consider storing it without personal identifiers, which reduces the risk under the sunlight test.
If you do need to store it, automatically delete user data once it's no longer useful.
Restrict data access as a default setting. The fewer the hands, the less potential for disclosure or misuse.
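
For readers who build these systems, the last three ideas (store without identifiers, delete on a schedule, restrict access by default) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the secret key, the 30-day retention window, the Record type, and the "billing" role are all hypothetical. Note that a keyed hash is pseudonymization, not anonymization. Anyone holding the key can recompute tokens for known identifiers and re-link them to people, so the key needs the same protection as the data.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical secret key; in practice it would live in a secrets manager,
# stored separately from the data it protects.
PSEUDONYM_KEY = b"replace-with-a-real-secret"

# Hypothetical retention window: records older than this get deleted.
RETENTION = timedelta(days=30)


@dataclass
class Record:
    user_ref: str          # pseudonymous token, never the raw identifier
    payload: str
    collected_at: datetime


def pseudonymize(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a personal identifier with a keyed hash (HMAC-SHA256).

    The same identifier always maps to the same token, so records can still
    be grouped per user, but the token alone does not reveal the identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def collect(identifier: str, payload: str, now: datetime) -> Record:
    """Store only the pseudonym, never the raw identifier."""
    return Record(pseudonymize(identifier), payload, now)


def purge_expired(records: list[Record], now: datetime,
                  retention: timedelta = RETENTION) -> list[Record]:
    """Keep only records still inside the retention window."""
    cutoff = now - retention
    return [r for r in records if r.collected_at >= cutoff]


# Deny-by-default access: a role can read a field only if explicitly granted.
GRANTS: dict[str, set[str]] = {"billing": {"payload"}}


def can_read(role: str, field: str) -> bool:
    return field in GRANTS.get(role, set())


if __name__ == "__main__":
    now = datetime(2021, 4, 12, tzinfo=timezone.utc)
    records = [
        collect("alice@example.com", "survey answer", now - timedelta(days=45)),
        collect("bob@example.com", "survey answer", now - timedelta(days=5)),
    ]
    print(len(purge_expired(records, now)))  # 1: the 45-day-old record is gone
    print(can_read("marketing", "payload"))  # False: no grant, no access
```

In a real system the purge would run on a schedule (a cron job or a database TTL, for example), and the grants would come from whatever access-control system is already in place; the point of the sketch is only that "delete when no longer useful" and "restrict by default" can be explicit, testable rules rather than good intentions.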

Responsible Data Practices as a Competitive Advantage

As a final thought, responsible data practices are increasingly a competitive advantage. Apple is betting big on privacy, even at the expense of some of the most popular apps on its platform.[6] Encrypted messaging services like Signal are becoming more popular. Governments are regulating data through laws like GDPR and CCPA. Blockchain technologies are being deployed in creative ways to protect user privacy and ensure ownership chains.[7]

The trend is clear. Merely complying with evolving regulations isn't nearly good enough. Getting ahead of the curve will be critical for sustaining trust with end-users. Weighing the sunlight test against the business value of the data is a great place to start the conversation around responsible data practices.

Data storage may be (almost) free. But data collection never will be.

[1] https://www.theverge.com/2021/1/13/22229303/flo-period-tracking-app-privacy-health-data-facebook-google

[2] https://www.wired.com/story/facebook-data-leak-500-million-users-phone-numbers/

[3] https://ethics.org.au/ethics-explainer-the-sunlight-test/

[4] https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/

[5] https://www.theverge.com/2014/11/19/7245447/uber-allegedly-tracked-journalist-with-internal-tool-called-god-view

[6] https://www.theverge.com/2020/12/16/22179721/apple-defends-upcoming-privacy-changes-standing-up-for-users-facebook-data

[7] For a bizarre use, see: https://www.cryptokitties.co/

Data Better