Data

Do we collect and store only necessary user information?

As data storage becomes less expensive, it may start to seem as though there is little reason not to collect and retain as much data as possible about your users. However, the apparent ease of accumulating masses of data can hide enormous costs due to user dissatisfaction, security breaches, time-consuming subpoena requests, and privacy and free speech firestorms.

  • Capture only the data you need for your service or that you are legally required to capture. AOL reportedly receives more than 1,000 subpoenas every month requesting information about its users. Other tech companies may face similar volumes, although they do not reveal exact numbers. An efficient way to avoid these costs is to not collect the data in the first place. Do you really need an individual's name, address, and phone number? Could your company get by just as well with only one of these pieces of identifying information? Or none?
59% of adults in a 2008 study had refused to provide information to a business or company because they thought it was not necessary or too personal.
  • Store only necessary data. Even if you need to capture identifying information to handle a specific transaction, there may be no need to retain it after the transaction is complete. Purge collected data in its entirety once it is no longer necessary. Personally identifying information should rarely be retained for more than a few weeks.
Ask, Google, Microsoft, Yahoo!: Major search engines have started to recognize the importance of limiting data-retention periods for all data. Ask developed the AskEraser, allowing users to conduct online searches without the company logging any information. Microsoft deletes the full IP address, cookies, and any other identifiable user information from its logs after 18 months. Yahoo! is now planning to anonymize all search records after three months. Google now engages in a very limited form of log anonymization after nine months for those using the search engine and not logged into a Google account. After 18 months, the company deletes a portion of the stored IP address and de-identifies the cookie information stored in its logfiles.
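
The capture and retention practices above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any company named here implements its policy: the field list, the 30-day window, the record layout, and the function names are all hypothetical, and truncating an IP address reduces its precision but does not by itself make a record anonymous.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

# Hypothetical policy values; pick ones that match your service and legal duties.
ALLOWED_FIELDS = {"email", "order_id", "created_at", "ip"}
RETENTION = timedelta(days=30)


def minimize(submitted):
    """Keep only the fields the service actually needs at capture time."""
    return {key: value for key, value in submitted.items() if key in ALLOWED_FIELDS}


def truncate_ip(ip):
    """Zero the low-order bits of an address (IPv4: last octet, IPv6: last 80 bits)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def purge_expired(records, now):
    """Return only the records still inside the retention window."""
    cutoff = now - RETENTION
    return [record for record in records if record["created_at"] >= cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    form = {"email": "a@example.com", "phone": "555-0100", "order_id": 7,
            "created_at": now, "ip": "203.0.113.77"}
    record = minimize(form)                    # the phone number is never stored
    record["ip"] = truncate_ip(record["ip"])   # stored at reduced precision
    print(purge_expired([record], now))
```

Filtering at capture time means the discarded fields never reach storage, so there is nothing to purge, breach, or produce later. The purge step would typically run on a schedule against whatever datastore holds the records.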

Do we minimize the links between personal information and transactional data?

By minimizing the connections between personal information about users and data about those users' activities, a company may be able to achieve business goals such as optimizing performance or delivering targeted advertisements and services, while still cultivating user trust and insulating itself from voluminous legal demands and costly security breaches. Anonymization, aggregation, and similar techniques can help you extract value from your data while protecting your users' privacy.

68% of consumers in 2000 were "not at all comfortable" with companies that create profiles linking browsing and shopping habits to identity. The number spiked to 82% when profiles included income, driver's license numbers, credit data, or medical status.
  • Associate user records or personal information with transactional records only where necessary. Tying identifiable data, including IP addresses or account information, to transactional records invites privacy breaches and lawsuits. Evaluate aggregation and anonymization as tools to protect privacy while preserving the value of collected information.
YouTube: In 2008, YouTube was ordered to turn over records of every video watched by its users, including names and IP addresses, to Viacom, which was suing the company for copyright infringement. Since YouTube collected and maintained "deeply private information" linking individuals and their viewing habits, this information was available when Viacom came calling. Eventually, a compromise was reached and the data was anonymized before being turned over to Viacom. However, this close call resulted in extensive press coverage and outrage by YouTube users and privacy advocates.

AOL: In 2006, AOL and its Chief Technology Officer learned the hard way that users do not appreciate disclosure of their online search activities. The company thought that it had properly anonymized the data when it posted online the search records of 500,000 of its users for use by researchers. It was wrong. The private search habits of AOL users became public knowledge. AOL quickly pulled the dataset from its Web site, but not before the information had been mirrored on Web pages around the world and AOL's privacy breach was plastered on front pages around the globe. The incident led to the firing of the researchers involved with the database's release and the resignation of the company's Chief Technology Officer.
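
One way to keep activity data useful while loosening its tie to identity can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete anonymization scheme: the function names, the event layout, and the threshold of five are hypothetical. As the AOL episode suggests, replacing identifiers alone may not be enough when the activity itself is revealing, so the sketch also reports only aggregate counts and suppresses small groups.

```python
import hashlib
import hmac

# Hypothetical threshold: groups smaller than this are left out of reports.
MIN_GROUP_SIZE = 5


def pseudonym(user_id, secret_key):
    """Keyed hash, so the activity log never holds the raw account identifier."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_views(events, min_group_size=MIN_GROUP_SIZE):
    """Count distinct viewers per item and drop items with too few viewers.

    `events` is an iterable of (user, item) pairs.
    """
    viewers = {}
    for user, item in events:
        viewers.setdefault(item, set()).add(user)
    return {item: len(users) for item, users in viewers.items()
            if len(users) >= min_group_size}


if __name__ == "__main__":
    key = b"store-this-key-apart-from-the-logs"  # assumption: kept in a separate system
    raw = [(f"user{i}", "video-a") for i in range(6)] + [("user0", "video-b")]
    log = [(pseudonym(user, key), item) for user, item in raw]
    print(aggregate_views(log))  # video-b is suppressed: only one viewer
```

Keeping the key separate from the logs, or destroying it once the retention period ends, limits who can re-link activity to accounts. Publishing or analyzing only the aggregated counts avoids exposing any individual's record.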